
How Roku Cut vCPU Waste Across 110+ Clusters with ScaleOps

Company Size: 3,600+ employees

Industry: Streaming Platform

Cloud Provider: Multi-Cloud

Location: San Jose, CA

Main takeaway

ScaleOps helped Roku achieve a 32% vCPU reduction across 110+ Kubernetes clusters at just 20% automation coverage, with a projected path to 72%+. Zero production incidents. No workflow changes for engineering teams. The key: a safety-first architecture that makes engineers trust automation.

Based on a webinar conversation with Dieter Matzion, Senior Cloud Governance Engineer at Roku.

The Problem: Why Kubernetes Resource Optimization Fails at Scale

Kubernetes resource optimization — the practice of continuously aligning CPU and memory allocation with actual workload demand — is one of the most talked-about and least-executed disciplines in cloud infrastructure. Somewhere between 60 and 70% of the resources you’re paying for in Kubernetes are sitting unused. Sysdig’s analysis of billions of containers found 69% of purchased CPU goes idle. Datadog reports median CPU utilization around 16%.

Everyone knows the waste exists. Almost nobody fixes it.

The reason is an asymmetry that shapes every resource decision. The cost of under-provisioning is immediate and visible: OOM kills, latency spikes, engineers paged at 3 a.m. The cost of over-provisioning is just a bigger cloud bill, and nobody gets fired for a bigger bill. So engineers pad their requests generously. At scale, with hundreds of services, traffic patterns shifting with every deploy, and institutional knowledge leaving with every engineer, manual rightsizing becomes structurally impossible.
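To make the padding concrete, here is a sketch of what worst-case sizing tends to look like in a manifest. The workload name and every number below are illustrative, not drawn from Roku's environment:

```yaml
# Illustrative only: a deployment sized for the highest peak ever observed.
# Typical demand for a service like this might hover around 300m CPU / 512Mi,
# leaving most of the reservation idle almost all of the time.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api        # hypothetical workload
spec:
  replicas: 6
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
    spec:
      containers:
        - name: example-api
          image: example/api:1.0
          resources:
            requests:
              cpu: "2"       # padded for worst-case peak traffic
              memory: 4Gi
            limits:
              cpu: "4"
              memory: 8Gi
```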

The Kubernetes Vertical Pod Autoscaler (VPA) was designed to solve this, yet fewer than 1% of organizations run it in production. Five problems make it a non-starter at scale. First, it triggers uncontrolled pod restarts to apply changes in auto mode, which is disruptive in any production environment. Second, its global histogram decay hides spikes and recent demand shifts. Third, teams default to recommendation-only mode and never act on the data. Fourth, it doesn’t validate recommendations against cluster context, leading to OOM kill loops when recommendations exceed node capacity. Fifth, it conflicts directly with the Horizontal Pod Autoscaler (HPA): VPA changes requests, HPA’s percentage math breaks, and you get scaling thrashing. The tools exist. They just don’t work together at production scale.
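For reference, the "recommendation-only" pattern behind the third problem looks roughly like this: a VPA object with updates switched off, so it publishes suggested requests that nobody is obliged to act on (the target name is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-api-vpa      # hypothetical
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api
  updatePolicy:
    updateMode: "Off"   # recommendation-only: no disruptive restarts, but no action either
```

Switching `updateMode` to `"Auto"` is what triggers the uncontrolled restarts from problem one, which is exactly why most teams never leave recommendation mode.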

Watch: why clusters stay oversized (3:15)

The Real Bottleneck in Kubernetes Resource Optimization: Engineering Trust

The real blocker for Kubernetes resource optimization at enterprise scale isn’t technical: it’s human psychology. The incentive structure is broken: if an engineer does nothing, the company pays the same cloud bill and everyone keeps their job. If they reduce resources and something breaks in production, that’s a career-impacting incident.

Dieter Matzion, Senior Cloud Governance Engineer at Roku’s Cloud Center of Excellence — with over a decade of cloud financial management experience across Google, Netflix, and Intuit — put it directly: the team didn’t want to be the responsible party causing outages or waking engineers up at night.

This is the insight ScaleOps was built around. Safety unlocks automation. If engineers trust that a system won’t break production, they’ll let it optimize. Guard rails aren’t a feature in ScaleOps — they’re the foundation of the product architecture. Everything that follows in Roku’s story stems from that.

How ScaleOps Automates Kubernetes Resource Optimization

ScaleOps installs via Helm and runs in read-only mode by default. From the first minute, it observes workload behavior and generates optimization opportunities — without touching a single pod. Teams review the data, build confidence, and enable automation when ready.
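The install itself is a standard Helm workflow. A minimal sketch follows; the repository URL and chart name here are placeholders rather than ScaleOps' actual values, so check the official docs before running anything:

```bash
# Placeholder repo and chart names -- consult the official ScaleOps docs for the real ones.
helm repo add scaleops https://charts.example.com/scaleops
helm repo update
helm install scaleops scaleops/scaleops \
  --namespace scaleops-system \
  --create-namespace
# The platform starts in read-only mode: it observes and recommends, touching nothing.
```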

When automation is active, multiple mechanisms work together using native Kubernetes primitives. First, an admission controller patches pods with rightsized CPU and memory values at creation time. Second, smart pod placement bin-packs pods onto fewer nodes based on real-time demand rather than inflated requests. Third, as pods consolidate, nodes empty naturally, and your existing Cluster Autoscaler or Karpenter removes them. That's where actual cloud cost reduction happens: not from smaller pods, but from fewer nodes.
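Sketched with made-up numbers, the admission step works like this: the manifest engineers commit stays untouched in Git, and the patch lands only on the pod at creation time:

```yaml
# Before: the resources block engineers commit (unchanged in Git)
resources:
  requests:
    cpu: "2"        # worst-case padding
    memory: 4Gi
---
# After: what the pod is actually created with once the admission
# controller patches it from observed demand (numbers illustrative)
resources:
  requests:
    cpu: 600m
    memory: 1280Mi
```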

ScaleOps also coordinates vertical rightsizing with horizontal replica optimization so the two compound rather than conflict — the opposite of the native Vertical Pod Autoscaler and Horizontal Pod Autoscaler death spiral. And when workloads experience sustained spikes or OOM events, ScaleOps reacts in real time rather than repeating failed recommendations in a loop.
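A worked example of why that coordination matters, with illustrative numbers: the HPA targets utilization as a percentage of requests, so any tool that silently changes requests also moves the HPA's goalposts.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-api-hpa      # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
# The math: a pod using 400m against a 1000m request reads as 40% -- no scaling.
# If a VPA drops the request to 500m, the same 400m now reads as 80%, so the
# HPA scales out; per-pod usage then falls, the VPA shrinks requests again,
# and the loop repeats. Coordinating both decisions avoids the spiral.
```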

Watch: how pod rightsizing and node consolidation work together (20:15)

Roku Before ScaleOps: Manual Kubernetes Resource Management Across 110+ Clusters

Roku operates over 110 Kubernetes clusters. Each engineering team is empowered to make independent infrastructure decisions — including how they size workloads. In practice, this meant every team sized for worst-case production: highest peak traffic, maximum anticipated load. There was no organization-wide Kubernetes resource sizing policy.

How often were teams revisiting those numbers? Once or twice a year at most, and typically only when triggered by an incident or a known change in load — a new channel, a viewer spike, a batch processing shift. The rest of the time, it was set-and-forget.

Roku’s Cloud Center of Excellence maintains a boilerplate Kubernetes install — Terraform and CloudFormation templates pre-configured with Prometheus, Grafana, and Loki — that any engineering team can use to stand up a cluster. Horizontal pod autoscaling was already automated through the Horizontal Pod Autoscaler. Vertical scaling was entirely manual.

Dieter recognized the gap. Other technologies existed for vertical optimization; the team needed to test whether they could work alongside the existing HPA setup without conflict.

Watch: Dieter explains Roku’s starting point (7:45)

The ScaleOps Rollout: From PoC to Automated Resource Optimization

Roku started with a proof of concept. The primary concern was whether vertical and horizontal optimization would create race conditions — a real risk with native Kubernetes tooling where the Vertical Pod Autoscaler and Horizontal Pod Autoscaler fight over the same metrics.

The team ran ScaleOps in read-only observation mode for 14 days, reviewing what the system would recommend before enabling any automation. The engineers responsible for the PoC noted the ease of installation — a drop-in Helm deploy that generated optimization data immediately without any risk.

The result surprised Dieter. Instead of the expected conflict between vertical rightsizing and horizontal replica optimization, the savings compounded: Roku observed a 15–30% reduction from vertical optimization and an additional 15–30% from horizontal optimization, stacking to 30–60% combined depending on how over-provisioned the workload was before ScaleOps engaged. No race condition materialized.

Finance approval followed naturally. The PoC workload alone showed a 20% optimization lift — enough for a business case. As Dieter described it, once you put projected savings in front of a CFO, they can’t ignore it. The product funded itself through the savings it generated.

From there, the rollout was progressive: dev and test environments first, then staging and QA, then production. ScaleOps was added to Roku’s internal boilerplate Kubernetes install at a specific version number. Engineers upgrading their clusters to that version automatically received ScaleOps. Teams were notified by email. Some may not have noticed.
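Roku's boilerplate is built on Terraform and CloudFormation, but the version-pinning idea translates to any declarative layer. As a sketch of the same pattern expressed as a helmfile entry, with placeholder names and version:

```yaml
# Sketch only: pinning ScaleOps into a cluster boilerplate via helmfile.
# Repository URL, chart name, and version are placeholders.
repositories:
  - name: scaleops
    url: https://charts.example.com/scaleops
releases:
  - name: scaleops
    namespace: scaleops-system
    chart: scaleops/scaleops
    version: 1.2.3   # pinned: any cluster upgraded to this boilerplate version gets ScaleOps
```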

Roku also evaluated multiple vendors during the PoC. A key differentiator was node reclamation speed — how quickly empty nodes could be returned to the cloud provider after pod consolidation. ScaleOps freed nodes in 2–3 minutes. A competing solution took 50–55 minutes. With per-minute cloud billing, that gap compounds across hundreds of consolidation events per month.

Watch: Dieter on the PoC surprise and compounding savings (12:30)

Kubernetes Resource Optimization Results: 32% vCPU Reduction at Roku

At approximately 20% automation coverage across 110+ clusters, Roku measured a 32% allocatable vCPU reduction. The projected savings with full rollout reach 68%. Adding node-level optimization brings the projection to 72%, and replica optimization adds another 10% on top.

The harder metric to quantify is engineering impact — or rather, the lack of it. Day-to-day workflows at Roku didn’t change. No new manifests were required. Engineers continued working with standard Kubernetes tooling while ScaleOps operated beneath the surface. Dieter’s assessment: the product was ready before Roku was. More effort went into establishing an unoptimized baseline for cost comparison than into installing and running ScaleOps itself.

Production safety held throughout the rollout. No incidents. No OOM kill cascades. No engineers paged because of resource changes. The product was, as Dieter described it, designed to not cause outages — and that’s exactly what Roku experienced.

Watch: the full results breakdown (16:00)

Key Takeaways

Four lessons from Roku’s Kubernetes resource optimization journey:

Safety scales trust. The technical capability to rightsize workloads has existed for years. What was missing was the confidence that automation wouldn’t break production. ScaleOps’ guard rails — self-healing, real-time OOM reaction, configurable policies — are what made Roku’s team willing to flip the switch.

Vertical and horizontal optimization compound when coordinated. The native Vertical Pod Autoscaler and Horizontal Pod Autoscaler conflict. ScaleOps coordinates them, and Roku saw 15–30% savings from each layer stacking to 30–60% combined. That’s not either-or — it’s additive.

Start in read-only mode. ScaleOps installs in under five minutes via Helm and immediately begins generating the projected savings data you need to build a business case. Dieter was clear: once a CFO sees the numbers, they have to act.

Kubernetes resource management is a systems problem. Annual reviews and manual YAML edits don’t work at 110+ clusters. Continuous, automated Kubernetes resource optimization — safe enough to ship as a default in your platform’s boilerplate install — does.

Watch: Dieter’s advice for platform teams considering automation (37:15)

Watch the full webinar → Production-Safe Resource Management: How Roku Built Trust in Automation with ScaleOps (43 min)

Try ScaleOps now → Start your free trial. Five-minute Helm install. Immediate visibility into your optimization opportunities. No changes to your workloads. No risk.

Frequently Asked Questions

What is Kubernetes resource optimization?

Kubernetes resource optimization is the practice of continuously aligning CPU and memory allocation with actual workload demand — rightsizing pod requests and limits, consolidating workloads onto fewer nodes, and coordinating horizontal and vertical scaling to eliminate waste without compromising performance or stability. Watch how this plays out at enterprise scale in our Roku webinar →

How does ScaleOps optimize Kubernetes resources?

ScaleOps combines real-time pod rightsizing, smart pod placement, and node consolidation using native Kubernetes primitives. It installs via Helm, runs in read-only mode by default, and activates automation when teams are ready. See the full architecture walkthrough (20:15) →

How much did Roku save with ScaleOps?

Roku achieved a 32% allocatable vCPU reduction at approximately 20% automation coverage across 110+ clusters. With full rollout, the projected reduction reaches 68%, increasing to 72% with node optimization and an additional 10% with replica optimization. Hear Dieter walk through the numbers (16:00) →

What is Kubernetes rightsizing?

Kubernetes rightsizing is the process of adjusting pod CPU and memory requests and limits to match actual usage patterns. It eliminates over-provisioning (wasted spend) and under-provisioning (performance degradation and OOM kills). At scale, manual rightsizing is impractical — automated solutions like ScaleOps perform this continuously based on real-time workload behavior.

Does ScaleOps work with the Horizontal Pod Autoscaler?

Yes. Unlike the native Vertical Pod Autoscaler, which conflicts with the Horizontal Pod Autoscaler and causes scaling thrashing, ScaleOps coordinates vertical rightsizing and horizontal replica optimization so savings compound. Roku measured 15–30% savings from each, stacking to 30–60% combined. Watch Dieter explain the compounding effect (12:30) →

Is there a production-safe VPA alternative for Kubernetes?

ScaleOps is designed as a production-safe alternative to the Vertical Pod Autoscaler. It addresses VPA’s core limitations — uncontrolled pod restarts, histogram decay, lack of cluster context, and HPA conflicts — by enabling non-disruptive rightsizing, real-time burst reaction, and self-healing OOM recovery. Roku deployed it across 110+ clusters with zero production incidents. Watch how ScaleOps handles real-time adaptation (25:45) →

How long does ScaleOps take to install?

ScaleOps installs via Helm in under five minutes. It begins generating optimization opportunities in read-only mode immediately, without touching any workloads. Roku ran 14 days in observation mode during their proof of concept before enabling automation.

Is ScaleOps safe for production Kubernetes clusters?

ScaleOps is designed for production-first deployment. Roku rolled it out across 110+ clusters with zero production incidents and no engineering workflow changes. The product includes self-healing capabilities, real-time OOM reaction, and configurable safety policies per workload type. Watch the full story of Roku’s production rollout →