Scaling an EKS cluster isn’t just about autoscaling your pods.
It’s about orchestrating pods, nodes, events, traffic patterns, and cost controls together, and getting it wrong means you’ll pay for it in downtime, latency, or wasted spend.
In today’s issue, I’ll break down 7 proven scaling strategies for EKS.
For each, I’ll cover:
What it solves
How to implement it
Common pitfalls to avoid
When to combine it with other strategies
This isn’t Kubernetes theory. It’s production-grade tactics for modern workloads.

1. Horizontal Pod Autoscaler (HPA): The foundation for dynamic app scaling
What it does:
Scales the number of pod replicas for a Deployment or StatefulSet based on CPU, memory, or custom metrics.
How to use:
Define a HorizontalPodAutoscaler that targets your deployment:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Gotchas:
HPA won't scale if nodes can't accommodate new pods; it reacts to pod-level metrics, not infrastructure.
Default metrics are limited. Use custom metrics (Prometheus + Adapter) for business KPIs or queue depth.
Pro Tip:
Always pair HPA with Cluster Autoscaler or Karpenter to ensure your infrastructure scales.
2. Cluster Autoscaler (CA): Scale your nodes based on real pod demand
What it does:
Automatically adds or removes nodes from an Auto Scaling Group (ASG) when pods are pending or underutilized.
How to use:
Deploy the official Cluster Autoscaler Helm chart
Set tags on your ASGs:
k8s.io/cluster-autoscaler/enabled
k8s.io/cluster-autoscaler/<cluster-name>
Enable IAM permissions for scaling actions.
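For reference, here's a minimal values sketch for the official cluster-autoscaler Helm chart; the cluster name, region, and IAM role ARN are placeholders to swap for your own:
autoDiscovery:
  clusterName: my-cluster                    # assumption: your EKS cluster name
awsRegion: us-east-1                         # assumption: your region
rbac:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/cluster-autoscaler  # hypothetical IRSA role
extraArgs:
  expander: least-waste
  balance-similar-node-groups: true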
Gotchas:
If pod resource requests are too high, CA may fail to find a suitable node.
Taints, labels, or node selectors that mismatch your pod spec = pending pods forever.
Scaling is slower (~1–2 min), especially with EC2 cold starts.
Pro Tip:
Use expander=least-waste or priority to influence how nodes get picked during scale-out.
3. Karpenter: The next-gen node autoscaler built for flexibility and speed
What it does:
Provisions right-sized EC2 instances based on pod specs, across instance types, zones, and architectures — without needing ASGs.
How to use:
Install the Karpenter controller
Create a Provisioner YAML with constraints (e.g., instance types, zones, architectures); see the sketch below
Define TTLs for empty nodes and consolidation logic
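Newer Karpenter releases replace the Provisioner API with NodePool; here's a sketch under the v1beta1 API, assuming an EC2NodeClass named default already exists:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default                        # assumption: your EC2NodeClass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b"]  # assumption: your zones
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h                        # recycle nodes after 30 days
  limits:
    cpu: "1000"                              # cap total provisioned CPU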
Gotchas:
Consolidation misfires can lead to node churn. Tune aggressively if you see frequent pod evictions.
IAM roles with the proper EC2 permissions must be attached to the karpenter-controller role.
Needs thoughtful provisioning constraints; don't just let it pick anything.
Pro Tip:
Use the spot capacity type (karpenter.sh/capacity-type: spot) with consolidationPolicy: WhenUnderutilized to minimize cost while keeping responsiveness high.
4. Vertical Pod Autoscaler (VPA): Right-size resource requests to cut waste and avoid OOMs
What it does:
Adjusts pod CPU and memory requests/limits based on usage history.
How to use:
Deploy VPA via Helm
Create a VerticalPodAutoscaler object with an updatePolicy of updateMode: "Auto", "Initial", or "Off" (a manifest sketch follows below)
Gotchas:
VPA restarts pods to apply new resource values — this can cause churn or downtime.
HPA and VPA conflict if both try to control CPU/memory; keep VPA in recommendation-only mode (updateMode: "Off") for HPA-managed workloads.
Doesn't work well with short-lived pods or jobs.
Pro Tip:
First, use VPA in “Off” or “Initial” mode to generate recommendations, then apply them after review.
5. KEDA (Kubernetes Event-Driven Autoscaling): React to external signals, not just CPU
What it does:
Scales pods based on external event sources like SQS, Kafka, Prometheus, Redis, etc.
How to use:
Install KEDA via Helm
Deploy a ScaledObject that references your Deployment and metric source (a sketch follows below)
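A minimal sketch using KEDA's aws-sqs-queue scaler; the queue URL and the TriggerAuthentication name are hypothetical placeholders:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app                 # assumption: your Deployment
  minReplicaCount: 0             # allow scale to zero when idle
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue  # hypothetical
        queueLength: "5"         # target messages per replica
        awsRegion: us-east-1
      authenticationRef:
        name: keda-aws-credentials  # hypothetical TriggerAuthentication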
Gotchas:
Too many events = rapid scale-out = node exhaustion if not paired with Cluster Autoscaler/Karpenter
Watch for scale-to-zero delay — KEDA can be aggressive or slow depending on your config.
Pro Tip:
Use KEDA for queue-based processing, cron-triggered jobs, or bursty workloads where CPU is a poor signal.
6. Scheduled Scaling: Scale proactively based on time patterns
What it does:
Triggers scale actions on a cron-like schedule, regardless of real-time metrics.
How to use:
Create an ASG scheduled action in AWS
Or use Kubernetes-native tools (e.g., KEDA's cron scaler, CronJobs, custom controllers); see the sketch below
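For the Kubernetes-native route, a sketch of KEDA's cron scaler holding a weekday business-hours floor; the names, timezone, and schedule are assumptions:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-business-hours
spec:
  scaleTargetRef:
    name: my-app                 # assumption: your Deployment
  minReplicaCount: 2             # baseline outside the schedule
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 8 * * 1-5       # scale up at 08:00 on weekdays
        end: 0 18 * * 1-5        # scale back down at 18:00
        desiredReplicas: "10"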
Gotchas:
No built-in reactivity — great for predictable traffic, bad for surprise spikes
Must test manually before real traffic hits
Pro Tip:
Use Scheduled + HPA together: Scheduled scaling sets the floor/ceiling, and HPA fine-tunes it dynamically.
7. Overprovisioning with Pause Pods: Ensure instant availability during unpredictable bursts
What it does:
Uses low-priority “pause” pods to pre-reserve capacity; they get evicted when real workloads arrive.
How to use:
Deploy pause pods with a low priorityClass and small CPU/memory requests (see the sketch below)
Use Cluster Autoscaler with scale-down-utilization-threshold < 1.0
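A sketch of the pattern; the class name, replica count, and request sizes are assumptions to size against the headroom you want:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning         # hypothetical name
value: -1                        # lower than any real workload
globalDefault: false
description: "Placeholder capacity that real pods may preempt"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-pause
spec:
  replicas: 3                    # assumption: size of your warm buffer
  selector:
    matchLabels:
      app: pause
  template:
    metadata:
      labels:
        app: pause
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 500m
              memory: 512Mi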
Gotchas:
If pause pods aren't evictable, they’ll waste capacity
Real workloads preempt pause pods only while their PriorityClass keeps the default preemptionPolicy: PreemptLowerPriority; setting it to Never blocks scaling under load
Pro Tip:
Use with HPA for high-traffic APIs where cold starts aren’t acceptable. Pause pods are instantly evicted to make room.
Combine to Win
You should never use just one scaler. Combine based on your workload needs:
HPA + Karpenter: Modern stateless apps with bursty traffic
KEDA + Cluster Autoscaler: Queue-heavy, event-driven workloads
Scheduled + Overprovisioning: Predictable morning spikes + buffer room
VPA + Karpenter: Long-running apps with uncertain resource needs
Architecting scaling is about composition, not just configuration.
Troubleshooting Checklist
Pods pending? → Check node taints, pod affinity, or unmatchable instance types
HPA not working? → Validate metrics-server and ensure resource requests are set
Karpenter over-scaling? → Tune consolidationPolicy and TTLs
VPA breaking HPA? → Disable auto mode and review recommendations first
Budget blowout? → Spot + consolidation + right-sizing = your best friend
Final Thoughts
Scaling EKS is a multi-layered challenge.
But when done right, it enables:
Smooth traffic bursts
Minimal latency
Predictable cost ceilings
Confident deploys
The best engineering teams don’t rely on one scaler. They build composable scaling architectures that adapt to workload behavior.
Now you can, too.
SPONSOR US
The Cloud Playbook is now offering sponsorship slots in each issue. If you want to feature your product or service in my newsletter, explore my sponsor page.
That’s it for today!
Did you enjoy this newsletter issue?
Share it with your friends and colleagues on your favorite social media platform.
Until next week — Amrut
Get in touch
You can find me on LinkedIn or X.
If you wish to request a topic you would like to read, you can contact me directly via LinkedIn or X.