Scaling an EKS cluster isn’t just about autoscaling your pods.
It’s about orchestrating pods, nodes, events, traffic patterns, and cost controls together, and getting it wrong means you’ll pay for it in downtime, latency, or wasted spend.
In today’s issue, I’ll break down 7 proven scaling strategies for EKS.
For each, I’ll cover:
What it solves
How to implement it
Common pitfalls to avoid
When to combine it with other strategies
This isn’t Kubernetes theory. It’s production-grade tactics for modern workloads.

1. Horizontal Pod Autoscaler (HPA): The foundation for dynamic app scaling
What it does:
Scales the number of pod replicas for a Deployment or StatefulSet based on CPU, memory, or custom metrics.
How to use:
Define a HorizontalPodAutoscaler that targets your deployment:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Gotchas:
HPA won't scale if nodes can't accommodate new pods; it reacts to pod-level metrics, not infrastructure.
Default metrics are limited. Use custom metrics (Prometheus + Adapter) for business KPIs or queue depth.
Pro Tip:
Always pair HPA with Cluster Autoscaler or Karpenter to ensure your infrastructure scales.
2. Cluster Autoscaler (CA): Scale your nodes based on real pod demand
What it does:
Automatically adds or removes nodes from an Auto Scaling Group (ASG) when pods are pending or underutilized.
How to use:
Deploy the official Cluster Autoscaler Helm chart
Set tags on your ASGs:
k8s.io/cluster-autoscaler/enabled
k8s.io/cluster-autoscaler/<cluster-name>
Enable IAM permissions for scaling actions.
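For reference, here's a minimal values sketch for the official cluster-autoscaler Helm chart; the cluster name, region, and IAM role ARN are placeholders to swap for your own:
autoDiscovery:
  clusterName: my-cluster                    # assumption: your EKS cluster name
awsRegion: us-east-1                         # assumption: your region
rbac:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/cluster-autoscaler  # hypothetical IRSA role
extraArgs:
  expander: least-waste
  balance-similar-node-groups: true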
Gotchas:
If pod resource requests are too high, CA may fail to find a suitable node.
Taints, labels, or node selectors that mismatch your pod spec = pending pods forever.
Scaling is slower (~1–2 min), especially with EC2 cold starts.
Pro Tip:
Use expander=least-waste or priority to influence how nodes get picked during scale-out.
3. Karpenter: The next-gen node autoscaler built for flexibility and speed
What it does:
Provisions right-sized EC2 instances based on pod specs, across instance types, zones, and architectures — without needing ASGs.
How to use:
Install the Karpenter controller
Create a Provisioner YAML with constraints (e.g., instance types, zones, architectures); see the sketch below
Define TTLs for empty nodes and consolidation logic
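Newer Karpenter releases replace the Provisioner API with NodePool; here's a sketch under the v1beta1 API, assuming an EC2NodeClass named default already exists:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default                        # assumption: your EC2NodeClass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b"]  # assumption: your zones
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h                        # recycle nodes after 30 days
  limits:
    cpu: "1000"                              # cap total provisioned CPU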
Gotchas:
Consolidation misfires can lead to node churn. Tune aggressively if you see frequent pod evictions.
IAM roles with the proper EC2 permissions must be attached to the karpenter-controller role.
Needs thoughtful provisioning constraints; don't just let it pick anything.
Pro Tip:
Use the spot capacity type (karpenter.sh/capacity-type: spot) with consolidationPolicy: WhenUnderutilized to minimize cost while keeping responsiveness high.
4. Vertical Pod Autoscaler (VPA): Right-size resource requests to cut waste and avoid OOMs
What it does:
Adjusts pod CPU and memory requests/limits based on usage history.
How to use:
Deploy VPA via Helm
Create a VerticalPodAutoscaler object with an updatePolicy of updateMode: "Auto", "Initial", or "Off" (a manifest sketch follows below)
Gotchas:
VPA restarts pods to apply new resource values — this can cause churn or downtime.
HPA and VPA conflict if both try to control CPU/memory; keep VPA in recommendation-only mode (updateMode: "Off") for HPA-managed workloads.
Doesn't work well with short-lived pods or jobs.
Pro Tip:
First, use VPA in “Off” or “Initial” mode to generate recommendations, then apply them after review.
5. KEDA (Kubernetes Event-Driven Autoscaling): React to external signals, not just CPU
What it does:
Scales pods based on external event sources like SQS, Kafka, Prometheus, Redis, etc.
How to use:
Install KEDA via Helm
Deploy a ScaledObject that references your Deployment and metric source (a sketch follows below)
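A minimal sketch using KEDA's aws-sqs-queue scaler; the queue URL and the TriggerAuthentication name are hypothetical placeholders:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app                 # assumption: your Deployment
  minReplicaCount: 0             # allow scale to zero when idle
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue  # hypothetical
        queueLength: "5"         # target messages per replica
        awsRegion: us-east-1
      authenticationRef:
        name: keda-aws-credentials  # hypothetical TriggerAuthentication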
Gotchas:
Too many events = rapid scale-out = node exhaustion if not paired with Cluster Autoscaler/Karpenter
Watch for scale-to-zero delay — KEDA can be aggressive or slow depending on your config.
Pro Tip:
Use KEDA for queue-based processing, cron-triggered jobs, or bursty workloads where CPU is a poor signal.
6. Scheduled Scaling: Scale proactively based on time patterns
What it does:
Triggers scale actions on a cron-like schedule, regardless of real-time metrics.
How to use:
Create an ASG scheduled action in AWS
Or use Kubernetes-native tools (e.g., KEDA's cron scaler, CronJobs, custom controllers); see the sketch below
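For the Kubernetes-native route, a sketch of KEDA's cron scaler holding a weekday business-hours floor; the names, timezone, and schedule are assumptions:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-business-hours
spec:
  scaleTargetRef:
    name: my-app                 # assumption: your Deployment
  minReplicaCount: 2             # baseline outside the schedule
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 8 * * 1-5       # scale up at 08:00 on weekdays
        end: 0 18 * * 1-5        # scale back down at 18:00
        desiredReplicas: "10"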
Gotchas:
No built-in reactivity — great for predictable traffic, bad for surprise spikes
Must test manually before real traffic hits
Pro Tip:
Use Scheduled + HPA together: Scheduled scaling sets the floor/ceiling, and HPA fine-tunes it dynamically.
7. Overprovisioning with Pause Pods: Ensure instant availability during unpredictable bursts
What it does:
Uses low-priority “pause” pods to pre-reserve capacity; they get evicted when real workloads arrive.
How to use:
Deploy pause pods with a low priorityClass and small CPU/memory requests (see the sketch below)
Use Cluster Autoscaler with scale-down-utilization-threshold < 1.0
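A sketch of the pattern; the class name, replica count, and request sizes are assumptions to size against the headroom you want:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning         # hypothetical name
value: -1                        # lower than any real workload
globalDefault: false
description: "Placeholder capacity that real pods may preempt"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-pause
spec:
  replicas: 3                    # assumption: size of your warm buffer
  selector:
    matchLabels:
      app: pause
  template:
    metadata:
      labels:
        app: pause
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 500m
              memory: 512Mi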
Gotchas:
If pause pods aren't evictable, they’ll waste capacity
Real workloads preempt pause pods only while their PriorityClass keeps the default preemptionPolicy: PreemptLowerPriority; setting it to Never blocks scaling under load
Pro Tip:
Use with HPA for high-traffic APIs where cold starts aren’t acceptable. Pause pods are instantly evicted to make room.
Combine to Win
You should never use just one scaler. Combine based on your workload needs:
HPA + Karpenter: Modern stateless apps with bursty traffic
KEDA + Cluster Autoscaler: Queue-heavy, event-driven workloads
Scheduled + Overprovisioning: Predictable morning spikes + buffer room
VPA + Karpenter: Long-running apps with uncertain resource needs
Architecting scaling is about composition, not just configuration.
Troubleshooting Checklist
Pods pending? → Check node taints, pod affinity, or unmatchable instance types
HPA not working? → Validate metrics-server and ensure resource requests are set
Karpenter over-scaling? → Tune consolidationPolicy and TTLs
VPA breaking HPA? → Disable auto mode and review recommendations first
Budget blowout? → Spot + consolidation + right-sizing = your best friend
Final Thoughts
Scaling EKS is a multi-layered challenge.
But when done right, it enables:
Smooth traffic bursts
Minimal latency
Predictable cost ceilings
Confident deploys
The best engineering teams don’t rely on one scaler. They build composable scaling architectures that adapt to workload behavior.
Now you can, too.
SPONSOR US
The Cloud Playbook is now offering sponsorship slots in each issue. If you want to feature your product or service in my newsletter, explore my sponsor page.
That’s it for today!
Did you enjoy this newsletter issue?
Share it with your friends and colleagues on your favorite social media platform.
Until next week — Amrut
Get in touch
You can find me on LinkedIn or X.
If you wish to request a topic you would like to read, you can contact me directly via LinkedIn or X.