Why does my ECS service not scale out even though CPU is at 100%?

The most common reasons are that the scaling policy needs three consecutive minutes of high CPU before it acts, a cooldown period from a previous scaling action is blocking a new one, or the maximum task count is already reached. Check your Application Auto Scaling activity log for suppressed or failed scaling actions to confirm which one applies.

How do I reduce the time it takes for ECS to scale out during a traffic spike?

Lower the scale-out cooldown from the default 300 seconds to 60–90 seconds, drop your CPU target threshold to 55–60% so scaling starts before you're fully saturated, and use scheduled scaling actions to pre-warm capacity before predictable traffic spikes arrive.

What is the difference between ECS auto scaling and EC2 Auto Scaling Group scaling?

ECS service auto scaling adjusts the number of running tasks within your cluster, while EC2 Auto Scaling Group scaling adjusts the number of EC2 instances available to host those tasks. If you use EC2 capacity providers, both loops must succeed for a scale-out to complete — ECS can request more tasks but they'll stay in PENDING until EC2 instances are available.

Can I use both target tracking and step scaling policies on the same ECS service?

Yes, you can attach multiple scaling policies to the same ECS service and Application Auto Scaling will honor whichever policy recommends the highest desired count at any given time. A common pattern is to use target tracking for steady-state scaling and a step policy as an emergency supplement that fires aggressively when CPU crosses a very high threshold.

How do I tell if ECS auto scaling is being blocked by IAM permissions?

Run the describe-scaling-activities CLI command for your service namespace and filter for activities with a StatusCode of Failed. A failed activity with an access denied message points to a missing or misconfigured IAM service-linked role for Application Auto Scaling on ECS.

Fixing AWS ECS Auto Scaling Stalls During Traffic Bursts

Your ECS service is pinned at maximum CPU. Requests are queuing up, latency is climbing, and yet the task count hasn't budged from its minimum. Application Auto Scaling is supposed to handle exactly this — so why isn't it firing?

Stalled ECS auto scaling during traffic bursts is one of those failures that looks mysterious on the surface but almost always traces back to a small number of root causes: metric lag, cooldown misconfiguration, capacity provider bottlenecks, or a threshold set just high enough to never trigger in time. This guide walks through all of them with concrete diagnostic steps you can run right now.

What You'll Learn

How ECS auto scaling actually works, end to end, so you know where to look first
The five most common reasons scaling stalls during a burst and how to confirm each one
Specific CLI commands and console checks to diagnose your environment quickly
How to tune cooldowns and thresholds so scaling fires in time, not after the spike passes
Proactive patterns — scheduled scaling, step policies — to cover bursts that metric-based policies can't catch fast enough

What Actually Triggers ECS Auto Scaling

ECS doesn't scale itself. It delegates to Application Auto Scaling, which watches a CloudWatch metric and adjusts the service's desired task count when a policy condition is met. The most common setup is a target tracking policy — you pick a metric (usually ECSServiceAverageCPUUtilization or ECSServiceAverageMemoryUtilization) and a target value, and the policy tries to keep the metric near that target by adding or removing tasks.

The chain looks like this: real traffic hits your tasks → CPU or memory rises → CloudWatch publishes a new data point → Application Auto Scaling evaluates the metric → it updates the ECS service's desired count → ECS launches new tasks → those tasks register with your load balancer. Every link in that chain has its own timing and failure modes.
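To see how the links in that chain add up, here is a minimal Python sketch that sums a per-stage delay budget. The stage names and every duration are illustrative assumptions, not measured or AWS-published figures; substitute numbers from your own service.

```python
# Rough latency budget for one reactive scale-out, in seconds.
# Every duration below is an assumed placeholder, not an AWS-published figure.
from typing import Dict


def scale_out_latency(stages: Dict[str, float]) -> float:
    """Return the total delay from traffic spike to new tasks serving traffic."""
    return sum(stages.values())


assumed_stages = {
    "three_datapoints_breaching": 180,  # three one-minute data points above target
    "desired_count_update": 10,         # policy updates the service (assumed)
    "task_launch": 45,                  # image pull and container start (assumed)
    "load_balancer_healthy": 30,        # health checks pass (assumed)
}

total = scale_out_latency(assumed_stages)
print(f"Estimated time to useful capacity: {total:.0f}s ({total / 60:.1f} min)")
```

Even with optimistic numbers, the total is several minutes, which is why the checks below focus on removing avoidable delay from each stage.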

Why Auto Scaling Stalls During Traffic Bursts

Bursts are the worst case for reactive scaling because every step in the chain takes time, and bursts are by definition short. A spike that lasts two minutes may never produce enough sustained metric data for a scale-out to complete before traffic drops again. But before you blame the inherent latency of the system, rule out the fixable problems first — the ones that make an already-slow loop even slower.

Check 1: CloudWatch Metric Lag Is Hiding the Spike

ECS publishes service-level CPU and memory metrics to CloudWatch at a one-minute granularity. Application Auto Scaling then evaluates those metrics — by default, it needs to see the threshold breached for three consecutive data points (three minutes) before acting on a scale-out. If your burst lasts less than three to four minutes, the scaling policy may never trip.

Check the raw metric first. In the CloudWatch console, go to ECS → ClusterName → ServiceName and look at CPUUtilization over a narrow time window around the incident. If you see a sharp spike followed by a return to baseline, and the spike lasted fewer than three minutes, the policy never had enough data points to act.

aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ClusterName,Value=my-cluster Name=ServiceName,Value=my-service \
  --start-time 2024-06-10T14:00:00Z \
  --end-time 2024-06-10T14:30:00Z \
  --period 60 \
  --statistics Average

If you're seeing the spike clearly in the raw data but the task count never moved, the problem is more likely cooldown or capacity provider related — keep reading. If the spike is barely visible because it was very short, you'll need scheduled scaling or step policies to handle it (covered below).
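If you save the command's JSON output, a small script can tell you how long the spike actually stayed above your target. This is a minimal sketch: the sample data is made up, and the function name and the 70% threshold are assumptions chosen for illustration.

```python
# Count how long CPU stayed above a threshold in get-metric-statistics output.
from typing import Any, Dict


def longest_breach_run(response: Dict[str, Any], threshold: float) -> int:
    """Return the longest run of consecutive data points above threshold.

    Data points are sorted by Timestamp first because CloudWatch does not
    guarantee chronological order. Gaps in the series are not detected.
    """
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    longest = current = 0
    for point in points:
        current = current + 1 if point["Average"] > threshold else 0
        longest = max(longest, current)
    return longest


# Hypothetical sample shaped like the CLI's JSON output (values are made up).
sample = {
    "Label": "CPUUtilization",
    "Datapoints": [
        {"Timestamp": "2024-06-10T14:02:00+00:00", "Average": 94.0, "Unit": "Percent"},
        {"Timestamp": "2024-06-10T14:00:00+00:00", "Average": 32.0, "Unit": "Percent"},
        {"Timestamp": "2024-06-10T14:03:00+00:00", "Average": 35.0, "Unit": "Percent"},
        {"Timestamp": "2024-06-10T14:01:00+00:00", "Average": 96.0, "Unit": "Percent"},
    ],
}

run = longest_breach_run(sample, threshold=70.0)
print(f"Longest run above 70%: {run} data point(s)")
print("Long enough for a three-point evaluation" if run >= 3
      else "Too short to trip a three-point evaluation")
```

In this sample the spike covers only two data points, so a policy that needs three would never have acted.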

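It's also worth checking whether the CloudWatch alarms that the target tracking policy created (their names begin with TargetTracking-) are stuck in INSUFFICIENT_DATA. An alarm in that state does not trigger the policy, so no scaling action is evaluated even when the underlying metric is rising.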

Check 2: Cooldown Periods Are Blocking the Scale-Out

Target tracking policies have a scale-out cooldown and a scale-in cooldown. After a scale-out, the policy will not add capacity again until the cooldown expires, unless the metric calls for a larger scale-out than the one already in progress. The default scale-out cooldown for ECS is 300 seconds (five minutes). If your burst arrives in two waves within five minutes, the first wave triggers a scale-out, and the response to the second wave can be delayed or undersized because the cooldown is still running.

Fetch your current policy configuration to confirm:

aws application-autoscaling describe-scaling-policies \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service

Look at TargetTrackingScalingPolicyConfiguration.ScaleOutCooldown. If it's at 300 seconds and your traffic pattern shows repeated bursts, that cooldown is too long. For most services that can safely absorb a few extra tasks, dropping scale-out cooldown to 60–90 seconds is reasonable. Scale-in cooldown can stay longer (300–600 seconds) to prevent thrashing.

You can also check the scaling activity history to see whether actions were suppressed:

aws application-autoscaling describe-scaling-activities \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service

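By default this command lists only activities that changed capacity. Add --include-not-scaled-activities to also see evaluations that did not scale, each with a NotScaledReasons code such as AlreadyAtMaxCapacity. If the history shows one Successful scale-out at the start of the burst and then nothing for several minutes while CPU stayed high, the cooldown is the likely cause.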
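One way to look for cooldown pacing in that history is to measure the gaps between successful activities. The sketch below assumes the CLI returned StartTime as ISO 8601 strings; the sample records are hypothetical and the function name is illustrative.

```python
# Measure the spacing between successful scaling activities.
from datetime import datetime
from typing import Any, Dict, List


def scale_gaps_seconds(activities: List[Dict[str, Any]]) -> List[float]:
    """Return gaps in seconds between consecutive Successful activities."""
    starts = sorted(
        datetime.fromisoformat(a["StartTime"])
        for a in activities
        if a.get("StatusCode") == "Successful"
    )
    return [(later - earlier).total_seconds()
            for earlier, later in zip(starts, starts[1:])]


# Hypothetical records; real output carries more fields per activity.
sample_activities = [
    {"StatusCode": "Successful", "StartTime": "2024-06-10T14:08:05+00:00"},
    {"StatusCode": "Failed", "StartTime": "2024-06-10T14:05:00+00:00"},
    {"StatusCode": "Successful", "StartTime": "2024-06-10T14:03:00+00:00"},
]

for gap in scale_gaps_seconds(sample_activities):
    print(f"{gap:.0f}s between scale actions")
```

Gaps that sit just above your configured cooldown during a sustained spike suggest the cooldown, not the metric, is setting the pace.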

Check 3: Capacity Provider Managed Scaling Can't Keep Up

If your ECS cluster uses EC2 capacity providers with managed scaling enabled, there's a second scaling loop running in parallel: as ECS tries to place new tasks, it may find no EC2 instances with available capacity. The capacity provider then requests more instances from an Auto Scaling Group, which takes time to provision and warm up — often two to four minutes for an instance to be healthy and ready for task placement.

This means even if Application Auto Scaling correctly increases the ECS desired count, the tasks sit in PENDING state waiting for EC2 capacity. From the outside it looks like ECS auto scaling stalled, but the ECS side is working fine — EC2 provisioning is the bottleneck.

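List the tasks the service is trying to run. ECS never sets a task's desired status to PENDING, so filtering on --desired-status PENDING returns nothing; filter on RUNNING and inspect each task's lastStatus instead:

aws ecs list-tasks \
  --cluster my-cluster \
  --service-name my-service \
  --desired-status RUNNING

Pass the returned ARNs to describe-tasks. Tasks whose lastStatus stays at PROVISIONING or PENDING for more than a minute or two are waiting on capacity. The service's event log (aws ecs describe-services) also records placement failures.

aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks <task-arn> \
  --query 'tasks[].{task:taskArn,lastStatus:lastStatus,createdAt:createdAt}'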

The fix here is one of three approaches: switch to Fargate (which removes the EC2 provisioning lag entirely), keep a buffer of warm EC2 capacity by setting the capacity provider's targetCapacity below 100%, or use a mix of On-Demand and Spot with a higher base of reserved capacity. If you're on Fargate and still seeing tasks stuck in PENDING, a likely cause is the Fargate vCPU quota in your region; check Service Quotas in the console.
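To get a feel for how much spare capacity a given targetCapacity buys, the sketch below uses a simplified model of managed scaling: the capacity provider tries to hold (instances needed / instances running) × 100 at the target value. The function name and the instance counts are assumptions for illustration, and the real behavior also depends on instance sizes and step limits.

```python
# Simplified model of capacity provider managed scaling.
# Assumes the provider holds (needed / running) * 100 at targetCapacity.


def instances_to_keep(instances_needed: int, target_capacity: int) -> int:
    """Return how many instances the Auto Scaling Group keeps running."""
    if not 1 <= target_capacity <= 100:
        raise ValueError("target_capacity must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (instances_needed * 100 + target_capacity - 1) // target_capacity


for target in (100, 90, 80):
    kept = instances_to_keep(8, target)
    print(f"targetCapacity={target}: {kept} running, {kept - 8} spare")
```

With 8 instances' worth of tasks, a target of 80 keeps 10 instances running, so the first wave of new tasks can be placed without waiting for EC2.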

Check 4: The Scaling Policy Threshold Is Too Conservative

A target tracking policy set to scale at 70% CPU sounds reasonable, but if your traffic bursts cause CPU to jump from 30% to 95% in under a minute, the 70% threshold is breached only briefly. Combined with the three-data-point evaluation window, the policy may not trigger a full scale-out before CPU drops again as requests time out or clients back off.

A more useful threshold for burst-heavy services is 50–60% CPU. This gives the policy headroom to start scaling before you're already saturated. Yes, you'll run slightly more tasks at steady state, but the cost difference is usually small compared to the cost of a degraded service during a burst.
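The trade-off between headroom and steady-state cost can be estimated with simple arithmetic. This sketch assumes CPU scales linearly with traffic and spreads evenly across tasks, which real workloads only approximate; the function names and the 10-task example are illustrative.

```python
# Headroom versus steady-state cost for a CPU target, under a linear model.
import math


def headroom_factor(target_percent: float) -> float:
    """Traffic multiple the fleet can absorb before reaching 100% CPU."""
    return 100.0 / target_percent


def steady_state_tasks(current_tasks: int, current_cpu_percent: float,
                       new_target_percent: float) -> int:
    """Tasks needed to serve the same load at a different CPU target."""
    return math.ceil(current_tasks * current_cpu_percent / new_target_percent)


for target in (70, 55):
    print(f"target {target}%: absorbs {headroom_factor(target):.2f}x traffic, "
          f"{steady_state_tasks(10, 70, target)} tasks at steady state")
```

Under these assumptions, moving from a 70% to a 55% target costs about three extra tasks on a ten-task service and raises the absorbable burst from roughly 1.4x to 1.8x.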

Update the policy via the console or CLI:

aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 55.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 90,
    "ScaleInCooldown": 300
  }'

Check 5: IAM Permissions Are Silently Blocking the Action

Application Auto Scaling needs an IAM service-linked role (AWSServiceRoleForApplicationAutoScaling_ECSService) to update your ECS service's desired count. In most accounts this role is created automatically the first time a scalable target is registered, but where SCPs or IAM policies deny iam:CreateServiceLinkedRole, it may never have been created.

Check whether the scaling activities are returning permission errors:

aws application-autoscaling describe-scaling-activities \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --query 'ScalingActivities[?StatusCode==`Failed`]'

A StatusCode of Failed with a message mentioning access denied or the IAM role is a clear signal. You can also verify the service-linked role exists:

aws iam get-role \
  --role-name AWSServiceRoleForApplicationAutoScaling_ECSService

If it's missing, registering a scalable target creates it automatically, provided the caller is allowed iam:CreateServiceLinkedRole. You can also create it directly with aws iam create-service-linked-role --aws-service-name ecs.application-autoscaling.amazonaws.com.
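When the activity history is long, a short script can pull out the failures that look permission-related. This is a sketch only: the sample records and their messages are invented, and the list of hint strings is an assumption about how such errors are usually worded, not an exhaustive match for real StatusMessage text.

```python
# Filter scaling activities for failures that look permission-related.
from typing import Any, Dict, List

# Assumed substrings; adjust to the wording you see in your own account.
PERMISSION_HINTS = ("accessdenied", "access denied", "not authorized", "unauthorized")


def permission_failures(activities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return Failed activities whose StatusMessage hints at an IAM problem."""
    hits = []
    for activity in activities:
        if activity.get("StatusCode") != "Failed":
            continue
        message = activity.get("StatusMessage", "").lower()
        if any(hint in message for hint in PERMISSION_HINTS):
            hits.append(activity)
    return hits


# Hypothetical records shaped like describe-scaling-activities output.
sample_activities = [
    {"ActivityId": "a-1", "StatusCode": "Successful", "StatusMessage": "ok"},
    {"ActivityId": "a-2", "StatusCode": "Failed",
     "StatusMessage": "Access denied when updating the service"},
    {"ActivityId": "a-3", "StatusCode": "Failed",
     "StatusMessage": "Service was not found"},
]

for activity in permission_failures(sample_activities):
    print(activity["ActivityId"], "-", activity["StatusMessage"])
```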

Fixing Cooldowns and Thresholds in Practice

Once you've identified which check failed, the fix is usually a combination of cooldown reduction and threshold adjustment. Here's a practical baseline for a web-facing ECS service that sees spiky traffic:

Setting	Default	Recommended for Burst Traffic
CPU target threshold	75%	50–60%
Scale-out cooldown	300s	60–90s
Scale-in cooldown	300s	300–600s
Minimum task count	1	2+ (avoid cold-start bottleneck)
Maximum task count	varies	Set a real ceiling — don't leave at 1

Also check that your maximum task count is actually reachable. It's embarrassingly common to see a service configured with min=2, max=2, leaving no room to scale at all. The desired count cannot exceed the maximum, and Application Auto Scaling will silently stop at that ceiling without surfacing an error.
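A quick audit for this mistake is to scan the output of aws application-autoscaling describe-scalable-targets --service-namespace ecs for targets whose maximum leaves no room above the minimum. The sketch below assumes you saved that JSON output; the sample resource IDs and the function name are hypothetical.

```python
# Flag scalable targets that cannot scale out because max <= min.
from typing import Any, Dict, List


def targets_without_headroom(response: Dict[str, Any]) -> List[str]:
    """Return ResourceIds whose MaxCapacity does not exceed MinCapacity."""
    return [
        target["ResourceId"]
        for target in response["ScalableTargets"]
        if target["MaxCapacity"] <= target["MinCapacity"]
    ]


# Hypothetical sample shaped like describe-scalable-targets output.
sample = {
    "ScalableTargets": [
        {"ResourceId": "service/my-cluster/my-service",
         "MinCapacity": 2, "MaxCapacity": 2},
        {"ResourceId": "service/my-cluster/other-service",
         "MinCapacity": 2, "MaxCapacity": 20},
    ]
}

for resource_id in targets_without_headroom(sample):
    print(f"No room to scale: {resource_id}")
```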

If flapping ALB health checks are causing tasks to be replaced during a scaling event, that can make the situation significantly worse — terminating healthy tasks just as you need more of them. That failure mode is covered in detail in diagnosing flapping ALB health checks that kill healthy ECS tasks.

Proactive Patterns to Survive Future Bursts

Reactive scaling will always have latency. If your traffic pattern is predictable — a marketing email goes out at 10 AM, a nightly batch job starts at midnight — pair your target tracking policy with scheduled scaling actions that pre-scale the service before the burst arrives.

aws application-autoscaling put-scheduled-action \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --scheduled-action-name pre-scale-morning-traffic \
  --schedule "cron(45 9 * * ? *)" \
  --scalable-target-action MinCapacity=10,MaxCapacity=50

This raises the minimum to 10 tasks at 9:45 AM UTC (cron schedules are evaluated in UTC unless you pass --timezone), fifteen minutes before the expected spike. After the burst, a second scheduled action resets the minimum back to baseline. Scheduled actions change the scalable target's minimum and maximum capacity, and the target tracking policy keeps operating within the new bounds, so the two don't conflict.
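If you manage several scheduled actions, it helps to derive the cron expression from the event time and a lead time instead of hand-editing it. A minimal sketch, assuming a daily schedule; the function name and parameters are illustrative.

```python
# Build a daily cron expression that fires a fixed lead time before an event.
from datetime import datetime, timedelta


def pre_scale_cron(event_hour: int, event_minute: int, lead_minutes: int) -> str:
    """Return a cron(...) string for a daily action lead_minutes before the event."""
    # The date is arbitrary; only the hour and minute of the result are used.
    event = datetime(2000, 1, 2, event_hour, event_minute)
    start = event - timedelta(minutes=lead_minutes)
    return f"cron({start.minute} {start.hour} * * ? *)"


print(pre_scale_cron(10, 0, 15))
print(pre_scale_cron(0, 5, 15))
```

The first call reproduces the 9:45 schedule used above. The second shows that a lead time crossing midnight simply wraps to the previous evening, which is fine for a daily schedule.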

For unpredictable bursts, consider adding a step scaling policy as a companion to your target tracking policy. Step scaling can fire on a simple CloudWatch alarm and doesn't share the three-data-point evaluation requirement. You can configure it to add a large batch of tasks (say, 30% of current count) the moment CPU crosses 80%, acting as an emergency supplement when the target tracking policy is too slow.
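The sketch below shows what such a step policy configuration might look like and how many tasks a 30% step adds at different fleet sizes. The rounding rule reflects my reading of the Application Auto Scaling documentation for PercentChangeInCapacity (values above 1 round down, values between 0 and 1 round up to 1); the cooldown and MinAdjustmentMagnitude values are assumptions, and the 80% CPU threshold itself would live on the CloudWatch alarm, not in this configuration.

```python
# Emergency step policy sketch: add 30% more tasks when the alarm fires.
import json
import math


def percent_change_tasks(current_tasks: int, percent: int,
                         min_magnitude: int = 1) -> int:
    """Estimate tasks added by a positive PercentChangeInCapacity step."""
    raw = current_tasks * percent / 100
    if raw > 1:
        change = math.floor(raw)   # values above 1 round down
    elif raw > 0:
        change = 1                 # values between 0 and 1 round up to 1
    else:
        change = 0
    return max(change, min_magnitude)


# Assumed values; pass the JSON to --step-scaling-policy-configuration.
step_config = {
    "AdjustmentType": "PercentChangeInCapacity",
    "MetricAggregationType": "Average",
    "Cooldown": 60,
    "MinAdjustmentMagnitude": 2,
    "StepAdjustments": [
        {"MetricIntervalLowerBound": 0, "ScalingAdjustment": 30},
    ],
}

print(json.dumps(step_config, indent=2))
for tasks in (4, 10, 15):
    added = percent_change_tasks(tasks, 30, step_config["MinAdjustmentMagnitude"])
    print(f"{tasks} tasks -> +{added}")
```

On small fleets a percentage step adds very little, which is why a MinAdjustmentMagnitude floor is worth setting.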

If your tasks themselves have slow startup times or depend on secrets fetched at boot, that startup latency directly reduces how much headroom pre-scaling buys you. A related problem worth auditing is covered in how Secrets Manager timeouts can block container startup — fixing that can take several seconds off your task startup time.

Common Pitfalls

Scaling on memory alone for CPU-bound workloads. Memory utilization often stays flat while CPU saturates. If your workload is CPU-bound, use CPU as the primary scaling metric, not memory.
Not accounting for task startup time in your threshold. If a task takes 60 seconds to be healthy, you need to start scaling while you still have capacity headroom — not after you've already saturated.
Forgetting the ALB deregistration delay. Scaling in removes tasks, but the ALB holds connections open during the deregistration delay (default 300 seconds). Set this to something reasonable for your service (30–60 seconds) to avoid slow scale-in.
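Using a single AZ for task placement. If your tasks are all in one availability zone and that AZ has a brief capacity constraint, scaling will stall. Spread tasks across multiple AZs: on EC2, use a spread placement strategy on attribute:ecs.availability-zone (binpack packs tasks onto as few instances as possible and does not spread them); on Fargate, give the service subnets in more than one AZ.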
Not testing the scaling path. Run a load test that deliberately triggers your scaling policy before you need it in production. Tools like k6 or hey can simulate a burst in a staging environment so you see exactly what happens end to end.

Wrapping Up: Next Steps

Stalled ECS auto scaling during traffic bursts is almost never one problem — it's a combination of metric lag, cooldown settings, capacity constraints, and threshold choices that compound each other. Work through the checks in order: confirm the metric data is there, check cooldown suppression in the scaling activity log, look for PENDING tasks waiting on EC2 capacity, and verify your thresholds give the policy enough runway to act before you're saturated.

Concrete next steps to take this week:

Run the describe-scaling-activities command against your production ECS service and look for any Failed or cooldown-suppressed actions from the last seven days.
Reduce scale-out cooldown to 90 seconds and lower your CPU target threshold to 55–60% if your service can handle a few extra tasks at steady state.
Add a scheduled scaling action for any predictable traffic pattern you know about — even a rough pre-scale 15 minutes early helps significantly.
Set your ALB deregistration delay to 30–60 seconds so that draining tasks don't linger and scale-in completes promptly.
Run a load test against a staging environment that validates the full scaling path: from metric spike to healthy new tasks registered with the load balancer.
