Fixing AWS ElastiCache Redis Evictions That Silently Degrade App Performance

June 09, 2026

Your app seems fine on the surface: response times are a bit elevated and database load is creeping up, but nothing is throwing an error. No alarms are firing. Then you check your ElastiCache metrics and notice the Evictions metric ticking upward. That number is the silent culprit. Redis is quietly discarding data to make room for new writes, and every evicted key is a cache miss your database has to absorb.

This guide walks you through finding the problem, understanding why it's happening, and making targeted changes that actually fix it.

What You'll Learn

  • How to identify evictions in ElastiCache CloudWatch metrics
  • What Redis eviction policies mean and which one you're probably using wrong
  • How to inspect memory usage at the key level
  • Configuration changes that reduce or eliminate unwanted evictions
  • When to scale up versus when to fix the data model

Prerequisites

You need an AWS account with an active ElastiCache Redis cluster (version 6.x or later works fine for everything here). You should be comfortable with the AWS Console and have basic Redis CLI knowledge. For key-level inspection, you'll need either a bastion host or VPC access to your Redis endpoint.

Understanding What Eviction Actually Means

Redis stores everything in memory. When that memory fills up and a new write arrives, Redis has to do something, and what it does depends on the maxmemory-policy setting. If that policy allows eviction, Redis picks one or more existing keys to delete and makes room for the new data.

The key thing to understand is that eviction is not an error. Redis does not log a warning visible to your application. Your SET calls succeed, and a GET for an evicted key simply returns a miss. The only signal is a rising Evictions metric (CloudWatch's name for the evicted-key count) and a falling CacheHitRate. That's why this problem often goes unnoticed until database latency becomes obvious.

ElastiCache Redis clusters have a maxmemory limit set automatically based on the node type. You cannot configure it freely like you can on a self-managed Redis server. This makes the eviction behavior more predictable but also means you have less room to maneuver without changing node size or architecture.

Finding Evictions in CloudWatch

Start in the AWS Console. Navigate to ElastiCache > Redis clusters, select your cluster, and open the Metrics tab. The two metrics to watch together are Evictions and CacheHitRate.

Evictions is a count per period: it tells you how many keys were evicted. CacheHitRate is the share of lookups served from cache, ideally above 95% (0.95 as a ratio) for most workloads. If you see evictions climbing while hit rate falls, that's confirmation you have a problem, not just a coincidence.

Set up a CloudWatch alarm on Evictions with a threshold that makes sense for your workload. A reasonable starting point is alerting when the 5-minute sum exceeds zero for a sustained period. Some low-level eviction is normal under burst traffic, but a steady baseline of evictions means your cluster is chronically undersized or your data is growing in ways you haven't accounted for.

Also check DatabaseMemoryUsagePercentage. If this is consistently above 80–85%, you are operating close to the limit and evictions will happen whenever traffic spikes.
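
As a rough sketch of the alarm described above, the following Python builds the arguments for CloudWatch's PutMetricAlarm call. It assumes the eviction count is published as Evictions in the AWS/ElastiCache namespace with a CacheClusterId dimension. The cluster ID, the SNS topic ARN, and the three-consecutive-periods rule are placeholder choices for illustration, not values from this article.

```python
def build_eviction_alarm(cluster_id, sns_topic_arn, threshold=0, periods=3):
    """Return keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when the 5-minute sum of Evictions stays above
    `threshold` for `periods` consecutive periods, which separates a
    sustained baseline from a short burst.
    """
    return {
        "AlarmName": f"{cluster_id}-sustained-evictions",
        "Namespace": "AWS/ElastiCache",
        "MetricName": "Evictions",
        "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
        "Statistic": "Sum",
        "Period": 300,  # seconds, i.e. 5 minutes
        "EvaluationPeriods": periods,
        "DatapointsToAlarm": periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical identifiers; replace with your own.
params = build_eviction_alarm(
    "my-redis-001",
    "arn:aws:sns:us-east-1:123456789012:cache-alerts",
)

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary makes the alarm definition easy to review or store in version control before anything is sent to AWS.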

Checking Your Eviction Policy

Connect to your Redis instance from a host inside your VPC:

redis-cli -h your-cluster-endpoint -p 6379

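Then run:

INFO memory

Look for the maxmemory_policy field in the output. ElastiCache restricts the CONFIG command, so CONFIG GET maxmemory-policy, which works on self-managed Redis, is not available here. The policy will be one of these values: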

Policy            Behavior
noeviction        Rejects writes when memory is full. Returns an error.
allkeys-lru       Evicts the least recently used key from all keys.
volatile-lru      Evicts the least recently used key that has a TTL set.
allkeys-lfu       Evicts the least frequently used key from all keys.
volatile-lfu      Evicts the least frequently used key that has a TTL set.
volatile-ttl      Evicts the key with the shortest remaining TTL.
allkeys-random    Evicts a random key from all keys.
volatile-random   Evicts a random key that has a TTL set.

On self-managed Redis the default is noeviction, which means Redis returns an error to your application when memory is full. ElastiCache's default parameter groups use volatile-lru instead, so a cluster that was never tuned evicts keys with TTLs silently rather than raising errors. If you're not seeing application errors but you are seeing evictions, your cluster is running volatile-lru or another evicting policy. You change this via an ElastiCache Parameter Group, not in the CLI, because ElastiCache does not allow CONFIG SET on managed nodes.
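
The active policy and the current memory headroom both appear in the output of INFO memory, so one way to check them together is to parse that text. The sketch below assumes the standard field names used_memory, maxmemory, and maxmemory_policy. The sample values are invented for illustration.

```python
def parse_info(text):
    """Parse the 'field:value' lines returned by Redis INFO into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


def memory_summary(info_text):
    """Return the eviction policy and the fraction of maxmemory in use."""
    info = parse_info(info_text)
    used = int(info["used_memory"])
    limit = int(info["maxmemory"])
    return {
        "policy": info["maxmemory_policy"],
        # maxmemory is 0 when no limit is configured
        "used_fraction": used / limit if limit else None,
    }


# Invented sample: 2.5 GiB used out of a 3 GiB limit.
sample = """# Memory
used_memory:2684354560
maxmemory:3221225472
maxmemory_policy:volatile-lru
"""

summary = memory_summary(sample)
print(summary["policy"], round(summary["used_fraction"], 2))
```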

Identifying Which Keys Are Being Evicted

Knowing that evictions are happening is step one. Knowing what is being evicted tells you whether the evictions are harmless or destructive.

Redis provides a keyspace notification feature that you can use to observe evictions in real time. First, enable it via your Parameter Group by setting notify-keyspace-events to Ee (E = keyevent notifications, e = evicted events). Then subscribe to the eviction channel:

redis-cli -h your-cluster-endpoint -p 6379 SUBSCRIBE __keyevent@0__:evicted

Leave this running during a traffic period. You'll see the key names as they are evicted. Look for patterns: are these session keys? Computed results? Reference data that's expensive to regenerate? That tells you how much each eviction is costing you.
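
To spot patterns in the evicted key names, it can help to group them by prefix rather than read them one by one. This is a minimal sketch that assumes colon-separated key names. The sample keys are hypothetical, and in practice the list would come from the messages received on the evicted channel.

```python
from collections import Counter


def key_prefix(key, depth=2, sep=":"):
    """Return the first `depth` separator-delimited segments of a key name."""
    return sep.join(key.split(sep)[:depth])


def tally_evictions(evicted_keys, depth=2):
    """Count evicted keys by prefix so dominant key families stand out."""
    return Counter(key_prefix(k, depth) for k in evicted_keys)


# Hypothetical key names as they might arrive from the subscription.
observed = [
    "session:web:8f2c",
    "session:web:11ab",
    "report:daily:2024-01-01",
    "session:api:77de",
]

counts = tally_evictions(observed)
for prefix, n in counts.most_common():
    print(prefix, n)
```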

For a broader memory snapshot, run the MEMORY USAGE command on specific keys you suspect are large:

MEMORY USAGE your:key:name

This returns the number of bytes that key consumes, including overhead. If you find keys in the hundreds of kilobytes or larger, those are candidates for restructuring.

To get a rough picture of key distribution without scanning the full keyspace (which you should never do with KEYS * on a production cluster), use SCAN with a MATCH pattern and COUNT hint:

SCAN 0 MATCH session:* COUNT 100

Repeat with the cursor value returned until you've iterated through the keyspace. This is non-blocking and safe for production use.
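
The SCAN loop and MEMORY USAGE can be combined to find the largest keys under a pattern. The sketch below assumes a redis-py style client, where scan_iter() drives the cursor loop and memory_usage() wraps MEMORY USAGE. FakeClient is a stand-in defined here only so the example runs without a server, and the sample limit of 1000 keys is an arbitrary cap.

```python
def largest_keys(client, pattern, sample_limit=1000, top=10):
    """Walk keys matching `pattern` with SCAN and return the biggest ones.

    Returns a list of (size_in_bytes, key) tuples, largest first.
    """
    sizes = []
    for i, key in enumerate(client.scan_iter(match=pattern, count=100)):
        if i >= sample_limit:  # cap the work done on a busy cluster
            break
        size = client.memory_usage(key)
        if size is not None:  # key may have expired or been evicted mid-scan
            sizes.append((size, key))
    sizes.sort(reverse=True)
    return sizes[:top]


class FakeClient:
    """Stand-in for a redis-py client, used only to exercise the function."""

    def __init__(self, data):
        self.data = data  # maps key name -> size in bytes

    def scan_iter(self, match=None, count=None):
        prefix = match.rstrip("*")
        return (k for k in self.data if k.startswith(prefix))

    def memory_usage(self, key):
        return self.data.get(key)


demo = FakeClient({"session:a": 120, "session:b": 90000, "user:1": 300})
result = largest_keys(demo, "session:*")
print(result)
```

With a real cluster, the same function would be called with a connected redis-py client in place of FakeClient.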

Common Causes and How to Fix Them

Keys Without TTLs

This is the most common root cause. If your application is writing keys without an expiry, they accumulate indefinitely. Redis has no way to age them out unless the eviction policy targets all keys (not just volatile ones).

Audit your application code for every SET call that doesn't include an EX or PX argument. Add a sensible TTL based on how stale the data can be before it becomes wrong. For session data, 30 minutes to a few hours is typical. For computed results that change with underlying data, align the TTL to your data update frequency.

# Bad: no expiry
SET user:profile:1234 "{...}"

# Good: expires after 30 minutes
SET user:profile:1234 "{...}" EX 1800
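
One way to keep TTL-less writes out of application code is to route every cache write through a small helper that refuses to proceed without an expiry. The sketch below assumes a redis-py style client whose set() method accepts an ex argument in seconds. The helper name and RecordingClient are hypothetical and exist only for illustration.

```python
def set_with_ttl(client, key, value, ttl_seconds):
    """Write a key only if a positive integer TTL is supplied.

    Equivalent to `SET key value EX ttl_seconds`.
    """
    if not isinstance(ttl_seconds, int) or ttl_seconds <= 0:
        raise ValueError(f"refusing to cache {key!r} without a positive TTL")
    return client.set(key, value, ex=ttl_seconds)


class RecordingClient:
    """Stand-in that records writes; swap in a real redis-py client."""

    def __init__(self):
        self.writes = []

    def set(self, name, value, ex=None):
        self.writes.append((name, value, ex))
        return True


cache = RecordingClient()
set_with_ttl(cache, "user:profile:1234", "{...}", 1800)  # expires after 30 minutes
print(cache.writes)
```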

Oversized Values

Storing large objects in Redis (serialized lists, full HTML fragments, binary blobs) consumes memory fast. At 500KB per key, 2,000 such keys fill 1GB. If you're on a cache.t3.medium with roughly 3GB of usable memory, that's a meaningful chunk.

Consider compressing values before storing them, splitting large objects into smaller keyed parts, or moving large blobs to S3 and caching only the metadata or a reference in Redis.
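
As an illustration of the compression option, the sketch below serializes a value to compact JSON and compresses it with zlib from the Python standard library. The payload is a made-up, highly repetitive structure, so the ratio it shows is not representative of every workload. Measure with your own data before committing to this approach.

```python
import json
import zlib


def pack(obj):
    """Serialize to compact JSON and compress with zlib."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=6)


def unpack(blob):
    """Reverse pack(): decompress and parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical repetitive payload, similar to a cached list of records.
payload = {
    "items": [
        {"id": i, "status": "active", "tags": ["a", "b"]} for i in range(1000)
    ]
}

raw_size = len(json.dumps(payload).encode("utf-8"))
packed = pack(payload)
print(f"raw={raw_size} bytes, packed={len(packed)} bytes")
```

The packed bytes are what would be stored in Redis. The trade-off is extra CPU time on every read and write, which is usually small next to a database round trip.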

Unbounded Key Growth

Watch for patterns like event:log:<timestamp> or request:trace:<uuid> where new keys are created for every event and nothing cleans them up. These patterns make memory grow linearly with traffic. Either set TTLs on these keys or reconsider whether Redis is the right store for this data; a time-series database or a log aggregator is a better fit for append-heavy workloads.

Wrong Eviction Policy for Your Access Pattern

If you have a mix of critical keys (session data, feature flags) and non-critical keys (rate-limit counters, analytics scratch space), using allkeys-lru means Redis might evict your critical session data to make room for analytics noise. Switch to volatile-lru and ensure only the non-critical keys have TTLs set. That way Redis evicts only the disposable keys. The trade-off is that once no keys with a TTL remain, volatile-lru behaves like noeviction and writes fail with an out-of-memory error, so the critical keys alone must fit comfortably in memory.

Adjusting ElastiCache Parameter Groups

You cannot change Redis configuration parameters directly on a managed ElastiCache cluster at runtime in a way that persists. All persistent configuration goes through a Parameter Group.

To change the eviction policy:

  1. In the AWS Console, go to ElastiCache > Parameter Groups and create a new parameter group based on the Redis family matching your cluster version.
  2. Edit the new parameter group and set maxmemory-policy to your chosen policy.
  3. Go to your cluster, click Modify, and assign the new parameter group.
  4. Apply the change immediately or during the next maintenance window depending on your tolerance for a brief connection interruption.

Changes to notify-keyspace-events follow the same process. Set it to Ee in the parameter group if you want persistent eviction notifications.
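
The console steps above can also be scripted. The following sketch assumes a boto3 ElastiCache client and a cluster that belongs to a replication group. A standalone cache cluster would use modify_cache_cluster with CacheClusterId instead. The group names and the redis7 family are placeholders, and DryRun is a stand-in that records calls rather than contacting AWS.

```python
def apply_eviction_policy(elasticache, replication_group_id, group_name,
                          family, policy, apply_immediately=False):
    """Create a parameter group, set maxmemory-policy, and attach it.

    `elasticache` is assumed to behave like boto3.client("elasticache").
    """
    elasticache.create_cache_parameter_group(
        CacheParameterGroupName=group_name,
        CacheParameterGroupFamily=family,
        Description=f"maxmemory-policy={policy}",
    )
    elasticache.modify_cache_parameter_group(
        CacheParameterGroupName=group_name,
        ParameterNameValues=[
            {"ParameterName": "maxmemory-policy", "ParameterValue": policy},
        ],
    )
    elasticache.modify_replication_group(
        ReplicationGroupId=replication_group_id,
        CacheParameterGroupName=group_name,
        ApplyImmediately=apply_immediately,
    )


class DryRun:
    """Records calls instead of contacting AWS, for reviewing a change."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record


# Hypothetical names; pass boto3.client("elasticache") to apply for real.
plan = DryRun()
apply_eviction_policy(plan, "my-redis", "my-redis-params", "redis7", "allkeys-lru")
for name, kwargs in plan.calls:
    print(name, kwargs)
```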

Scaling Up vs. Fixing the Data Model

If you've set TTLs, trimmed large values, and chosen the right eviction policy, but memory usage is still consistently above 80%, it's time to think about scaling. On ElastiCache, you have two options: scale up (larger node type) or scale out (add shards with cluster mode enabled). Adding read replicas alone does not help here, because each replica holds a full copy of its shard's data.

Scaling up increases the memory available per shard. If your cluster is a single-shard setup, this is the simplest path. Moving from a cache.t3.medium to a cache.r7g.large roughly quadruples available memory and improves throughput significantly.

Scaling out with cluster mode enabled distributes your keyspace across multiple shards. This is the right choice if your workload is write-heavy or if a single node is also hitting CPU limits, not just memory. Be aware that cluster mode requires your client library to support Redis Cluster topology, so verify this before enabling it.

Before you throw money at the problem by scaling up, always check whether the data model changes described above would reduce memory consumption enough. A 40% reduction in stored data size is often achievable just by adding TTLs and trimming values, and it costs nothing.
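
To make that comparison concrete, here is a small calculation of memory utilization before and after a data reduction. The 2.8 GiB of stored data and the 3 GiB capacity are illustrative figures, not measurements.

```python
def utilization(used_gib, capacity_gib, reduction=0.0):
    """Fraction of node memory in use after shrinking stored data by `reduction`."""
    return used_gib * (1.0 - reduction) / capacity_gib


# Illustrative numbers: 2.8 GiB stored on a node with about 3 GiB usable.
before = utilization(2.8, 3.0)
after = utilization(2.8, 3.0, reduction=0.40)
print(f"{before:.0%} -> {after:.0%}")  # prints "93% -> 56%"
```

In this example a 40% reduction moves the node from well above the 80% warning level to comfortably below it, without changing the node type.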

Common Pitfalls to Avoid

  • Running KEYS * on production. This blocks Redis while it scans the full keyspace. Use SCAN instead, always.
  • Assuming no errors means no evictions. Evictions are silent. Set up the CloudWatch alarm and don't rely on application-level error rates to catch this.
  • Setting TTLs too short. Overly aggressive TTLs cause cache churn: keys expire before they're used, and you're back to high miss rates. Tune TTLs based on actual read frequency, not just a conservative guess.
  • Mixing cache and persistent data in the same cluster. If you're storing data in Redis that you cannot afford to lose (queues, locks, durable state), and that cluster also caches ephemeral data, eviction of the wrong key becomes a correctness bug, not just a performance issue. Separate these concerns into distinct clusters.
  • Ignoring replication lag under memory pressure. When a primary node is under heavy eviction load, replication to replicas can lag. Read replicas may serve stale or missing data. Check the ReplicationLag metric alongside eviction metrics.

Next Steps

Here are the concrete actions to take after reading this:

  1. Set up a CloudWatch alarm on the Evictions metric today. Even a simple notification when the 5-minute sum exceeds a low threshold will surface problems before they compound.
  2. Audit your codebase for Redis writes without TTLs. Search for SET, HSET, and equivalent calls in your application and confirm each one has an expiry strategy.
  3. Enable keyspace notifications in a staging environment and observe what gets evicted under realistic load. This takes 20 minutes to set up and gives you a clear picture of eviction patterns.
  4. Review your eviction policy in the Parameter Group and verify it matches your data access patterns. If you have a mix of critical and non-critical keys, volatile-lru with TTLs on non-critical keys is a strong default.
  5. Check DatabaseMemoryUsagePercentage trends over the past 30 days. If it's trending upward with no sign of leveling off, start evaluating node size options before you hit the ceiling.

