Diagnosing Runaway AWS Costs from S3 Request Charges Nobody Warned You About

May 14, 2026

You open your AWS bill and the number is wrong. Storage is roughly what you expected, but there's a line for S3 request charges that's three times larger than last month. You didn't upload more files. You didn't change your application. And yet, something is making millions of requests against your buckets.

S3 request pricing is one of the most misunderstood parts of AWS billing. Every GET, PUT, LIST, and DELETE costs money β€” and some operations cost ten times more than others. A single misconfigured application, a runaway crawler, or an innocent-looking Lambda function can generate enough requests overnight to turn a predictable $20 bill into a $600 surprise.

What you'll learn

  • How S3 request pricing actually works and which operations are most expensive
  • How to use AWS Cost Explorer and CloudWatch to isolate which buckets and request types are to blame
  • How to enable and read S3 server access logs to identify specific callers
  • Common architectural mistakes that silently generate millions of unnecessary requests
  • Concrete fixes to apply once you've found the culprit

How S3 Request Pricing Actually Works

S3 doesn't just charge for storage. Every API call against a bucket is a billable event. AWS groups these into two main tiers: PUT, COPY, POST, and LIST requests on one end, and GET, SELECT, and all other requests on the other. The PUT/LIST tier costs roughly ten times more per thousand requests than the GET tier β€” but GET requests are often what pile up at scale.

A few things that surprise people:

  • LIST operations on large buckets with thousands of objects are expensive and slow. Listing a bucket with a million objects isn't one request β€” it's many paginated calls, each billed individually.
  • HEAD requests count as GET-tier requests. Tools that check whether an object exists before downloading it are doubling their request count.
  • Replication, lifecycle transitions, and inventory all generate their own internal requests that show up on your bill.
  • S3 Select has its own request pricing on top of data scan charges.

None of this is hidden β€” it's in the AWS pricing page β€” but it's easy to forget when you're building features and not thinking about call volume.
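To make the tier difference concrete, here's a back-of-envelope sketch with illustrative rates; exact prices vary by region and change over time, so treat the numbers as placeholders and check the pricing page:

# Illustrative rates (us-east-1 Standard at time of writing; verify before
# relying on them): ~$0.005 per 1,000 PUT/LIST, ~$0.0004 per 1,000 GET
awk 'BEGIN {
  puts = 1000000                  # example: 1M PUT/LIST requests per month
  gets = 50000000                 # example: 50M GET requests per month
  printf "PUT/LIST tier: $%.2f\n", puts / 1000 * 0.005
  printf "GET tier:      $%.2f\n", gets / 1000 * 0.0004
}'
# PUT/LIST tier: $5.00
# GET tier:      $20.00

Even at roughly a tenth of the per-request rate, the GET tier dominates here through sheer volume. That's the typical shape of a runaway-request bill.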

Start with AWS Cost Explorer

Cost Explorer is your first stop. It won't tell you which specific application is making the requests, but it will tell you which buckets and which request types are costing you money.

Open Cost Explorer and filter by Service: S3. Then group by Usage Type. You'll see line items like USE1-Requests-Tier1 (PUT/LIST) and USE1-Requests-Tier2 (GET); the prefix encodes the region (USE1 is us-east-1). If one of these has spiked, you know the category. If both have spiked, something is doing a lot of read-write cycling.

To get a per-bucket breakdown, group by a cost allocation tag, assuming you tag buckets and have activated those tags in the billing console, or enable resource-level data in Cost Explorer so you can group by resource, which maps to bucket name for S3. If you don't have tagging set up, this view won't be useful yet; that's reason enough to add bucket tags today, before the next incident.
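If you prefer the CLI, the same breakdown is available through the Cost Explorer API. A minimal sketch, assuming the spike happened in April 2026; adjust the dates to your billing window:

# Daily S3 cost broken out by usage type (requests vs. storage vs. transfer)
aws ce get-cost-and-usage \
  --time-period Start=2026-04-01,End=2026-05-01 \
  --granularity DAILY \
  --metrics UnblendedCost \
  --filter '{"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Simple Storage Service"]}}' \
  --group-by Type=DIMENSION,Key=USAGE_TYPE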

Enable S3 Server Access Logging

Cost Explorer tells you what. Server access logs tell you who. This is the most direct way to identify which IP addresses, IAM principals, or user agents are hammering your bucket.

To enable logging for a bucket, go to the bucket's Properties tab in the S3 console, scroll to Server access logging, and point it at a separate target bucket. You can also do this via CLI:

aws s3api put-bucket-logging \
  --bucket your-source-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "your-log-bucket",
      "TargetPrefix": "s3-access-logs/your-source-bucket/"
    }
  }'
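One wrinkle: the log delivery service needs permission to write into the target bucket. The console can set this up for you; via the CLI you may need to attach a policy yourself. A sketch, assuming the bucket names and prefix above (tighten the resource path to your layout):

# Allow the S3 log delivery service principal to write into the target bucket
aws s3api put-bucket-policy \
  --bucket your-log-bucket \
  --policy '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "S3ServerAccessLogsPolicy",
      "Effect": "Allow",
      "Principal": {"Service": "logging.s3.amazonaws.com"},
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-log-bucket/s3-access-logs/*"
    }]
  }'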

Logs are delivered on a best-effort basis and may lag by several minutes. They are not real-time. Once logs start arriving, each line contains the bucket name, request time, remote IP, requester ARN, operation type, object key, HTTP status, and bytes transferred.

A quick way to scan for the noisiest callers is to count by requester and operation using basic shell tools:

# Count requests by operation type
awk '{print $8}' s3-access-logs/your-source-bucket/* | sort | uniq -c | sort -rn | head -20

# Count requests by requester ARN
awk '{print $6}' s3-access-logs/your-source-bucket/* | sort | uniq -c | sort -rn | head -20

The column positions in S3 access logs are fixed: the bracketed timestamp contains a space and so occupies fields $3 and $4, which makes $6 the requester and $8 the operation. If you're pulling a large volume of logs, consider querying them with Athena instead.

Query Logs at Scale with Athena

If you have days or weeks of logs to analyze, parsing them in bash gets unwieldy. Athena can query S3 access logs directly without loading them into a database first.

Create an Athena table pointing at your log prefix:

CREATE EXTERNAL TABLE s3_access_logs (
  bucketowner STRING,
  bucket STRING,
  requestdatetime STRING,
  remoteip STRING,
  requester STRING,
  requestid STRING,
  operation STRING,
  key STRING,
  requesturi STRING,
  httpstatus STRING,
  errorcode STRING,
  bytessent BIGINT,
  objectsize BIGINT,
  totaltime STRING,
  turnaroundtime STRING,
  referrer STRING,
  useragent STRING,
  versionid STRING,
  hostid STRING,
  sigv STRING,
  ciphersuite STRING,
  authtype STRING,
  endpoint STRING,
  tlsversion STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ("[^"]*"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ("[^"]*"|-) ([^ ]*)(?: ([^ ]*))?(?: ([^ ]*))?(?: ([^ ]*))?(?: ([^ ]*))?(?: ([^ ]*))?(?: ([^ ]*))?.*$'
)
LOCATION 's3://your-log-bucket/s3-access-logs/your-source-bucket/';

Then run targeted queries:

-- Top operations by count in the last 7 days
SELECT operation, COUNT(*) AS request_count
FROM s3_access_logs
WHERE parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z') > date_add('day', -7, now())
GROUP BY operation
ORDER BY request_count DESC
LIMIT 20;

-- Top requesters by volume
SELECT requester, COUNT(*) AS request_count
FROM s3_access_logs
GROUP BY requester
ORDER BY request_count DESC
LIMIT 20;

Athena charges per byte scanned, so if you plan to run these queries regularly, have S3 deliver logs under date-based prefixes and partition the table on them, as sketched below.
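S3 can write log objects under date-based key prefixes, which makes date partitioning straightforward. A sketch of enabling it on the logging configuration from earlier; the TargetObjectKeyFormat field is optional and defaults to the older flat key layout:

# Deliver logs under .../yyyy/mm/dd/ prefixes so Athena can prune by date
aws s3api put-bucket-logging \
  --bucket your-source-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "your-log-bucket",
      "TargetPrefix": "s3-access-logs/your-source-bucket/",
      "TargetObjectKeyFormat": {
        "PartitionedPrefix": {"PartitionDateSource": "EventTime"}
      }
    }
  }'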

Use CloudWatch Metrics for Real-Time Visibility

Server access logs give you history. CloudWatch gives you a live view β€” but only if you've enabled request metrics on the bucket, which is not on by default and costs extra (a small amount per metric per month).

To enable request metrics, go to the bucket's Metrics tab, choose Request metrics, and create a filter. You can scope it to all objects or a specific prefix. Once enabled, you'll see metrics like AllRequests, GetRequests, PutRequests, HeadRequests, ListRequests, and 4xxErrors in CloudWatch.
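The same filter can be created from the CLI. A minimal sketch covering the whole bucket; add a prefix filter if you only care about part of the keyspace:

# Create a request metrics filter named EntireBucket with no prefix restriction
aws s3api put-bucket-metrics-configuration \
  --bucket your-source-bucket \
  --id EntireBucket \
  --metrics-configuration '{"Id": "EntireBucket"}'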

Set up a CloudWatch alarm on AllRequests with a threshold based on your normal traffic. When that alarm fires, you'll know within minutes that something unusual is happening β€” rather than finding out when the invoice arrives.
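Here's a hedged sketch of such an alarm. The threshold and the SNS topic ARN are placeholders, and the FilterId dimension must match the metrics filter id created above:

# Alarm when the bucket sees more than 100,000 requests in 5 minutes
aws cloudwatch put-metric-alarm \
  --alarm-name s3-request-spike-your-source-bucket \
  --namespace AWS/S3 \
  --metric-name AllRequests \
  --dimensions Name=BucketName,Value=your-source-bucket Name=FilterId,Value=EntireBucket \
  --statistic Sum --period 300 --evaluation-periods 1 \
  --threshold 100000 --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts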

Common Culprits Worth Checking First

Once you have logs, here are the patterns that appear most often in runaway request charges:

Application code in a tight loop

A Lambda function, a background job, or a misconfigured retry policy can call S3 thousands of times per second. Look for a single requester ARN dominating your logs. If the ARN belongs to a Lambda execution role, check the function's code for loops that call GetObject or ListObjects inside a retry block without exponential backoff.
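The fix is usually bounded retries with exponential backoff. A shell sketch of the shape, with placeholder object names; the real fix belongs in the application code:

# Retry a GET at most 5 times, backing off 2, 4, 8, 16, 32 seconds
for attempt in 1 2 3 4 5; do
  if aws s3api get-object --bucket your-source-bucket \
       --key path/to/object /tmp/object; then
    break
  fi
  sleep $(( 2 ** attempt ))
done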

Listing instead of indexing

If your application needs to find objects by some attribute, listing the bucket every time is the wrong approach. Every ListObjectsV2 call scans a prefix and returns up to 1,000 keys per page. On a bucket with millions of objects, this is both expensive and slow. Store object metadata in DynamoDB or another index, and look up by key directly.
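For illustration, a hypothetical lookup against a DynamoDB table named object-index that maps an attribute value to an object key. The table name, key name, and item shape are all assumptions; such an index would be populated at upload time, for example via an S3 event notification:

# One indexed read instead of paginated LIST calls over millions of keys
aws dynamodb get-item \
  --table-name object-index \
  --key '{"attribute_value": {"S": "invoice-2026-04"}}' \
  --projection-expression "s3_key"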

Missing caching in front of public assets

If your S3 bucket serves static assets directly (without CloudFront), every page load from every user turns into S3 GET requests. A moderate traffic spike can generate millions of GETs in hours. Put CloudFront in front of the bucket, configure appropriate cache TTLs, and your GET request count can drop by an order of magnitude.
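Cache headers matter too: objects uploaded without Cache-Control leave CloudFront falling back to its default TTL behavior. A sketch of setting the header at upload time; the max-age value here is an example, tune it per asset type:

# Upload assets with a one-day browser/CDN cache lifetime
aws s3 cp ./assets s3://your-source-bucket/assets \
  --recursive \
  --cache-control "public, max-age=86400"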

Log delivery loops

If you accidentally configure your access log target bucket to be the same bucket you're logging, or if a Lambda function is triggered by new objects in the log bucket and writes back to S3, you can create a feedback loop that keeps generating requests indefinitely. Check your bucket's event notifications and confirm log target buckets are different from source buckets.
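Both checks are one CLI call each:

# What fires when objects land in the log bucket?
aws s3api get-bucket-notification-configuration --bucket your-log-bucket

# Is the log bucket logging to itself?
aws s3api get-bucket-logging --bucket your-log-bucket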

Third-party integrations polling aggressively

Security scanners, backup tools, and data pipeline connectors sometimes poll S3 buckets far more frequently than you'd expect. Check the user agent column in your access logs. If you see a tool name you recognize, look at its configuration and reduce its polling interval or switch it to event-driven triggering via S3 event notifications instead.

Versioning and lifecycle policy gaps

If you have versioning enabled but no lifecycle policy to expire old versions, your bucket accumulates versions indefinitely. Some tools list all versions of all objects (using ListObjectVersions), which is a Tier 1 (expensive) operation and can be very slow on large buckets. Add a lifecycle rule to expire noncurrent versions after a sensible number of days.
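A minimal sketch of such a rule; 30 days is an example retention window, pick one that fits your recovery requirements:

# Permanently delete object versions 30 days after they become noncurrent
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-source-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }]
  }'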

Common Pitfalls When Investigating

A few mistakes that slow down the diagnosis:

  • Assuming it's storage, not requests. The storage and request line items look similar on the bill. Read the usage type description carefully before concluding which one is the problem.
  • Enabling logging after the incident. Logs only capture requests going forward. If the spike already happened and you didn't have logging on, Cost Explorer and CloudWatch metrics are your only retrospective tools.
  • Forgetting cross-region request charges. Requests from one AWS region to an S3 bucket in another region incur data transfer costs on top of request costs. If your application is in us-east-1 and your bucket is in eu-west-1, you're paying for the requests and for cross-region data transfer on top of them.
  • Overlooking requester-pays buckets. If you're the bucket owner and have requester-pays disabled, all costs fall on you regardless of who makes the request. Public or semi-public buckets can be abused by external scrapers.

Wrapping Up

S3 request charges are fixable once you know what's causing them. Here's what to do right now:

  1. Open Cost Explorer, filter by S3, group by Usage Type, and identify whether the spike is in Tier 1 (PUT/LIST) or Tier 2 (GET) requests and which region it's in.
  2. Enable server access logging on any bucket that doesn't have it, pointed at a dedicated log bucket. Don't skip this β€” it's the only way to see individual request details.
  3. Enable CloudWatch request metrics on high-value buckets and set an alarm so you catch the next spike within minutes, not weeks.
  4. Check the usual suspects in your logs: tight loops in Lambda, missing CloudFront caching, polling integrations, and log delivery misconfigurations.
  5. Add S3 bucket tags and enable bucket-level cost allocation in your billing settings so Cost Explorer can show per-bucket breakdowns going forward.

The goal isn't just to fix this month's bill. It's to build enough observability that you see the next spike coming and can act before it compounds.

πŸ“€ Share this article

Sign in to save

Comments (0)

No comments yet. Be the first!

Leave a Comment

Sign in to comment with your profile.

πŸ“¬ Weekly Newsletter

Stay ahead of the curve

Get the best programming tutorials, data analytics tips, and tool reviews delivered to your inbox every week.

No spam. Unsubscribe anytime.