Fixing Python boto3 S3 Uploads That Silently Overwrite Existing Files

June 11, 2026

You run a nightly script that uploads processed reports to S3. Three weeks later a colleague asks why last month's numbers changed. You trace it back to a misconfigured key prefix and realize the script silently stomped on dozens of files it was never supposed to touch. No error, no warning, just gone.

This is the default behavior of put_object and upload_file in boto3. S3 treats every upload as authoritative. Your job is to add the checks AWS leaves out.

What you'll learn

  • How to check whether an object already exists before uploading
  • How to use S3 conditional writes (the If-None-Match header) to make overwrite protection atomic
  • How to enable bucket versioning so overwrites are recoverable rather than destructive
  • How to compare ETags to detect content changes before deciding to upload
  • Common mistakes that make each approach fail silently

Prerequisites

You need Python 3.8 or later, boto3 installed (pip install boto3), and AWS credentials configured via environment variables, an ~/.aws/credentials file, or an IAM role. The conditional write examples also need a boto3 release recent enough to expose the IfNoneMatch parameter on put_object (S3 added conditional writes in August 2024), and they require that your bucket is not using a multi-Region access point for these calls.

Why boto3 Overwrites Without Warning

S3 is an object store, not a filesystem. There is no concept of a lock or an open file handle. When you call put_object, S3 receives a key and a body, stores the body, and returns a 200. If an object with that key already exists, it is replaced. That's the contract.

boto3 mirrors that contract faithfully. There is no overwrite=False parameter because, at the HTTP level, there wasn't one, at least not until AWS added support for conditional writes using standard HTTP headers. Older tutorials that never mention overwrite protection are not wrong about the API; they just assume you want overwrite behavior.

Approach 1: Check Before You Upload

The simplest guard is a head_object call before each upload. If the object exists, head_object succeeds. If it doesn't, boto3 raises a ClientError with a 404 status.

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def object_exists(bucket: str, key: str) -> bool:
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return False
        raise  # re-raise anything that isn't a 404

def safe_upload(bucket: str, key: str, local_path: str) -> None:
    if object_exists(bucket, key):
        raise FileExistsError(f"s3://{bucket}/{key} already exists. Aborting upload.")
    s3.upload_file(local_path, bucket, key)
    print(f"Uploaded {local_path} -> s3://{bucket}/{key}")

This works and it's easy to read. The catch is that it's not atomic. Between your head_object call and the upload that follows it, another process could write the same key. For low-concurrency scripts this is usually fine. For anything running in parallel, keep reading.

Approach 2: Conditional Writes With If-None-Match

AWS added support for the standard HTTP If-None-Match: * header on PutObject. When you include it, S3 will only store the object if no object with that key currently exists. If one does exist, S3 returns a 412 Precondition Failed. This check happens server-side in a single operation, so it's race-condition safe.

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def conditional_upload(bucket: str, key: str, body: bytes) -> None:
    try:
        s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=body,
            IfNoneMatch="*",  # fail if object already exists
        )
        print(f"Written to s3://{bucket}/{key}")
    except ClientError as e:
        error_code = e.response["Error"]["Code"]
        if error_code == "PreconditionFailed":
            print(f"Skipped: s3://{bucket}/{key} already exists.")
        else:
            raise

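Note that IfNoneMatch is a put_object parameter. The higher-level upload_file and upload_fileobj helpers may reject it in ExtraArgs, depending on your boto3 version, so check before relying on them. For large files, either read the file into memory or drive a multipart upload yourself and pass IfNoneMatch="*" to complete_multipart_upload, which is where S3 evaluates the condition for multipart uploads.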

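Important: On a versioned bucket, S3 evaluates If-None-Match against the current version of the key, so a key whose current version is a delete marker counts as absent and the write succeeds. If your bucket has versioning or S3 Object Lock enabled, test the behavior in a non-production bucket first.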
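The manual multipart route mentioned above can be sketched roughly as follows. This is a minimal sketch, not a drop-in replacement for upload_file: the function name, the 8 MiB part size, and the choice to reject empty files are assumptions made for illustration, and the client is passed in as a parameter. The boto3 calls themselves (create_multipart_upload, upload_part, complete_multipart_upload, abort_multipart_upload) are standard client methods.

```python
import os
from typing import Any, Dict, List

# Assumed part size. S3 requires every part except the last to be at least 5 MiB.
PART_SIZE = 8 * 1024 * 1024


def conditional_multipart_upload(s3: Any, bucket: str, key: str, path: str,
                                 part_size: int = PART_SIZE) -> None:
    """Upload `path` in parts; the final commit fails if `key` already exists."""
    if os.path.getsize(path) == 0:
        # Sketch limitation: send empty files through put_object instead.
        raise ValueError("empty file: use put_object with IfNoneMatch instead")

    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts: List[Dict[str, Any]] = []
    try:
        with open(path, "rb") as f:
            chunks = iter(lambda: f.read(part_size), b"")
            for part_number, chunk in enumerate(chunks, start=1):
                resp = s3.upload_part(
                    Bucket=bucket, Key=key, UploadId=upload_id,
                    PartNumber=part_number, Body=chunk,
                )
                parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
        # The existence check happens here, when the parts are committed.
        s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
            IfNoneMatch="*",
        )
    except Exception:
        # Clean up so abandoned parts do not keep accruing storage charges.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

Because the condition is only evaluated on completion, all parts are transferred before you find out the key is taken. If that wasted bandwidth matters, a cheap head_object check up front can skip most doomed uploads, with the conditional commit still acting as the real guarantee.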

Approach 3: Compare ETags to Allow Intentional Updates

Sometimes you want to upload only if the file has actually changed, rather than refusing all overwrites. For objects uploaded in a single part and stored unencrypted or with SSE-S3, S3's ETag is the hex MD5 digest of the object content, so you can compare it to a local MD5 before deciding to upload.

import hashlib
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def local_md5(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def upload_if_changed(bucket: str, key: str, local_path: str) -> None:
    local_hash = local_md5(local_path)
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
        remote_etag = head["ETag"].strip('"')  # ETags are quoted strings
        if remote_etag == local_hash:
            print(f"No change detected for s3://{bucket}/{key}. Skipping.")
            return
    except ClientError as e:
        if e.response["Error"]["Code"] != "404":
            raise
        # Object doesn't exist yet, so fall through to upload

    s3.upload_file(local_path, bucket, key)
    print(f"Uploaded s3://{bucket}/{key}")

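Two caveats to keep in mind. First, for multipart uploads S3 computes the ETag differently (it includes a part count suffix like -5), so MD5 comparison will not match. This affects the function above directly: upload_file switches to multipart once a file exceeds the transfer threshold (8 MB by default), so larger files will never match and will be re-uploaded on every run. Second, server-side encryption with SSE-C or SSE-KMS also produces ETags that are not the MD5 of the content. If either of those applies to your bucket, use a custom metadata field to store your own hash instead of relying on ETag.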
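The custom metadata approach might look something like the sketch below. The metadata key name sha256 and the function names are hypothetical choices for this example, and the client is passed in as a parameter. Metadata in ExtraArgs for upload_file and the Metadata field in the head_object response are standard boto3 features.

```python
import hashlib
from typing import Any, Optional

HASH_KEY = "sha256"  # hypothetical metadata key; S3 stores it as x-amz-meta-sha256


def file_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def remote_sha256(s3: Any, bucket: str, key: str) -> Optional[str]:
    """Return the stored hash, or None if the object or the metadata is missing."""
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        # botocore's ClientError carries a .response dict. It is inspected
        # generically here to keep the sketch dependency-free; in real code,
        # catch botocore.exceptions.ClientError explicitly.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "404":
            return None
        raise
    return head.get("Metadata", {}).get(HASH_KEY)


def upload_if_hash_differs(s3: Any, bucket: str, key: str, path: str) -> bool:
    """Upload only when the stored hash differs. Returns True if an upload happened."""
    local = file_sha256(path)
    if remote_sha256(s3, bucket, key) == local:
        return False
    # The hash travels with the object, so it survives multipart and encryption.
    s3.upload_file(path, bucket, key, ExtraArgs={"Metadata": {HASH_KEY: local}})
    return True
```

Objects uploaded before you adopted this scheme have no stored hash, so the first run treats them as changed and re-uploads them once.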

Approach 4: Enable Bucket Versioning

Versioning doesn't prevent overwrites, but it makes them recoverable. Every write creates a new version of the object. The previous version is retained and accessible by version ID. This is the easiest safety net to set up and it works regardless of which upload code you're using.

import boto3

s3 = boto3.client("s3")

def enable_versioning(bucket: str) -> None:
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    print(f"Versioning enabled on {bucket}")

def list_versions(bucket: str, key: str) -> list:
    paginator = s3.get_paginator("list_object_versions")
    versions = []
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        # Prefix also matches longer keys (e.g. "a.csv.bak"), so keep exact matches only
        versions.extend(v for v in page.get("Versions", []) if v["Key"] == key)
    return versions

Versioning costs money because you're storing every historical copy. Pair it with a lifecycle rule to expire old versions after a set number of days, or to keep only the last N versions, so storage costs don't quietly compound over time.
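A lifecycle rule along those lines could be sketched like this. The rule ID, the 30-day window, and the five-version cap are placeholder values, and the function names are made up for this example. put_bucket_lifecycle_configuration and the NoncurrentVersionExpiration fields are part of the S3 API.

```python
from typing import Any, Dict


def noncurrent_expiry_rule(days: int, keep_newest: int, prefix: str = "") -> Dict[str, Any]:
    """Build a rule that expires old versions but retains the newest few."""
    return {
        "ID": f"expire-noncurrent-after-{days}d",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # an empty prefix applies to the whole bucket
        "NoncurrentVersionExpiration": {
            "NoncurrentDays": days,
            "NewerNoncurrentVersions": keep_newest,
        },
    }


def apply_version_expiry(s3: Any, bucket: str, days: int = 30, keep_newest: int = 5) -> None:
    # Caution: this call replaces the bucket's entire lifecycle configuration,
    # so merge with any existing rules before using it on a shared bucket.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [noncurrent_expiry_rule(days, keep_newest)]},
    )
```

The rule only touches noncurrent versions, so the live copy of each object is never expired by it.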

Approach 5: Use Bucket Policies to Block Overwrites at the IAM Level

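If you want to enforce the no-overwrite rule across all principals, not just the one script you're writing now, you can use a bucket policy. A policy cannot see whether a key already exists, but it can deny any s3:PutObject request that arrives without an If-None-Match header, which forces every writer to use conditional writes. The example below uses the s3:if-none-match condition key for that; verify the key and its behavior against the current S3 documentation before relying on it. This is a coarser tool, but it adds an organization-level guarantee.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnconditionalPut",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*",
      "Condition": {
        "Null": {
          "s3:if-none-match": "true"
        }
      }
    }
  ]
}

Bucket policies are broad. Before applying one to a production bucket, trace every write path that touches it. A policy that blocks overwrites will also block legitimate update workflows if you haven't accounted for them. A blanket Deny like this one can also reject the UploadPart requests of a multipart upload, which are authorized under s3:PutObject but do not carry the header, so test large-file uploads explicitly.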

Common Pitfalls

Catching the wrong error code

When you catch a ClientError, always check e.response["Error"]["Code"] rather than the message string. AWS error message wording can vary; the code is stable. A 404 from head_object usually comes back as the string "404", not the integer, so compare with == "404".

Assuming ETag is always MD5

It is, for single-part uploads that are unencrypted or encrypted with SSE-S3. Once you introduce multipart uploads, SSE-C, or SSE-KMS, the ETag is no longer the MD5 of the content. Build your comparison logic to handle a mismatch gracefully rather than assuming an ETag mismatch always means the file changed.
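One way to handle the multipart case is to reproduce the multipart-style ETag locally. The sketch below relies on commonly observed behavior rather than a documented guarantee: the multipart ETag is the MD5 of the concatenated per-part MD5 digests, followed by a dash and the part count. It only matches if you know the part size used for the upload, and the 8 MiB default here assumes boto3's default transfer settings. The function names are illustrative.

```python
import hashlib

DEFAULT_PART_SIZE = 8 * 1024 * 1024  # assumption: boto3's default multipart chunk size


def plain_md5(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def multipart_style_etag(path: str, part_size: int) -> str:
    """MD5 over the per-part MD5 digests, plus '-<part count>'."""
    part_digests = []
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(part_size), b""):
            part_digests.append(hashlib.md5(chunk).digest())
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def etag_matches_file(remote_etag: str, path: str,
                      part_size: int = DEFAULT_PART_SIZE) -> bool:
    """True only when the ETag can be confirmed equal to the local file."""
    etag = remote_etag.strip('"')
    if "-" in etag:
        return etag == multipart_style_etag(path, part_size)
    return etag == plain_md5(path)
```

Treat a False result as "could not confirm the file is unchanged", not as proof of a change. A different part size or SSE-KMS encryption will produce a mismatch even when the bytes are identical.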

Versioning doesn't protect deletes by default

Versioning creates a delete marker when you delete an object, but a hard delete (specifying the version ID) is permanent. Enable MFA delete on critical buckets if you need protection against accidental or malicious permanent deletions.
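Recovering from a simple delete on a versioned bucket amounts to removing the delete marker, which makes the previous version current again. A rough sketch follows, with a hypothetical function name and the client passed in as a parameter. It reads only the first page of list_object_versions results, which is assumed to be enough to find the latest marker for a single key.

```python
from typing import Any


def undelete(s3: Any, bucket: str, key: str) -> bool:
    """Remove the current delete marker for `key`. Returns True if one was removed."""
    resp = s3.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in resp.get("DeleteMarkers", []):
        # Prefix can match longer keys too, so require an exact key match.
        if marker["Key"] == key and marker["IsLatest"]:
            # Deleting a delete marker by version ID restores the prior version.
            s3.delete_object(Bucket=bucket, Key=key, VersionId=marker["VersionId"])
            return True
    return False
```

This only works while the older versions still exist. If a lifecycle rule or a version-specific delete has already removed them, there is nothing left to restore.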

The race window in check-then-upload

If two processes call object_exists before either has finished uploading, both get False and both proceed. The later write wins and the earlier one is silently replaced. Use conditional writes with IfNoneMatch if your workload has concurrent writers.
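To make the interleaving concrete, here is a small in-memory simulation. FakeBucket is a toy stand-in written for this example, not an S3 or boto3 API. Its put_if_absent method models the server-side check that IfNoneMatch="*" provides, where the existence test and the write happen as one step.

```python
from typing import Dict, Tuple


class FakeBucket:
    """Toy in-memory bucket used only to illustrate the ordering problem."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self.objects

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body  # unconditional: the last writer wins

    def put_if_absent(self, key: str, body: bytes) -> bool:
        # Check and write modelled as a single step, like a conditional write.
        if key in self.objects:
            return False
        self.objects[key] = body
        return True


def check_then_put_race() -> bytes:
    bucket = FakeBucket()
    # Both writers check before either one has written.
    a_saw = bucket.exists("report.csv")
    b_saw = bucket.exists("report.csv")
    if not a_saw:
        bucket.put("report.csv", b"from A")
    if not b_saw:
        bucket.put("report.csv", b"from B")  # silently replaces A's data
    return bucket.objects["report.csv"]


def conditional_put_race() -> Tuple[bool, bool, bytes]:
    bucket = FakeBucket()
    a_ok = bucket.put_if_absent("report.csv", b"from A")
    b_ok = bucket.put_if_absent("report.csv", b"from B")  # rejected
    return a_ok, b_ok, bucket.objects["report.csv"]
```

In the first scenario writer A's data is lost without either writer seeing an error. In the second, writer B gets an explicit rejection it can log or retry under a different key.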

Wrapping Up

Pick the approach that matches your actual risk. Most one-off ETL scripts are fine with a head_object check. Concurrent upload pipelines need conditional writes. Anything storing critical data should also have versioning enabled as a backstop.

Concrete next steps:

  1. Audit your existing upload scripts and identify any bare put_object or upload_file calls without existence checks.
  2. Enable versioning on buckets that hold data you can't afford to lose, and add a lifecycle rule to cap stored versions.
  3. Replace head-then-put patterns in concurrent code with IfNoneMatch="*" conditional writes.
  4. If you're using SSE-KMS, SSE-C, or multipart uploads, test your ETag comparison logic against a real object before trusting it in production.
  5. Review bucket policies on shared buckets to make sure no other team or service is writing to the same key namespace you're using.
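For the audit in step 1, a rough starting point is to scan your scripts for upload calls with Python's ast module. The sketch below is a heuristic under stated assumptions: it flags any put_object, upload_file, or upload_fileobj call that lacks an IfNoneMatch keyword argument, and it cannot tell whether a head_object check guards the call elsewhere. The function name and the method list are choices made for this example.

```python
import ast
from typing import List, Tuple

UPLOAD_METHODS = {"put_object", "upload_file", "upload_fileobj"}


def find_unguarded_uploads(source: str) -> List[Tuple[int, str]]:
    """Return (line number, method name) for upload calls without IfNoneMatch."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only consider method-style calls such as s3.put_object(...).
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Attribute):
            continue
        name = node.func.attr
        if name not in UPLOAD_METHODS:
            continue
        if not any(kw.arg == "IfNoneMatch" for kw in node.keywords):
            findings.append((node.lineno, name))
    return sorted(findings)
```

Run it over each script's text, for example with pathlib.Path(script).read_text(), and review the flagged lines by hand. Expect false positives wherever an existence check already sits in front of the call.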
