Fixing Terraform State Lock Stuck in DynamoDB After a Failed Apply

June 30, 2026 · 9 min read

You run terraform apply, the pipeline dies halfway through, and now every Terraform command throws Error acquiring the state lock. The DynamoDB table still holds the lock item from the dead process, and nothing will move until you deal with it.

This is one of the most common operational headaches with Terraform on AWS backends. It looks scary, but it is fixable in under five minutes if you know exactly what to do.

What a Terraform State Lock Actually Does

When Terraform uses an S3 backend with DynamoDB locking, it writes a single item to your lock table at the start of any operation that could write state, which includes plan as well as apply unless locking is disabled. That item contains a Lock ID, a timestamp, the operation type, and the identity of the process that acquired it.

Every other Terraform process checks for that item before doing anything. If the item exists, the command refuses to proceed. This prevents two simultaneous applies from producing a corrupted state file, a situation far worse than a stuck lock.

When a process exits cleanly, it deletes the item. When it crashes, times out, or gets killed mid-run, the item stays. That leftover item is what you are fighting.
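The acquire, release, and leftover behaviour can be sketched locally without touching AWS. This is only an analogy, not Terraform's implementation: it uses a file and the shell's noclobber option in place of a DynamoDB item, and the function names are hypothetical.

```shell
# Local analogy of a lock item: create-if-absent, delete on clean exit.
# $1: path of the lock file (stands in for the DynamoDB item)
# $2: identity of the holder (stands in for the Info attribute)
acquire_lock() {
  # set -C (noclobber) makes ">" fail if the file already exists,
  # so only one caller can create the lock.
  ( set -C; printf '%s\n' "$2" > "$1" ) 2>/dev/null
}

# A clean exit removes the lock. A killed process never reaches this call,
# which is how a stale lock is left behind.
release_lock() {
  rm -f "$1"
}

# Usage:
#   acquire_lock /tmp/demo.lock "pipeline-run-1" && echo "lock acquired"
#   acquire_lock /tmp/demo.lock "pipeline-run-2" || echo "lock is held"
#   release_lock /tmp/demo.lock
```

If the first caller is killed before release_lock runs, every later acquire_lock keeps failing until someone removes the file by hand, which mirrors the stuck DynamoDB item.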

Why the Lock Gets Stuck

There are a handful of common causes:

  • A CI/CD pipeline runner was terminated mid-job (spot instance reclamation, job timeout, manual cancel).
  • The network connection between the Terraform process and AWS dropped during the apply.
  • An unhandled error or signal killed the process before it could clean up.
  • Two conflicting applies ran simultaneously and one grabbed the lock then died.

In all these cases, the state file may or may not reflect everything the interrupted apply changed. That distinction matters when you decide how to recover.

What You'll Learn

  • How to confirm the lock is genuinely stale and not held by an active process.
  • How to find the Lock ID from DynamoDB directly.
  • How to use terraform force-unlock correctly.
  • How to delete the DynamoDB item manually when force-unlock is not enough.
  • How to verify state integrity before running another apply.

Prerequisites

You will need:

  • AWS CLI configured with sufficient permissions (dynamodb:Scan, dynamodb:GetItem, dynamodb:DeleteItem, s3:GetObject).
  • Terraform CLI installed (any recent version).
  • The name of your DynamoDB lock table and your S3 state bucket.
  • Confidence that no legitimate Terraform process is currently running against this state.

That last point is the most important. Never force-unlock while another apply is genuinely in progress. You would remove the lock from underneath a live operation and risk state corruption.

Step 1: Confirm the Lock Is Truly Stuck

Before touching anything, verify that the process holding the lock is actually dead. Check your CI/CD system, your local terminal sessions, and any scheduled pipelines. If you canceled a GitHub Actions job, confirm it reached the Cancelled state, not just Cancelling.

Then run a command that needs the lock, such as plan, to see the full error message:

terraform plan

Terraform will print an error that includes the Lock ID, the timestamp the lock was acquired, and the operation that created it. Grab that output; you will need the Lock ID in the next step.

If the timestamp is from hours ago and matches a pipeline run you know is dead, you are safe to proceed. If the timestamp is from the last few minutes, wait and confirm before doing anything.
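To put a number on "hours ago", the lock's Created timestamp can be turned into an age in minutes. This is a minimal sketch that assumes the timestamp is in UTC in the form 2024-01-01T12:00:00.123456789Z and that GNU date is available (the -d option behaves differently on macOS). The function name is hypothetical.

```shell
# Print the age of a lock in whole minutes.
# $1: the Created timestamp from the lock error (assumed UTC)
# $2: optional "now" as epoch seconds (defaults to the current time)
lock_age_minutes() {
  created=${1%%.*}       # drop fractional seconds, if present
  created=${created%Z}   # drop a trailing Z, if present
  created=$(printf '%s' "$created" | tr 'T' ' ')
  created_epoch=$(date -u -d "$created" +%s) || return 1
  now_epoch=${2:-$(date -u +%s)}
  echo $(( (now_epoch - created_epoch) / 60 ))
}

# Usage:
#   lock_age_minutes "2024-01-01T12:00:00.123456789Z"
```

An age of a few minutes is a reason to wait; an age of several hours that lines up with a dead pipeline run is a reasonable signal to continue.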

Step 2: Retrieve the Lock ID

Terraform prints the Lock ID in its error message, so most of the time you already have it. If the output scrolled away or you are working from a log, you can pull it directly from DynamoDB.

Your lock table uses a partition key named LockID. The value stored there is your state path (typically the bucket name followed by the S3 key of your terraform.tfstate file). Scan for the item like this:

aws dynamodb scan \
  --table-name your-lock-table-name \
  --region us-east-1

The response will include an Info attribute containing a JSON blob. Inside that blob is an ID field, and that is your Lock ID. Pull it out:

# Fetch the lock item by its LockID and print the ID field from its Info JSON
aws dynamodb get-item \
  --table-name your-lock-table-name \
  --region us-east-1 \
  --key '{"LockID": {"S": "your-bucket/path/to/terraform.tfstate"}}' \
  --query "Item.Info.S" \
  --output text | python3 -c "import sys, json; print(json.load(sys.stdin)['ID'])"

If that pipeline is awkward, just read the raw Info JSON string and find the ID field manually. It is a UUID in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
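If Python is not available on the machine, a sed one-liner can read fields out of the Info string instead. This is a rough sketch, assuming the Info JSON sits on a single line and the values contain no escaped quotes; info_field is a hypothetical helper name and the sample values in the usage comment are placeholders.

```shell
# Print the value of a top-level string field from the lock's Info JSON.
# $1: field name (for example ID, Who, or Created); the JSON is read from stdin.
info_field() {
  sed -n 's/.*"'"$1"'":[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (placeholder Info string):
#   info='{"ID":"a1b2c3d4-e5f6-7890-abcd-ef1234567890","Who":"ci@runner-1"}'
#   printf '%s\n' "$info" | info_field ID
#   printf '%s\n' "$info" | info_field Who
```

Reading Who and Created alongside ID is a cheap way to double-check that the lock belongs to the run you believe is dead.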

Step 3: Run terraform force-unlock

With the Lock ID in hand, this is a single command:

terraform force-unlock <LOCK_ID>

Terraform will ask for confirmation. Type yes. It will delete the DynamoDB item and report success.

If you are running this in a directory where your backend is configured, Terraform knows which table to target automatically. If you get a configuration error, make sure you have run terraform init in this directory and that your backend.tf (or equivalent) is pointing at the correct table and region.

# Full example with confirmation
terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Terraform will prompt:
# Do you really want to force-unlock?
# Type 'yes' to confirm: yes

After this succeeds, run terraform plan again. If the lock clears and plan runs normally, you are done; skip to the verification section.
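In a pipeline there is nobody to type yes, so force-unlock accepts a -force flag that skips the prompt. Because that removes the last safety check, the sketch below first confirms the argument at least looks like a Lock ID. The function names are hypothetical, and the UUID check assumes the Lock ID format described above.

```shell
# Succeed only if $1 has the UUID shape of a Terraform Lock ID.
is_lock_id() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Force-unlock without the interactive prompt, but only for a plausible ID.
unlock_state() {
  if ! is_lock_id "$1"; then
    echo "refusing to unlock: '$1' does not look like a Lock ID" >&2
    return 2
  fi
  # -force skips the "Type 'yes' to confirm" prompt.
  terraform force-unlock -force "$1"
}

# Usage:
#   unlock_state a1b2c3d4-e5f6-7890-abcd-ef1234567890
```

The format check catches the common mistake of pasting the LockID partition key (the state path) where the Lock ID from the Info blob is expected.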

Step 4: Delete the DynamoDB Item Manually (Last Resort)

Occasionally force-unlock itself fails. This can happen when your local Terraform configuration does not exactly match the backend that holds the lock, for example if someone renamed the workspace or bucket between the failed apply and now.

In that case, delete the lock item directly via the AWS CLI. First, identify the exact partition key value (the LockID attribute). From your earlier scan output, it will look something like your-bucket/path/to/terraform.tfstate.

aws dynamodb delete-item \
  --table-name your-lock-table-name \
  --region us-east-1 \
  --key '{"LockID": {"S": "your-bucket/path/to/terraform.tfstate"}}'

This command returns nothing on success. Verify the item is gone:

aws dynamodb scan \
  --table-name your-lock-table-name \
  --region us-east-1 \
  --query "Items"

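The lock item should no longer appear. An item whose LockID ends in -md5 may remain: that is the state checksum the S3 backend stores in the same table, not a lock, so leave it in place. With the lock item gone, the next Terraform operation can acquire the lock normally, so move on to verification before anyone runs an apply.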
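A guarded wrapper can make the manual delete harder to get wrong. The sketch below is an illustration with hypothetical function names: it builds the --key argument, refuses keys ending in -md5 (the S3 backend keeps a state checksum item under that key, and it is not a lock), and only prints the command unless DRY_RUN is set to 0.

```shell
# Build the --key argument for a lock item from its LockID value.
lock_key_json() {
  printf '{"LockID": {"S": "%s"}}' "$1"
}

# $1: table name, $2: region, $3: LockID value (bucket/path/to/terraform.tfstate)
delete_lock_item() {
  table=$1
  region=$2
  lock_id=$3
  case $lock_id in
    *-md5)
      echo "refusing to delete checksum item: $lock_id" >&2
      return 2
      ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: show what would run without calling AWS.
    echo "aws dynamodb delete-item --table-name $table --region $region --key '$(lock_key_json "$lock_id")'"
    return 0
  fi
  aws dynamodb delete-item --table-name "$table" --region "$region" \
    --key "$(lock_key_json "$lock_id")"
}

# Usage:
#   delete_lock_item your-lock-table-name us-east-1 your-bucket/path/to/terraform.tfstate
#   DRY_RUN=0 delete_lock_item your-lock-table-name us-east-1 your-bucket/path/to/terraform.tfstate
```

Defaulting to a dry run is a deliberate choice here: the printed command can be reviewed by a second person before anything is deleted.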

Verifying State Integrity After Unlocking

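A stuck lock from a failed apply raises a real question: does the state file still match what actually exists? S3 writes objects atomically, so the file itself is unlikely to be truncated, but the apply may have created or changed resources that never made it into state. A state that disagrees with reality is worse than a stale lock.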

Pull the current state and inspect it:

terraform state list

If this command returns your expected resources without errors, the state file is intact. Run a plan to confirm no unexpected changes are queued:

terraform plan -out=recovery.tfplan

Read the plan output carefully. If you see resources being destroyed or recreated that should not be touched, your state may be ahead of or behind reality. In that case, compare the state against live AWS resources using terraform state show <resource_address> before applying anything.
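For scripted recovery, terraform plan accepts -detailed-exitcode, which separates "nothing to change" from "changes pending" in the exit status. The helper below is a small sketch with a hypothetical name that turns that status into a message.

```shell
# Translate the exit status of "terraform plan -detailed-exitcode".
describe_plan_exit() {
  case $1 in
    0) echo "no changes: configuration and state agree" ;;
    1) echo "error: plan failed, inspect the output before going further" ;;
    2) echo "changes pending: review the plan before applying" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage:
#   terraform plan -detailed-exitcode -out=recovery.tfplan
#   describe_plan_exit $?
```

An exit status of 2 right after a crashed apply is expected if the apply stopped partway; what matters is whether the listed changes are the remaining work or something surprising.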

If you suspect the state is corrupt, this is when you want to look at your S3 bucket's version history. S3 versioning (which you should always enable on state buckets) lets you restore an earlier known-good copy of terraform.tfstate. For more on diagnosing infrastructure issues after a failed deployment, see how CodeDeploy rollbacks can stall and leave fleets in a split state; the recovery mindset is similar.

Common Pitfalls When Unlocking State

Unlocking while a process is still alive

This is the big one. If you force-unlock while a legitimate apply is running in another terminal or pipeline, Terraform will not stop that apply; it just removes the lock from underneath it. That apply may then write a partial or conflicting state. Always confirm no active process before unlocking.

Wrong workspace

Terraform workspaces use different lock keys. If you manage multiple environments with workspaces (e.g., staging, production), make sure you are targeting the right one. Run terraform workspace show before force-unlocking to confirm.
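To see which lock key a workspace maps to, the path can be built by hand. This sketch assumes the S3 backend's default layout, where non-default workspaces store state under env:/<workspace>/ (the workspace_key_prefix setting changes that prefix). The function name is hypothetical.

```shell
# Print the LockID value for a given bucket, state key, and workspace.
# $1: bucket, $2: state key, $3: workspace (default: default),
# $4: workspace_key_prefix (default: env:)
workspace_lock_id() {
  bucket=$1
  key=$2
  workspace=${3:-default}
  prefix=${4:-env:}
  if [ "$workspace" = "default" ]; then
    echo "$bucket/$key"
  else
    echo "$bucket/$prefix/$workspace/$key"
  fi
}

# Usage:
#   workspace_lock_id your-bucket path/to/terraform.tfstate "$(terraform workspace show)"
```

Comparing this value with the LockID in the table before deleting anything helps avoid clearing the production lock while chasing a staging problem.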

Wrong region or table

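If your backend region differs from your default AWS CLI region, the scan and delete commands will target the wrong region. They either fail with a ResourceNotFoundException or, if a table with the same name exists there, appear to succeed while doing nothing, because delete-item reports success even when no item matches the key. Always pass --region explicitly and double-check it matches your backend configuration.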

IAM permissions gaps

The identity you use for Terraform (and for the manual CLI commands) needs dynamodb:DeleteItem explicitly. A role with read-only DynamoDB access will get an AccessDeniedException, which is easy to misread as a problem with the lock itself. Confirm your permissions before troubleshooting the lock further.

Preventing Future Stuck Locks

Stuck locks are a symptom of processes dying uncleanly. You can reduce their frequency and make recovery faster with a few practices.

Enable S3 versioning on your state bucket. This is table stakes. Every state write creates a new version, so you can always roll back to a known-good state without losing history.

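Consider a DynamoDB TTL on lock items, with care. DynamoDB can automatically purge items whose TTL attribute holds a past epoch timestamp, but the S3 backend has no setting that makes Terraform write such an attribute, so this only works if something outside Terraform stamps each lock item with an expiry (say, four hours out). TTL deletion is also not immediate and cannot tell a crashed apply from a slow one, so pick a threshold well beyond your longest run. This does not protect you from concurrent applies, but it does mean a crash from Friday night does not block Monday morning.

Avoid interruptible CI runners. Spot instance reclamation is a common cause of crashed applies. For infrastructure pipelines, prefer on-demand runners, or configure your spot interruption handler to stop Terraform gracefully and, failing that, run terraform force-unlock as a shutdown hook before terminating.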
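A rough sketch of the TTL setup follows. It assumes the expiry is written by a process outside Terraform, since the lock item Terraform creates carries no expiry attribute of its own. The attribute name ExpiresAt and the function names are hypothetical choices for illustration.

```shell
# Epoch timestamp a given number of hours from now.
# $1: hours, $2: optional "now" as epoch seconds
expiry_epoch() {
  hours=$1
  now=${2:-$(date -u +%s)}
  echo $(( now + hours * 3600 ))
}

# One-time setup: tell DynamoDB which attribute holds the expiry.
# $1: table name, $2: region
enable_lock_ttl() {
  aws dynamodb update-time-to-live --table-name "$1" --region "$2" \
    --time-to-live-specification "Enabled=true,AttributeName=ExpiresAt"
}

# Stamp an existing lock item with an expiry; does nothing if no lock exists.
# $1: table name, $2: region, $3: LockID value, $4: hours until expiry
stamp_lock_expiry() {
  aws dynamodb update-item --table-name "$1" --region "$2" \
    --key "{\"LockID\": {\"S\": \"$3\"}}" \
    --update-expression "SET ExpiresAt = :t" \
    --condition-expression "attribute_exists(LockID)" \
    --expression-attribute-values "{\":t\": {\"N\": \"$(expiry_epoch "$4")\"}}"
}
```

The condition expression keeps the stamp from creating a phantom item when no lock is present. An expired lock under a still-running apply is the failure mode to watch for, so the hours value should comfortably exceed your slowest apply.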

Scope your lock tables by environment. Using one DynamoDB table per environment (not one shared table) limits the blast radius. A stuck lock in staging does not affect production, and you can diagnose the right table immediately without scanning through noise. This is especially relevant if you are managing per-environment monitoring stacks that need to stay operational during infrastructure changes.

Alert on long-held locks. Set a CloudWatch alarm on a custom metric or a DynamoDB stream that fires if a lock item is older than a threshold you define. Catching a stuck lock automatically beats discovering it when a developer tries to run a plan. For patterns on building effective CloudWatch alerts, see the guide on fixing CloudWatch alarms stuck in INSUFFICIENT_DATA after deployment.

Restrict who can write state directly. Tighten IAM so that only your CI/CD role can call s3:PutObject on the state bucket and dynamodb:PutItem on the lock table. Manual applies from developer laptops are a common source of orphaned locks: if a developer's laptop sleeps mid-apply, the lock is stuck until they wake up and notice. Forcing all applies through a controlled pipeline removes that failure mode.

Wrapping Up

Stuck Terraform locks feel urgent but are almost always resolved in a few minutes once you know the path. Here are the concrete steps to take right now:

  1. Confirm the holding process is dead before touching anything.
  2. Get the Lock ID from the Terraform error output or by scanning the DynamoDB table directly.
  3. Run terraform force-unlock <LOCK_ID> from your initialized working directory.
  4. Fall back to aws dynamodb delete-item if force-unlock fails, for example because the local backend configuration no longer matches.
  5. Verify state integrity with terraform state list and terraform plan before running another apply.

Once the immediate issue is resolved, spend thirty minutes enabling S3 versioning (if it is not already on) and setting up a DynamoDB TTL. Those two changes eliminate the most painful edge cases and give you a recovery path even when something goes wrong at 2 a.m.

Frequently Asked Questions

Is it safe to run terraform force-unlock without knowing if another apply is active?

No, it is not safe unless you have confirmed no other Terraform process is currently running against that state. Force-unlocking an active apply removes the lock from underneath the running operation, which can lead to state corruption or duplicate resource creation.

How do I find the Lock ID for a stuck Terraform state lock in DynamoDB?

Terraform prints the Lock ID in the error message it shows when a plan or apply is blocked. If that output is unavailable, you can run an AWS CLI scan against your DynamoDB lock table and read the ID field from the Info attribute of the lock item.

Can a stuck Terraform state lock corrupt my state file?

The lock itself does not corrupt the state file; it just prevents further operations. The risk of corruption comes from the failed apply that left the lock behind. Always run terraform state list and terraform plan after unlocking to confirm the state file reflects reality before applying again.

How do I prevent Terraform state locks from getting stuck in DynamoDB permanently?

Enable a TTL attribute on your DynamoDB lock table so that old lock items are automatically expired after a defined period, such as four to eight hours. Terraform does not write an expiry timestamp to the lock item itself, so something outside Terraform has to set it. Combining this with S3 versioning on your state bucket gives you automatic cleanup and a rollback path if something does go wrong.

What permissions do I need to delete a Terraform state lock from DynamoDB manually?

You need the dynamodb:DeleteItem permission on the lock table, plus dynamodb:Scan or dynamodb:GetItem to read the existing lock item. If your IAM role only has read access, the delete command will fail with an access-denied error.
