Tracing Silent OOM Kills on DigitalOcean Droplets

Your app goes down at 2 AM. You check your application logs: nothing. No exception, no stack trace, no graceful shutdown message. The process just stopped. What you experienced was almost certainly the Linux OOM killer quietly executing your process, and DigitalOcean gave you zero notification about it.

This guide walks you through exactly how to confirm an OOM kill happened, how to find which process triggered it, and how to build guardrails so the next one doesn't catch you off guard.

What You'll Learn

How to read kernel logs to confirm an OOM kill event
Which tools to use to track memory consumption over time
How to set container and systemd memory limits that actually protect you
How to configure alerts before your Droplet runs out of headroom
Practical tuning options including swap, vm.swappiness, and OOM score adjustment

Prerequisites

This guide assumes you have SSH access to a DigitalOcean Droplet running Ubuntu 20.04 or 22.04 (most commands work on Debian-based distros generally). You should be comfortable running commands as root or with sudo. No special tooling is required beyond what ships with the OS, though a few optional packages are called out where relevant.

Why OOM Kills Are Silent

The Linux kernel's OOM killer is a last-resort mechanism. When the system cannot allocate memory for a new request, the kernel scores every running process and kills the one with the highest "badness" score, which is driven mainly by how much memory the process is using. The whole event happens at the kernel level, below your application runtime, which is why your app logs show nothing. From your process's perspective, it received a SIGKILL with no warning and no chance to write a final log line.

DigitalOcean does not surface OOM events in the control panel. There is no email, no Slack notification, and no obvious dashboard indicator. Unless you know where to look in the kernel ring buffer, these kills are effectively invisible.

Step 1: Confirm the Kill Actually Happened

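The most reliable place to look is journalctl, which exposes the kernel's log stream. Run this immediately after a suspicious restart. Note that -k shows only the current boot by default; if the whole Droplet rebooted, add -b -1 to read the previous boot (this requires persistent journal storage):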

sudo journalctl -k | grep -i "oom\|killed process\|out of memory"

You're looking for lines that look like this:

kernel: Out of memory: Killed process 14203 (gunicorn) score 892 or sacrifice child
kernel: Killed process 14203 (gunicorn) total-vm:1048576kB, anon-rss:924160kB, file-rss:0kB

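The score value is the OOM badness score the kernel assigned. Higher scores get killed first. The anon-rss figure tells you how much anonymous (non-file-backed) RAM that process was holding at the moment of the kill. The exact wording varies by kernel version: recent kernels typically fold everything into a single "Out of memory: Killed process" line and may not print the score, so match on "Killed process" rather than on the full text.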

If journalctl doesn't go back far enough, check /var/log/kern.log or /var/log/syslog depending on your distro configuration:

sudo grep -i "oom\|killed process" /var/log/kern.log | tail -50

You can also use dmesg for a quick live check, though it only shows the current boot cycle:

sudo dmesg | grep -i "oom\|killed"
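When there are many kills to sift through, it can help to reduce each record to the three facts you care about. The following is a minimal sketch, assuming the "Killed process <pid> (<name>) ... anon-rss:<n>kB" layout shown in the sample above; the function name summarize_oom_kills is hypothetical, and lines that don't carry an anon-rss figure are simply skipped.

```shell
# Print "<pid> <anon-rss in MiB> MiB <process name>" for each kill record on stdin.
# Assumes the log layout from the sample output above; other layouts are ignored.
summarize_oom_kills() {
  sed -n -E 's/.*Killed process ([0-9]+) \(([^)]*)\).*anon-rss:([0-9]+)kB.*/\1 \3 \2/p' |
    awk '{
      pid = $1
      mib = int($2 / 1024)      # kB to MiB, rounded down
      $1 = ""; $2 = ""          # drop the numeric fields, keep the name
      sub(/^ +/, "")
      print pid, mib " MiB", $0
    }'
}
```

A typical invocation would be sudo journalctl -k | summarize_oom_kills, or the same pipe from /var/log/kern.log.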

Step 2: Understand the Timeline

A single kill event in isolation tells you what died but not why memory ran out. You need to understand the build-up. sar from the sysstat package is invaluable here because it records system stats at regular intervals.

sudo apt install sysstat -y
sudo systemctl enable sysstat --now

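Once enabled, sysstat samples memory, CPU, and I/O every ten minutes by default. (On some Debian and Ubuntu releases, collection also requires ENABLED="true" in /etc/default/sysstat, so check that file if no data appears.) After your next incident, you can replay what happened in the hours before the kill: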

# Show memory usage for today, starting 2 hours ago
# (assumes that window does not cross midnight; sar reads today's file by default)
sar -r -s "$(date -d '2 hours ago' +%H:%M:%S)" | head -40

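The columns you care about are kbavail (memory available without swapping; kbmemfree alone is misleading because it ignores reclaimable page cache) and kbswpfree (free swap, reported by sar -S rather than sar -r). If you see both trending toward zero before the timestamp of the kill, you have a classic slow memory leak on your hands rather than a sudden spike.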
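If the kill happened shortly after midnight, the build-up sits in the previous day's data file, which sar only reads when you pass it with -f. Here is a small sketch of how that lookup might be wrapped; it assumes GNU date and the /var/log/sysstat/saDD file layout used by Debian-style sysstat packages, and the helper name sa_file_for is hypothetical.

```shell
# Print the path of the sysstat data file for a given day (default: today).
# Assumption: data files live in /var/log/sysstat and are named saDD.
sa_file_for() {
  day=${1:-today}
  printf '/var/log/sysstat/sa%s\n' "$(date -d "$day" +%d)"
}
```

For example, sar -r -f "$(sa_file_for yesterday)" -s 22:00:00 -e 23:59:59 would replay the last two hours of the previous day. Verify the path on your own Droplet first, since the directory and naming scheme depend on the package configuration.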

Step 3: Identify the Leaking Process

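Historical data shows you the trend, but you need to catch the culprit live. smem gives a cleaner picture of real memory consumption than top or ps because its PSS (proportional set size) column divides shared pages among the processes that use them instead of counting them in full for each one.

sudo apt install smem -y
# Show the top 10 processes by PSS (sudo lets smem see every user's processes)
sudo smem -k -r -s pss | head -11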

For long-running leak detection, a simple loop writing snapshots to a file works well enough in most cases:

while true; do
  date >> /tmp/mem_log.txt
  ps aux --sort=-%mem | head -10 >> /tmp/mem_log.txt
  sleep 60
done &

Run this in a tmux or screen session and let it collect data for a few hours. When you review the log, you're looking for a process whose RSS column grows monotonically without ever releasing memory. A healthy process will fluctuate. A leaking one will only go up.
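Reading hours of snapshots by eye gets tedious, so the check for "only goes up" can be scripted. The sketch below assumes the log was produced by the loop above (so each process line follows the ps aux layout, with the PID in the second field and RSS in KiB in the sixth); the function name rss_trend is hypothetical.

```shell
# Summarise how one PID's RSS changed across all snapshots in the log.
# Usage: rss_trend <pid> [logfile]
rss_trend() {
  pid=$1
  log=${2:-/tmp/mem_log.txt}
  awk -v pid="$pid" '
    # Only process lines for this PID with a numeric RSS field; date and header lines fail this test.
    $2 == pid && $6 ~ /^[0-9]+$/ {
      rss = $6 + 0
      if (n > 0 && rss < last) dropped = 1   # RSS went down at least once
      if (n == 0) first = rss
      last = rss
      n++
    }
    END {
      if (n < 2) { print "not enough samples"; exit }
      verdict = (!dropped && last > first) ? "never decreased" : "fluctuated"
      printf "samples=%d first=%d last=%d %s\n", n, first, last, verdict
    }' "$log"
}
```

A "never decreased" verdict over a few hours is a hint rather than proof of a leak, since caches that warm up slowly look the same at first. Note that a PID only appears in the log while the process is among the top memory consumers captured by head -10.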

Step 4: Set Hard Memory Limits

Once you know which process is the problem, you can contain it. The right approach depends on how your app runs.

Systemd service limits

If your app runs as a systemd service, add a MemoryMax directive to your unit file. This is enforced by cgroups and is hard to bypass:

sudo systemctl edit myapp.service

In the override file that opens, add:

[Service]
MemoryMax=512M
MemorySwapMax=0

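Setting MemorySwapMax=0 prevents the service from spilling into swap, so once it reaches MemoryMax the kernel's cgroup OOM killer terminates it (still with SIGKILL, not a graceful shutdown) before it can destabilize the whole host. MemorySwapMax only takes effect on hosts that use the unified cgroup v2 hierarchy. Reload and restart: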

sudo systemctl daemon-reload
sudo systemctl restart myapp.service

Docker container limits

If you're running containers, set the memory limit at runtime or in your Compose file. Without this, a container can consume all available host memory:

services:
  web:
    image: myapp:latest
    mem_limit: 512m
    memswap_limit: 512m

Setting memswap_limit equal to mem_limit disables swap for the container. The kernel's cgroup OOM killer then terminates an over-limit process inside the container before memory pressure spreads to the rest of the host.
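Both the systemd and the Compose limit need a concrete number. The wrap-up of this guide suggests measured peak usage plus 20% headroom, and the arithmetic can be sketched as follows; the function name limit_with_headroom is hypothetical, and the 20% default is a starting point rather than a rule.

```shell
# Print a memory limit in MiB: measured peak plus headroom, rounded up.
# Usage: limit_with_headroom <peak_mib> [headroom_percent, default 20]
limit_with_headroom() {
  peak_mib=$1
  pct=${2:-20}
  # Integer ceiling of peak * (100 + pct) / 100
  echo $(( (peak_mib * (100 + pct) + 99) / 100 ))
}
```

For a service that peaks at 400 MiB this yields 480, which you would write as MemoryMax=480M or mem_limit: 480m. After restarting a systemd service, systemctl show myapp.service -p MemoryMax should report the applied value in bytes.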

Step 5: Tune Swap and vm.swappiness

Many DigitalOcean Droplets ship with no swap configured, which makes OOM kills more likely because the kernel has no relief valve when RAM fills up. Adding a swap file buys you time and reduces the frequency of hard kills, though it is not a substitute for fixing a real memory leak.

# Create a 2 GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make it permanent
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

vm.swappiness controls how readily the kernel swaps out anonymous memory instead of reclaiming page cache. The default value of 60 is a general-purpose setting. For a server running a single primary app, setting it lower reduces unnecessary swapping while still giving you the safety net:

sudo sysctl vm.swappiness=10
# Persist across reboots
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.d/99-swappiness.conf
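Before creating a swap file it is worth checking whether swap is already active, since running the setup twice would append a duplicate line to /etc/fstab. The check below is a minimal sketch that reads the kernel's swap table; the function name swap_is_active is hypothetical, and the optional file argument exists only so the logic can be tried against a sample file.

```shell
# Succeed if the swap table (default /proc/swaps) lists at least one swap area.
# /proc/swaps has one header line followed by one line per active swap area.
swap_is_active() {
  awk 'NR > 1 && NF > 0 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}
```

In a provisioning script you might write swap_is_active || echo "no swap configured" and only then run the fallocate, mkswap, and swapon steps. The same information is available interactively from swapon --show.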

Step 6: Adjust OOM Scores to Protect Critical Processes

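The kernel computes an OOM score for every process, and you can bias that score with an adjustment between -1000 and 1000. Use it to nudge the ranking so that a less important process (say, a background worker) gets killed before your primary web server. The adjustment lives in /proc/<pid>/oom_score_adj, and the resulting score is visible in /proc/<pid>/oom_score. A value of -1000 exempts the process from the OOM killer entirely, so use it sparingly.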

To make a process very unlikely to be killed, set a negative adjustment:

# Replace 12345 with the actual PID
echo -500 | sudo tee /proc/12345/oom_score_adj

For a systemd service, use the OOMScoreAdjust directive in the unit override instead of setting it per-PID manually:

[Service]
OOMScoreAdjust=-500

Conversely, to make a background job a preferred kill target if memory pressure hits, set a positive value like +800. Think of this as telling the kernel your priority order before an emergency happens.
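A typo in the adjustment is easy to make and the kernel rejects out-of-range values with a terse error, so it can be worth validating before writing. The following is a sketch only: valid_oom_adj and set_oom_adj are hypothetical helper names, and set_oom_adj assumes it runs as root because it writes to /proc directly.

```shell
# Succeed only if the argument is an integer in the accepted range -1000..1000.
valid_oom_adj() {
  v=${1#+}                       # tolerate a leading plus sign such as +800
  case $v in
    ''|-|*[!0-9-]*|?*-*) return 1 ;;   # empty, lone minus, non-digits, misplaced minus
  esac
  [ "$v" -ge -1000 ] && [ "$v" -le 1000 ]
}

# Write a validated adjustment for a PID. Assumes root privileges.
# Usage: set_oom_adj <pid> <adjustment>
set_oom_adj() {
  if ! valid_oom_adj "$2"; then
    echo "adjustment must be an integer between -1000 and 1000" >&2
    return 1
  fi
  echo "${2#+}" > "/proc/$1/oom_score_adj"
}
```

After a call such as set_oom_adj 12345 800, reading /proc/12345/oom_score shows how the kernel now ranks the process. The setting applies to that PID only and is lost when the process restarts, which is why the OOMScoreAdjust directive is the better choice for services.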

Step 7: Set Up Proactive Alerting

The best OOM kill is the one that never happens because you noticed memory trending up and acted first. DigitalOcean's built-in monitoring can trigger alerts when memory utilization exceeds a threshold. Memory metrics require the DigitalOcean metrics agent on the Droplet; once it is installed, open the Droplet's Graphs tab and configure an alert policy at around 80–85% used RAM. That gives you a window to act before the kernel has to.

For more granular control, a lightweight agent like Netdata (open source, free tier available) can alert on specific metrics, including available RAM dropping below an absolute value rather than just a percentage:

bash <(curl -Ss https://my-netdata.io/kickstart.sh)

Once installed, Netdata streams live metrics at one-second resolution and has a built-in alerting engine. You can configure it to send notifications via email, Slack, or PagerDuty when free RAM drops below a hard floor.
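If you want a hard-floor check before committing to an agent, the same idea can be approximated with a few lines of shell run from cron. This is a stopgap sketch, not a replacement for real monitoring: the function names and the memwatch syslog tag are hypothetical, and it assumes the kernel exposes MemAvailable in /proc/meminfo and that the logger utility is installed.

```shell
# Print MemAvailable in MiB from a meminfo-format file (default /proc/meminfo).
mem_available_mib() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Log a syslog warning and return 1 when available memory is below the floor.
# Usage: check_mem_floor <floor_mib> [meminfo_file]
check_mem_floor() {
  floor=$1
  avail=$(mem_available_mib "${2:-/proc/meminfo}")
  if [ "$avail" -lt "$floor" ]; then
    logger -p user.warning -t memwatch \
      "MemAvailable ${avail} MiB is below floor ${floor} MiB"
    return 1
  fi
}
```

Calling check_mem_floor 200 every minute from cron leaves a timestamped trail in syslog that you can line up against a later kill. Forwarding that warning to email or chat is left to whatever notification tooling you already run.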

Common Pitfalls

Assuming the app log tells the full story. It never does for OOM kills. Always check the kernel log first.
Adding swap as a permanent fix. Swap buys time; it does not fix the leak. A process that swaps heavily before being killed simply means your app ran degraded for longer before dying.
Setting memory limits too tight. If you cap a service at 256 MB and it legitimately needs 400 MB at peak load, you'll just manufacture OOM kills instead of preventing them. Profile your app's actual peak usage first.
Forgetting to persist sysctl changes. Values set with sysctl directly reset on reboot. Always write them to a file under /etc/sysctl.d/.
Running without sysstat. Install it on every server at provisioning time. It has near-zero overhead and is invaluable after the fact.

Wrapping Up

OOM kills are silent by design, but they leave clear evidence if you know where to look. Here are the concrete next steps to take right now:

SSH into your Droplet and run sudo journalctl -k | grep -i oom to check whether a kill has already happened.
Install sysstat if it isn't running, so you have historical memory data for the next incident.
Set MemoryMax in your systemd unit file (or mem_limit in your Compose file) based on your app's actual measured peak usage plus 20% headroom.
Add a swap file if your Droplet doesn't have one, and set vm.swappiness=10.
Configure a memory alert in DigitalOcean's monitoring panel at 80% utilization so you can investigate before the kernel is forced to act.
