When Python Multiprocessing Silently Kills Your Exceptions
You kick off a pool of workers, wait for results, and get back... nothing. No traceback, no error message, just silence or a partial result set that looks almost right. Python's multiprocessing module has a talent for burying exceptions in child processes where you'll never see them unless you know exactly where to look.
This isn't a bug in the library. It's a consequence of how processes are isolated from each other. But once you understand the failure modes, you can build worker code that propagates errors reliably.
What You'll Learn
- Why exceptions in worker processes don't automatically surface in the parent
- The difference in error behavior between Pool.map, Pool.apply_async, and Process
- How to retrieve exceptions from async results without losing the traceback
- Patterns for wrapping workers so errors always come back to you
- Common pitfalls that make debugging multiprocessing code harder than it needs to be
Prerequisites
You'll need Python 3.8 or later and a basic familiarity with the multiprocessing module. The examples run on Linux, macOS, and Windows, though a couple of notes call out Windows-specific behavior where it matters.
Why Processes Don't Share Exceptions
In a single-process Python program, an unhandled exception bubbles up the call stack until something catches it or the interpreter prints a traceback and exits. Processes don't share a call stack. Each child process has its own memory space, its own interpreter state, and its own exception context.
When a worker raises an exception, that exception lives and dies inside the child process unless the multiprocessing infrastructure explicitly serializes it (via pickle) and sends it back over an inter-process queue or pipe. Whether that happens depends entirely on which API you're using.
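To make the serialization step concrete, here is a minimal sketch that round-trips an exception through pickle inside a single process, which approximates what happens when one crosses a pipe. The helper names capture and round_trip are illustrative and not part of any library.

```python
import pickle


def capture():
    """Raise and catch an exception so it carries a real traceback."""
    try:
        raise ValueError("Cannot process value 3")
    except ValueError as exc:
        return exc


def round_trip(exc):
    """Serialize and restore an exception, as a pipe between processes would."""
    return pickle.loads(pickle.dumps(exc))


original = capture()
restored = round_trip(original)

# The type and arguments survive the trip...
print(type(restored).__name__, restored.args)
# ...but the traceback object does not: it is not part of the pickled state.
print(original.__traceback__ is not None, restored.__traceback__ is None)
```

This is why any API that wants to show you the worker's traceback has to ship it separately, usually as formatted text.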
The Behavior of Pool.map
Pool.map is the most forgiving of the pool APIs when it comes to exceptions, which is why it's a good starting point.
from multiprocessing import Pool

def risky_worker(x):
    if x == 3:
        raise ValueError(f"Cannot process value {x}")
    return x * 2

if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(risky_worker, range(6))
    print(results)
Run this and you'll see the ValueError re-raised in the parent process, complete with a traceback. That's because Pool.map blocks until all tasks complete, and the pool machinery pickles the first exception a worker raised and re-raises it in the parent from the map call itself.
The catch: the exception terminates the entire map call. You don't get partial results for the tasks that succeeded. If you need to know which inputs worked and which didn't, Pool.map alone isn't enough.
Where Pool.apply_async Goes Quiet
This is where most developers get burned. apply_async returns an AsyncResult object immediately. The exception only surfaces when you call .get() on that object. If you never call .get(), the exception disappears.
from multiprocessing import Pool
import time

def risky_worker(x):
    if x == 3:
        raise ValueError(f"Cannot process value {x}")
    return x * 2

if __name__ == "__main__":
    with Pool(4) as pool:
        results = [pool.apply_async(risky_worker, (i,)) for i in range(6)]
        # The task for input 3 has almost certainly failed by now, but no one knows yet
        time.sleep(1)
        print("Still running fine... or are we?")
        # The exception only surfaces here:
        for r in results:
            print(r.get())  # ValueError raised on the fourth iteration (input 3)
If you wrap pool.apply_async in a fire-and-forget pattern and never retrieve results, your program will exit cleanly while workers have been failing the entire time. Logs will look normal. Downstream data will be incomplete.
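If you really do want fire-and-forget submission, apply_async accepts an error_callback argument that the pool calls in the parent with the exception instance when a task fails. The following is a minimal sketch under that assumption; submit_all and on_error are hypothetical names, and the pool is passed in as a parameter only to keep the helper easy to exercise.

```python
import logging
from multiprocessing import Pool


def risky_worker(x):
    if x == 3:
        raise ValueError(f"Cannot process value {x}")
    return x * 2


def submit_all(pool, inputs):
    """Submit every input without keeping the AsyncResult objects."""
    failures = []

    def on_error(exc):
        # Runs in the parent process, on one of the pool's helper threads,
        # with the exception the worker raised.
        logging.error("worker failed: %r", exc)
        failures.append(exc)

    for i in inputs:
        pool.apply_async(risky_worker, (i,), error_callback=on_error)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait until all tasks and their callbacks have finished
    return failures


if __name__ == "__main__":
    with Pool(4) as pool:
        failed = submit_all(pool, range(6))
    print(f"{len(failed)} task(s) failed: {failed}")
```

Keep the callback short: it runs on the thread that handles results, so a slow callback delays every other result.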
Retrieving Exceptions Without Losing the Traceback
The standard fix is to call .get() with a timeout and wrap it in a try/except. But you also want to preserve the original traceback so you know where in the worker the failure happened.
from multiprocessing import Pool
import traceback

def risky_worker(x):
    if x == 3:
        raise ValueError(f"Cannot process value {x}")
    return x * 2

if __name__ == "__main__":
    with Pool(4) as pool:
        futures = [(i, pool.apply_async(risky_worker, (i,))) for i in range(6)]
        for input_val, future in futures:
            try:
                result = future.get(timeout=10)
                print(f"{input_val} -> {result}")
            except Exception as e:
                print(f"Worker failed for input {input_val}: {e}")
                # The child's traceback text is chained to the exception,
                # so print_exc() shows it along with the parent's traceback
                traceback.print_exc()
Python's multiprocessing layer re-raises an exception of the original type with the original message. The worker-side traceback travels back as text chained to that exception, so the printed output still points into the worker function, which is usually exactly what you need.
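If you want the worker-side traceback as a string for logging rather than on stderr, one option is to read it from the exception's __cause__. In CPython the pool attaches an internal RemoteTraceback object there; that class is an implementation detail, so this sketch relies only on __cause__ being set and falls back to an empty string. The helper name remote_traceback_text is hypothetical.

```python
from multiprocessing import Pool


def risky_worker(x):
    if x == 3:
        raise ValueError(f"Cannot process value {x}")
    return x * 2


def remote_traceback_text(exc):
    """Return the chained worker-side traceback text, or '' if none is attached."""
    cause = exc.__cause__
    return str(cause) if cause is not None else ""


if __name__ == "__main__":
    with Pool(2) as pool:
        future = pool.apply_async(risky_worker, (3,))
        try:
            future.get(timeout=10)
        except ValueError as exc:
            # Store or log this text alongside the exception message.
            print(remote_traceback_text(exc))
```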
Wrapping Workers to Return Structured Results
For production pipelines, catching errors at the .get() call site is often too late or too scattered. A cleaner pattern is to make the worker itself never raise. Instead, it returns a result object that carries either a value or an error.
from multiprocessing import Pool
from dataclasses import dataclass
from typing import Any, Optional
import traceback

@dataclass
class WorkerResult:
    input_val: Any
    output: Optional[Any] = None
    error: Optional[str] = None

    @property
    def ok(self):
        return self.error is None

def safe_worker(x):
    try:
        if x == 3:
            raise ValueError(f"Cannot process value {x}")
        return WorkerResult(input_val=x, output=x * 2)
    except Exception:
        # Capture the full traceback as a string; strings always pickle
        return WorkerResult(input_val=x, error=traceback.format_exc())

if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(safe_worker, range(6))
    for r in results:
        if r.ok:
            print(f"{r.input_val} -> {r.output}")
        else:
            print(f"FAILED for input {r.input_val}:\n{r.error}")
This approach gives you a complete result set every time. Successes and failures are both accounted for, and the full traceback string is preserved as data you can log, store, or alert on.
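The same idea can be applied to workers you would rather not modify. The sketch below assumes a generic helper, here called call_safely (a hypothetical name), bound to the real worker with functools.partial so the combination stays picklable. It returns plain tuples instead of a dataclass to keep the example short.

```python
import traceback
from functools import partial
from multiprocessing import Pool


def risky_worker(x):
    if x == 3:
        raise ValueError(f"Cannot process value {x}")
    return x * 2


def call_safely(func, x):
    """Run func(x) and return (input, output, error_text) instead of raising."""
    try:
        return (x, func(x), None)
    except Exception:
        return (x, None, traceback.format_exc())


if __name__ == "__main__":
    with Pool(4) as pool:
        outcomes = pool.map(partial(call_safely, risky_worker), range(6))
    failed_inputs = [x for x, _, err in outcomes if err is not None]
    print(f"failed inputs: {failed_inputs}")
```

The wrapper has to be a module-level function: lambdas and nested functions cannot be pickled by the default pickler.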
The Raw Process API and Silent Death
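If you use multiprocessing.Process directly instead of a pool, the situation is worse by default. A child process that raises prints its traceback to its own stderr and exits with code 1, but nothing propagates to the parent: join() returns normally and no exception object ever reaches your code. If the child's stderr is not captured and nobody checks the exit code, the failure goes unnoticed.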
from multiprocessing import Process

def crasher():
    raise RuntimeError("I crashed")

if __name__ == "__main__":
    p = Process(target=crasher)
    p.start()
    p.join()  # returns normally; nothing is raised in the parent
    print(f"Exit code: {p.exitcode}")  # 1 for an uncaught exception
You'll see that the exit code is non-zero, but the parent has no exception to catch. At best the traceback appears on whatever stderr the child inherited. To handle the failure in code, use a Pipe or Queue to send the formatted exception back manually.
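Since the exit code is the only signal the parent gets for free, it helps to interpret it consistently. The sketch below follows the documented meaning of Process.exitcode: None while the child is running, 0 on success, 1 after an uncaught exception, and -N when the child was terminated by signal N. The describe_exit helper is a hypothetical name.

```python
import signal
from multiprocessing import Process


def describe_exit(exitcode):
    """Translate Process.exitcode into a short human-readable status."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "ok"
    if exitcode < 0:
        # A negative value -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = str(-exitcode)
        return f"killed by signal {name}"
    return f"failed with exit code {exitcode}"


def crasher():
    raise RuntimeError("I crashed")


if __name__ == "__main__":
    p = Process(target=crasher)
    p.start()
    p.join()
    print(describe_exit(p.exitcode))
```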
Sending Exceptions Back Through a Queue
The standard pattern for raw Process objects is to pass a Queue into the worker and have it send exceptions back before exiting.
from multiprocessing import Process, Queue
import traceback

def worker_with_queue(q, x):
    # Always puts exactly one (input, result, error) tuple on the queue
    try:
        if x == 3:
            raise RuntimeError("Something went wrong")
        q.put((x, x * 2, None))
    except Exception:
        q.put((x, None, traceback.format_exc()))

if __name__ == "__main__":
    q = Queue()
    processes = [Process(target=worker_with_queue, args=(q, i)) for i in range(5)]
    for p in processes:
        p.start()
    # Drain the queue before joining: q.empty() is not reliable, and joining a
    # process that still has queued data to flush can deadlock
    outcomes = [q.get() for _ in processes]
    for p in processes:
        p.join()
    for input_val, result, error in outcomes:
        if error:
            print(f"FAILED for {input_val}:\n{error}")
        else:
            print(f"{input_val} -> {result}")
This is more verbose than the pool API, but it gives you complete control over what gets communicated back to the parent.
Common Pitfalls
Exceptions that can't be pickled
The multiprocessing pool re-raises exceptions by pickling them in the child and unpickling in the parent. Some custom exception classes, particularly those with non-serializable attributes, will fail to pickle, and you'll get a confusing pickling-related error (with Pool, typically a MaybeEncodingError) instead of the original exception. Keep custom exceptions simple: store only primitive types as attributes.
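A quick way to check a custom exception is to round-trip it through pickle yourself. The sketch below uses three hypothetical classes: one holding a lock (a non-serializable attribute), one whose constructor takes an argument it does not forward to Exception.__init__ (a related failure that shows up when unpickling), and one that forwards everything. The survives_pickle helper is likewise illustrative.

```python
import pickle
import threading


class HoldsLock(Exception):
    def __init__(self, message):
        super().__init__(message)
        self.lock = threading.Lock()  # locks cannot be pickled


class DropsArgs(Exception):
    def __init__(self, message, code):
        super().__init__(message)  # args == (message,), so code is not replayed
        self.code = code


class PlainError(Exception):
    def __init__(self, message, code):
        super().__init__(message, code)  # args == (message, code)
        self.code = code


def survives_pickle(exc):
    """Return True if exc can be pickled and restored with the same type and args."""
    try:
        restored = pickle.loads(pickle.dumps(exc))
    except Exception:
        return False
    return type(restored) is type(exc) and restored.args == exc.args


if __name__ == "__main__":
    for exc in (HoldsLock("x"), DropsArgs("x", 1), PlainError("x", 1)):
        print(type(exc).__name__, survives_pickle(exc))
```

Unpickling rebuilds the exception by calling the class with its stored args, which is why forwarding every constructor argument to super().__init__ matters.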
Forgetting the if __name__ == "__main__" guard
Under the spawn start method, which is the default on Windows and (since Python 3.8) on macOS, each child process starts by importing the main module. Without the guard, every child tries to create its own pool during that import, which typically ends in a RuntimeError about starting a process before bootstrapping has finished, or in runaway process creation. Always protect your pool creation code.
Daemon processes eating exceptions silently
A process marked as daemon=True is killed abruptly when the parent exits. If your main program ends before the daemon workers finish, those workers are terminated mid-execution: no exception, no cleanup, no result. Use daemon processes only for tasks where incomplete execution is acceptable.
Timeouts hiding real failures
Calling future.get(timeout=5) raises multiprocessing.TimeoutError if the result is not ready in time. This is easy to confuse with a worker crash. Log the distinction clearly: a timeout is a liveness failure, a raised exception is a correctness failure.
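One way to keep the two cases apart is to catch multiprocessing.TimeoutError before the generic handler. It is a separate class from the builtin TimeoutError, so it needs its own except clause. This is a minimal sketch; classify and the delay parameter on worker are hypothetical additions for illustration.

```python
import multiprocessing
import time
from multiprocessing import Pool


def worker(x, delay=0.0):
    time.sleep(delay)
    if x == 3:
        raise ValueError(f"Cannot process value {x}")
    return x * 2


def classify(async_result, timeout):
    """Return ("ok", value), ("timeout", None), or ("error", exception)."""
    try:
        return ("ok", async_result.get(timeout=timeout))
    except multiprocessing.TimeoutError:
        # The result was not ready in time; the worker may still be running.
        return ("timeout", None)
    except Exception as exc:
        # The worker finished, but by raising.
        return ("error", exc)


if __name__ == "__main__":
    with Pool(2) as pool:
        print(classify(pool.apply_async(worker, (1,)), timeout=5))
        print(classify(pool.apply_async(worker, (3,)), timeout=5))
        print(classify(pool.apply_async(worker, (1, 2.0)), timeout=0.1))
```

The order of the except clauses matters, because multiprocessing.TimeoutError is itself a subclass of Exception.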
Wrapping Up
Silent failures in multiprocessing code are a visibility problem as much as a technical one. Once you know where exceptions get dropped, the fixes are straightforward.
- Audit your apply_async calls: confirm that every AsyncResult object has a corresponding .get() call, and that it's inside a try/except block.
- Adopt the structured result pattern for any pipeline where partial failures are possible. Return a result object that can carry either a value or an error string.
- Test failure paths explicitly: write a unit test where a worker raises, and assert that the exception surfaces in the parent correctly.
- Add exit code checks when using raw Process objects. A non-zero p.exitcode is a signal that something went wrong, even if no traceback appeared.
- Keep custom exceptions picklable: if you define exception classes in your project, verify they survive a pickle.dumps/pickle.loads round-trip.