Fixing Silent Data Loss in Django ManyToMany Through Models
You added a through model to your Django ManyToMany relationship, stored extra fields like joined_at or role, then called .set() or .add() on the relation, and your extra data disappeared. No exception, no warning, just gone. This is one of the quieter bugs in Django's ORM, and it catches experienced developers off guard regularly.
What you'll learn
- Why Django silently drops data when you use certain ManyToMany methods with a through model
- Which ORM methods are safe to use and which to avoid
- How to write, update, and delete through-model records correctly
- How to add safeguards so future you (or your teammates) don't repeat the mistake
Prerequisites
This article assumes you're working with Django 3.2 or later and are comfortable with models, migrations, and basic ORM queries. You don't need anything beyond a standard Django project to follow along.
Why Through Models Exist
A standard ManyToMany field creates a hidden junction table with two foreign keys and nothing else. That works fine until you need to store something about the relationship itself: when a user joined a group, what permission level they have, or whether a product tag was applied automatically or manually.
The through parameter lets you point Django at an explicit model instead of the auto-generated one. You gain full control over that table, but you also take on responsibility for managing it yourself. Django pulls back from the wheel at that point, and that handoff is where the silent data loss lives.
A Concrete Example
Say you're building a course platform. A Student can enroll in many Course objects, and you want to record the enrollment date and the student's current grade.
from django.db import models


class Student(models.Model):
    name = models.CharField(max_length=200)


class Course(models.Model):
    title = models.CharField(max_length=200)
    students = models.ManyToManyField(
        Student,
        through='Enrollment',
        related_name='courses',
    )


class Enrollment(models.Model):
    student = models.ForeignKey(Student, on_delete=models.CASCADE)
    course = models.ForeignKey(Course, on_delete=models.CASCADE)
    enrolled_at = models.DateTimeField(auto_now_add=True)
    grade = models.CharField(max_length=2, blank=True)

    class Meta:
        unique_together = ('student', 'course')
This looks right. Run the migrations and you have a clean Enrollment table. The trap is in how you interact with it next.
The Methods That Silently Destroy Your Data
Django's ManyToMany field ships with convenient shortcut methods: add(), set(), remove(), and clear(). On a plain ManyToMany with no through model, these are great. With a custom through model, most of the write shortcuts were blocked before Django 2.2. In the versions this article targets they are permitted, which is exactly what makes them dangerous.
The .set() problem
Here's what the problematic code looks like:
# DO NOT do this with a through model
course.students.set([student_a, student_b])
Before Django 2.2 this raised an AttributeError. Since 2.2, Django permits .add() and .set() on a relation with a through model, and the call succeeds whenever every extra field has a default, is auto-set, or is supplied through the through_defaults argument. Called without through_defaults, it creates enrollment rows with no meaningful grade data. The deletion side is worse: when you call .set() to sync a list, Django deletes the rows for every student missing from the new list, grades included, and with clear=True it deletes every row and recreates them all, wiping out any grade you stored.
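To make the deletion behavior concrete, here is a plain-Python model of how .set() reconciles a relation. It is a simplified sketch, not Django's implementation: the through table is modeled as a dict keyed by (student, course), and simulate_set is a hypothetical helper name.

```python
# Simplified model of ManyToMany .set() acting on a through table.
# The dict maps (student, course) -> extra-field values. simulate_set is a
# hypothetical helper for illustration, not part of Django.

def simulate_set(table, course, new_students, clear=False, defaults=None):
    """Reconcile the rows for `course` so they match `new_students`."""
    defaults = defaults or {"grade": ""}
    current = {s for (s, c) in table if c == course}
    wanted = set(new_students)

    # With clear=True every existing row is deleted; otherwise only the rows
    # missing from the new list are deleted.
    to_delete = current if clear else current - wanted
    for student in to_delete:
        del table[(student, course)]

    # Students without a surviving row get a fresh one that carries only
    # default values.
    remaining = current - to_delete
    for student in wanted - remaining:
        table[(student, course)] = dict(defaults)
    return table


table = {
    ("alice", "math"): {"grade": "A"},
    ("bob", "math"): {"grade": "B+"},
}

simulate_set(table, "math", ["alice", "carol"])
after_sync = {key: dict(value) for key, value in table.items()}
# alice keeps her grade, bob's row (and grade) is gone, carol starts empty

simulate_set(table, "math", ["alice", "carol"], clear=True)
after_clear = {key: dict(value) for key, value in table.items()}
# clear=True rebuilt every row, so alice's grade is now empty as well

print(after_sync)
print(after_clear)
```

The model leaves out signals, transactions, and database details. It only shows which rows survive a sync and which are rebuilt from defaults.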
The .add() partial problem
# Also risky
course.students.add(student_a)
This works when every extra field is optional or has a default. Unless the caller remembers to pass through_defaults={'grade': ...}, the row is created without a grade, and nothing at the call site reminds them to. It also invites future developers to assume the bare call is the correct pattern. It isn't.
The Correct Way to Create Through Records
Treat the through model as a first-class model. Create, update, and delete its instances directly, and don't go through the ManyToMany accessor at all.
Creating an enrollment
from myapp.models import Enrollment

Enrollment.objects.create(
    student=student_a,
    course=course,
    grade='B+',
)
This is explicit, readable, and gives you full control over every field. There's no ambiguity about what ends up in the database.
Using get_or_create for idempotent operations
If you're syncing enrollments and don't want to duplicate records, get_or_create is your friend:
enrollment, created = Enrollment.objects.get_or_create(
    student=student_a,
    course=course,
    defaults={'grade': 'B+'},
)
if not created:
    # The enrollment already existed; update the grade if needed
    enrollment.grade = 'A-'
    enrollment.save(update_fields=['grade'])
The defaults parameter only applies on creation. If the record exists, you get it back and can decide what to update. This is the pattern you want for import scripts and sync operations.
Bulk operations
Need to enroll a hundred students at once? Use bulk_create on the through model:
enrollments = [
    Enrollment(student=s, course=course, grade='')
    for s in students_queryset
]
Enrollment.objects.bulk_create(
    enrollments,
    ignore_conflicts=True,  # skip duplicates gracefully
)
Note that bulk_create does not call save() or fire post_save signals. If your logic depends on those signals, use a loop with individual .create() calls instead.
Querying Through the Relationship
Reading data is where the through model really earns its place. You can query through the ManyToMany accessor normally for basic lookups:
# All courses a student is enrolled in
student_a.courses.all()
# Students in a specific course, filtered on a through-model field
course.students.filter(enrollment__grade='A+')
But when you need the extra fields themselves, query the through model directly:
enrollments = Enrollment.objects.filter(
    course=course,
).select_related('student').order_by('-enrolled_at')

for e in enrollments:
    print(e.student.name, e.grade, e.enrolled_at)
Mixing both approaches in the same codebase is fine. Just stay consistent about which one you use for which purpose.
Deleting and Updating Through Records
Removing a relationship means deleting the through model instance, not calling .remove() on the ManyToMany accessor. Django has permitted .remove() on through relations since 2.2, but deleting the row explicitly keeps every write in one place.
# Correct way to unenroll a student
Enrollment.objects.filter(student=student_a, course=course).delete()

# Correct way to update a grade
Enrollment.objects.filter(
    student=student_a,
    course=course,
).update(grade='A')
Using .update() on the queryset avoids loading the object into memory and is efficient for bulk changes. Use .save() on an instance when you need pre_save/post_save signals to fire.
Common Pitfalls
Forgetting unique_together
Without a uniqueness constraint on the foreign key pair, you can accidentally create duplicate enrollment rows. Always add unique_together (or a UniqueConstraint in the Meta class) to your through model. Django won't enforce this for you automatically, and duplicate rows cause confusing query results later.
class Meta:
    constraints = [
        models.UniqueConstraint(
            fields=['student', 'course'],
            name='unique_student_course',
        )
    ]
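At the database level, the constraint becomes a unique index over the two foreign-key columns. The sketch below uses the standard-library sqlite3 module with a hand-written, hypothetical table (not the exact schema Django generates) to show the database itself rejecting a duplicate pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the Enrollment table, with the same uniqueness
# rule that UniqueConstraint(fields=['student', 'course']) expresses.
conn.execute(
    """
    CREATE TABLE enrollment (
        id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL,
        course_id INTEGER NOT NULL,
        grade TEXT NOT NULL DEFAULT '',
        CONSTRAINT unique_student_course UNIQUE (student_id, course_id)
    )
    """
)
conn.execute(
    "INSERT INTO enrollment (student_id, course_id, grade) VALUES (1, 10, 'B+')"
)

# A second row for the same (student, course) pair violates the constraint.
try:
    conn.execute(
        "INSERT INTO enrollment (student_id, course_id, grade) VALUES (1, 10, 'A')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT student_id, course_id, grade FROM enrollment"
).fetchall()
print(duplicate_rejected, rows)
```

In Django code the same violation surfaces as django.db.IntegrityError, so a duplicate enrollment fails loudly instead of quietly adding a second row.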
Mixing accessor methods with direct model management
If part of your codebase uses course.students.add() and another part uses Enrollment.objects.create(), you'll eventually get inconsistent behavior. Pick one approach and document it. A short comment at the top of the model file goes a long way:
class Course(models.Model):
    # NOTE: Enrollment is a through model with extra fields.
    # Always use Enrollment.objects.create/update/delete directly.
    # Do not use course.students.add() or .set().
    students = models.ManyToManyField(
        Student,
        through='Enrollment',
        related_name='courses',
    )
Calling .set() during fixture loading or test setup
Test factories and fixture scripts are the most common place this bug surfaces. A helper that builds course objects using .set() for students will silently create empty enrollment records. Audit your test factories and management commands for any ManyToMany accessor writes when a through model is involved.
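An audit like this can be partly automated. The sketch below uses the standard-library ast module to flag calls such as course.students.set(...) in a piece of source code. The names THROUGH_ACCESSORS and find_accessor_writes are hypothetical, and the accessor list is an assumption you would replace with the ManyToMany attributes and related_names that use a through model in your own project.

```python
import ast

# Assumption: accessors in your project that are backed by a through model.
THROUGH_ACCESSORS = {"students", "courses"}
WRITE_METHODS = {"add", "set", "remove", "clear"}


def find_accessor_writes(source, filename="<string>"):
    """Return sorted (line, 'accessor.method') pairs for suspicious calls."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the shape <something>.<accessor>.<write_method>(...)
        if (
            isinstance(func, ast.Attribute)
            and func.attr in WRITE_METHODS
            and isinstance(func.value, ast.Attribute)
            and func.value.attr in THROUGH_ACCESSORS
        ):
            hits.append((node.lineno, f"{func.value.attr}.{func.attr}"))
    return sorted(hits)


sample = '''
course.students.set([student_a, student_b])
student_a.courses.add(course)
tags.add("unrelated")
Enrollment.objects.create(student=student_a, course=course)
'''

hits = find_accessor_writes(sample)
print(hits)  # [(2, 'students.set'), (3, 'courses.add')]
```

To scan a project, read each file found by pathlib.Path('.').rglob('*.py') and pass its text to the function. The check matches attribute names only, so expect a few false positives and review each hit by hand.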
Assuming signals fire on bulk operations
bulk_create and bulk_update skip Django signals. If an Enrollment post-save signal sends a welcome email or updates a cache, those actions won't run. Either loop with individual saves, or trigger the side effects manually after the bulk operation.
Adding a Safety Net with a Custom Manager
You can reduce the surface area for mistakes by overriding the through model's manager to make the safe pattern the default one. This won't prevent misuse of the ManyToMany accessor, but it centralizes enrollment logic:
class EnrollmentManager(models.Manager):
    def enroll(self, student, course, grade=''):
        enrollment, created = self.get_or_create(
            student=student,
            course=course,
            defaults={'grade': grade},
        )
        return enrollment, created

    def unenroll(self, student, course):
        return self.filter(student=student, course=course).delete()


class Enrollment(models.Model):
    student = models.ForeignKey(Student, on_delete=models.CASCADE)
    course = models.ForeignKey(Course, on_delete=models.CASCADE)
    enrolled_at = models.DateTimeField(auto_now_add=True)
    grade = models.CharField(max_length=2, blank=True)

    objects = EnrollmentManager()

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=['student', 'course'],
                name='unique_student_course',
            )
        ]
Now callers write Enrollment.objects.enroll(student_a, course), and the safe pattern is one import away. New team members don't need to know about the pitfall to do the right thing.
Wrapping Up
The core rule is simple: when you define a through model, treat it as a full model and stop using the ManyToMany accessor for writes. Here are the concrete steps to lock this in:
- Audit every place in your codebase that calls .add(), .set(), .remove(), or .clear() on a ManyToMany field that has a through model. Replace them with direct Enrollment.objects calls.
- Add a UniqueConstraint to your through model if you don't already have one. Run makemigrations and migrate.
- Add a comment to the ManyToManyField definition explaining the correct write pattern for anyone who maintains the code later.
- Check your test factories and fixtures for hidden .set() or .add() calls and replace them.
- Consider wrapping the common operations in a custom manager so the safe path is also the easy path.