Fixing Silent Data Loss When Merging Upstream Changes in Forked Repos
You sync your fork with upstream, the merge completes without a single conflict, and everything looks fine. Then three days later a colleague notices that a feature you shipped two weeks ago has simply vanished from the branch. No error, no warning: just gone.
This is one of the more disorienting problems in collaborative Git workflows because the tooling gives you no signal that anything went wrong. Understanding why it happens and building a repeatable process to catch it is the difference between a trustworthy contribution workflow and a repo that quietly eats work.
What you'll learn
- Why Git can drop your commits silently during an upstream merge
- How to detect lost commits before they reach a shared branch
- Which merge strategies are safe and which are dangerous for forks
- A repeatable sync workflow that protects your history
- Common pitfalls maintainers and contributors both miss
Prerequisites
This article assumes you are comfortable with basic Git commands (commit, merge, rebase, log) and that you have a forked repository with a configured upstream remote. Examples use Git 2.x on the command line. The concepts apply to any hosting platform (GitHub, GitLab, Gitea).
Why Silent Loss Happens
When you merge upstream into your fork's main branch, Git performs a three-way merge. It finds the common ancestor of your branch and the upstream branch, then applies changes from both sides. The problem is what Git considers a "change."
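If upstream's history already contains your change and upstream later reverted or rewrote it (for example, upstream merged your commit and then dropped it in a refactor), the merge base already includes your work. Relative to that base, only upstream's side changed, so Git resolves the merge cleanly in upstream's favour, and your commit's effect disappears even though the commit itself is still in your history. The merge commit is legitimate. The content you added is not there anymore.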
This scenario is especially common when:
- Upstream did a large refactor that touched the same files you modified
- Upstream squash-merged a branch that effectively reverted interim history
- You rebased your fork's branch onto a different base commit before syncing
- Upstream force-pushed to rewrite history (rare, but it happens on younger projects)
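The role of the merge base is easiest to see on a single file. The bash sketch below is a minimal illustration, not a model of any real project: it runs Git's three-way file merge (git merge-file) on two hypothetical cases outside any repository, and the file names and the feature line are assumptions. In case 1 the base never had your line, so your addition survives. In case 2 the base already had the line and upstream later deleted it, so only upstream's side counts as a change and the deletion wins without a conflict.

```shell
#!/usr/bin/env bash
# Two hypothetical three-way merges of one file, run with git merge-file.
# git merge-file -p CURRENT BASE OTHER prints the merged result to stdout and
# exits with the number of conflicts (0 here, so set -e does not fire).
set -euo pipefail
cd "$(mktemp -d)"

# Case 1: the merge base never had your line, so adding it is your change.
printf 'line1\n' > base1.txt
printf 'line1\nfeature\n' > ours1.txt     # your fork
printf 'line1\n' > theirs1.txt            # upstream, unchanged from the base
kept=$(git merge-file -p ours1.txt base1.txt theirs1.txt)

# Case 2: upstream once merged your line (so the base has it), then removed it.
# Your side matches the base, so only upstream's deletion counts as a change.
printf 'line1\nfeature\n' > base2.txt
printf 'line1\nfeature\n' > ours2.txt
printf 'line1\n' > theirs2.txt
dropped=$(git merge-file -p ours2.txt base2.txt theirs2.txt)

printf 'Case 1 (line kept):\n%s\n\n' "$kept"
printf 'Case 2 (line silently dropped):\n%s\n' "$dropped"
```

Both merges are conflict-free; the only difference between them is what the base contained.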
Confirming the Problem: How to Detect Lost Commits
Before you can fix anything, you need to know which commits are missing. Git gives you two reliable tools for this.
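Using git log with commit ranges
The two-dot range A..B lists commits reachable from B but not from A, so running it in both directions shows what each side has that the other lacks. (The three-dot form A...B is the symmetric difference: it lists both sets at once, and --left-right marks which side each commit came from.)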
# Commits on your branch not reachable from upstream/main
git log upstream/main..HEAD --oneline
# Commits on upstream/main not reachable from your branch
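git log HEAD..upstream/main --oneline
Run the first command before the sync so you have a baseline list of your commits, and run it again afterwards. After a rebase, a commit that vanished from the list was dropped. After a merge, your commits stay in the list even when their content was overwritten, so also confirm that their changes still appear in git diff upstream/main HEAD.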
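To make the range forms concrete, the following bash sketch builds a throwaway upstream repository and a fork cloned from it with the remote named upstream, gives each side one commit, and runs the three queries. Every path, file, and commit message is a hypothetical stand-in; the queries are standard git log range syntax, and the %m placeholder prints the < or > side marker.

```shell
#!/usr/bin/env bash
# Throwaway upstream repo plus a fork cloned from it (remote named "upstream").
# All paths, files, and commit messages are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q up
echo base > up/base.txt
git -C up add base.txt
git -C up commit -qm "base"
git -C up branch -M main
git clone -q -o upstream up fork

# Each side gets one commit the other lacks
echo u > up/u.txt
git -C up add u.txt
git -C up commit -qm "upstream change"
echo f > fork/f.txt
git -C fork add f.txt
git -C fork commit -qm "fork change"

cd fork
git fetch -q upstream

# Two-dot ranges: one direction at a time
mine=$(git log --format=%s upstream/main..HEAD)
theirs=$(git log --format=%s HEAD..upstream/main)
# Three-dot symmetric difference: both sides, marked < (upstream/main) or > (HEAD)
both=$(git log --left-right --format='%m %s' upstream/main...HEAD)

printf 'Only on HEAD:          %s\n' "$mine"
printf 'Only on upstream/main: %s\n' "$theirs"
printf 'Symmetric difference:\n%s\n' "$both"
```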
Using git cherry
git cherry compares commits by their patch content rather than their SHA. A commit prefixed with - means an equivalent patch already exists in the upstream; one prefixed with + means it does not.
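The bash sketch below reproduces both markers in throwaway repositories: the fork makes two commits, and upstream cherry-picks only the first one with -x, so upstream's copy gets a new SHA but carries an identical patch. The repositories, files, and messages are hypothetical.

```shell
#!/usr/bin/env bash
# Throwaway repos: the fork makes two commits, upstream cherry-picks only the first.
# All names and contents are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q up
echo base > up/base.txt
git -C up add base.txt
git -C up commit -qm "base"
git -C up branch -M main
git clone -q -o upstream up fork

echo a > fork/a.txt
git -C fork add a.txt
git -C fork commit -qm "fix typo"
echo b > fork/b.txt
git -C fork add b.txt
git -C fork commit -qm "add feature"

# Upstream takes "fix typo" only; -x appends the origin to the message,
# so the SHA differs while the patch stays identical
git -C up fetch -q ../fork main
git -C up cherry-pick -x "$(git -C fork rev-parse HEAD~1)" >/dev/null

cd fork
git fetch -q upstream
report=$(git cherry -v upstream/main HEAD)
printf '%s\n' "$report"
```

The report has one line starting with - for fix typo (its patch is already upstream) and one starting with + for add feature.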
git cherry -v upstream/main HEAD
Save this output to a file before you sync. After the sync, run it again and diff the two outputs. Any commit that moved from + to absent without you explicitly intending it should be investigated immediately.
# Save pre-merge cherry output
git cherry -v upstream/main HEAD > before-sync.txt
# After syncing...
git cherry -v upstream/main HEAD > after-sync.txt
diff before-sync.txt after-sync.txt
Safe Upstream Sync Strategies
The way you pull upstream changes determines how much risk you carry. Not all approaches are equal.
Rebase your work on top of upstream
Rebasing replays your commits on top of the latest upstream HEAD. Each commit is re-applied one at a time, so Git is forced to surface conflicts at the exact point where they occur rather than silently resolving them at a higher level.
# Fetch latest upstream
git fetch upstream
# Rebase your branch on upstream main
git rebase upstream/main
If a commit conflicts, Git pauses and asks you to resolve it manually rather than choosing a side for you. Rebase does skip commits whose changes already exist upstream, so the audit step still matters. This is the preferred method for feature branches that have not yet been shared widely.
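The bash sketch below reproduces that pause in a throwaway pair of repositories: both sides edit the same line, git rebase exits non-zero and leaves the rebase in progress instead of picking a side, and git rebase --abort restores the original tip. The pre-sync-backup branch name is a hypothetical convention, not something Git requires; keeping such a branch also keeps the pre-rebase commits reachable if you later need them.

```shell
#!/usr/bin/env bash
# Throwaway demo: both sides edit the same line, so the rebase has to stop.
# Repository names, file contents, and the backup branch name are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q up
printf 'v1\n' > up/app.txt
git -C up add app.txt
git -C up commit -qm "base"
git -C up branch -M main
git clone -q -o upstream up fork

printf 'upstream edit\n' > up/app.txt
git -C up commit -qam "upstream edit"
printf 'fork edit\n' > fork/app.txt
git -C fork commit -qam "fork edit"

cd fork
git fetch -q upstream
git branch pre-sync-backup        # keeps the pre-rebase tip reachable

if git rebase upstream/main >/dev/null 2>&1; then
  outcome="applied cleanly"
else
  outcome="stopped on a conflict"
fi

# A stopped rebase leaves its state directory behind
# (rebase-merge, or rebase-apply on the older backend)
in_progress=no
if [ -d "$(git rev-parse --git-path rebase-merge)" ] || [ -d "$(git rev-parse --git-path rebase-apply)" ]; then
  in_progress=yes
fi

git rebase --abort                # return the branch to where it was
restored=no
if [ "$(git rev-parse HEAD)" = "$(git rev-parse pre-sync-backup)" ]; then
  restored=yes
fi

echo "rebase ${outcome}; in progress: ${in_progress}; restored by --abort: ${restored}"
```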
Merge with --no-ff and inspect the result
If you must merge (for example, because the branch history is already public), use --no-ff to force a merge commit even when a fast-forward is possible. Then immediately audit what happened.
git fetch upstream
git merge --no-ff upstream/main
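# Inspect what the merge changed relative to your branch before it (the first parent)
git diff HEAD^1 HEAD
Read this diff carefully; add -- <path> to focus on the files you own. Plain git show HEAD is not enough for a merge: it prints a combined diff, which leaves out hunks where the result simply matches one parent, and a silently accepted upstream version is exactly that case. A merge that changes your feature files is a warning sign: it means upstream's edits landed in code you own, and you should verify your changes are still present.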
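The bash sketch below contrasts the two views on throwaway repositories that reproduce the revert case from Why Silent Loss Happens: upstream fast-forwards to a fork's feature, a later upstream refactor removes it, and the fork then merges upstream with --no-ff. All paths, file names, and messages are hypothetical. For the same merge commit, git show prints nothing for app.txt, while git diff HEAD^1 HEAD shows the lost line.

```shell
#!/usr/bin/env bash
# Throwaway repos: upstream accepts a fork's feature, a later refactor removes it,
# and the fork then merges upstream. All names and contents are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q up
printf 'line1\n' > up/app.txt
git -C up add app.txt
git -C up commit -qm "base"
git -C up branch -M main
git clone -q -o upstream up fork

# The fork adds a feature line; upstream fast-forwards to it, then drops it
printf 'line1\nfeature\n' > fork/app.txt
git -C fork commit -qam "add feature"
git -C up fetch -q ../fork main
git -C up merge -q --ff-only FETCH_HEAD
printf 'line1\n' > up/app.txt
git -C up commit -qam "refactor: drop feature"

# Meanwhile the fork keeps working in another file
echo notes > fork/notes.txt
git -C fork add notes.txt
git -C fork commit -qm "fork work"

cd fork
git fetch -q upstream
git merge -q --no-ff --no-edit upstream/main    # clean merge, no conflict reported

# Default view of a merge: a combined diff, which skips hunks matching one parent
combined=$(git show --format= HEAD -- app.txt)
# First-parent view: what the merge changed on your branch
first_parent=$(git diff HEAD^1 HEAD -- app.txt)

printf 'git show HEAD -- app.txt:\n%s\n\n' "${combined:-(no diff shown)}"
printf 'git diff HEAD^1 HEAD -- app.txt:\n%s\n' "$first_parent"
```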
Use a dedicated sync branch
Never merge upstream directly into a branch that contains unreviewed work. Create a temporary branch, sync there first, review it, then bring it into your working branch.
git fetch upstream
git checkout -b upstream-sync upstream/main
# Now you have a clean branch representing upstream
# Merge or rebase your feature branch on top of it
git checkout my-feature
git rebase upstream-sync
# Delete the sync branch when done
git branch -d upstream-sync
Recovering Lost Commits
If you have already merged and discovered that commits are missing, the reflog is your recovery mechanism. Git keeps a local record of every position HEAD has been in, including before the merge.
# Show the reflog for HEAD
git reflog
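# Find the merge entry, which looks something like:
#   9f8e7d6 HEAD@{3}: merge upstream/main: Merge made by the...
# The entry just below it (HEAD@{4} here) is your branch before the merge.
# For a merge, that SHA is also the merge commit's first parent.
Once you identify the SHA of your branch before the merge (abc1234 below), you can cherry-pick the missing commits back onto your current branch.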
# Re-apply the missing commits, oldest first. In A^..B, A is the oldest
# missing commit and B the newest; both ends are included.
git cherry-pick <oldest-missing-sha>^..<newest-missing-sha>
Alternatively, you can create a recovery branch from the pre-merge state, verify your commits are there, then selectively apply them.
git checkout -b recovery abc1234
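git log --oneline -10
Recover promptly. The reflog is local to your clone, and by default its entries expire after 90 days (gc.reflogExpire), or after only 30 days for commits that are no longer reachable from the branch (gc.reflogExpireUnreachable), which is the usual state of work dropped by a rebase.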
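Here is a self-contained rehearsal of that recovery, assuming the revert scenario described under Why Silent Loss Happens. To keep it short, a local branch named upstream-main (hypothetical) stands in for upstream/main, and the lost commit is located by its subject line, which only works when subjects are unique. HEAD@{1} is the reflog's previous position of HEAD, which right after a merge is the pre-merge tip.

```shell
#!/usr/bin/env bash
# Throwaway demo of recovering content lost in a clean merge.
# The local branch upstream-main stands in for upstream/main (hypothetical setup).
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
printf 'line1\n' > app.txt
git add app.txt
git commit -qm "base"
git branch -M main
git branch upstream-main

# Your fork adds a feature; upstream takes it, then a refactor removes it
printf 'line1\nfeature\n' > app.txt
git commit -qam "add feature"
git checkout -q upstream-main
git merge -q --ff-only main
printf 'line1\n' > app.txt
git commit -qam "refactor: drop feature"

# The fork keeps working, then merges upstream: clean merge, feature gone
git checkout -q main
echo notes > notes.txt
git add notes.txt
git commit -qm "fork work"
git merge -q --no-ff --no-edit upstream-main

# Recovery: the reflog entry just before the merge is the pre-merge tip
pre_merge=$(git rev-parse 'HEAD@{1}')
# Find the lost commit in the pre-merge history (by subject, for this demo only)
lost=$(git log -n 1 --format=%H --grep='add feature' "$pre_merge")
git cherry-pick "$lost" >/dev/null
printf 'app.txt after recovery:\n%s\n' "$(cat app.txt)"
```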
Automating the Safety Check
A pre-merge script that captures the commit list and compares it afterward removes the human memory requirement from this process. Here is a minimal shell script you can drop into your project's tooling.
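#!/usr/bin/env bash
# sync-upstream.sh: safe upstream sync with commit audit
set -euo pipefail
UPSTREAM_REMOTE="${1:-upstream}"
UPSTREAM_BRANCH="${2:-main}"
UPSTREAM_REF="${UPSTREAM_REMOTE}/${UPSTREAM_BRANCH}"
# Keep both snapshots in this repository's git directory (works from any subdirectory)
REPO_GIT_DIR="$(git rev-parse --git-dir)"
PRE_FILE="${REPO_GIT_DIR}/pre-sync-cherry.txt"
POST_FILE="${REPO_GIT_DIR}/post-sync-cherry.txt"
echo "Capturing pre-sync commit list..."
git cherry -v "${UPSTREAM_REF}" HEAD > "${PRE_FILE}"
echo "Fetching upstream..."
git fetch "${UPSTREAM_REMOTE}"
echo "Rebasing onto ${UPSTREAM_REF}..."
# On a conflict, rebase exits non-zero and set -e stops the script here;
# resolve it, run git rebase --continue, then compare against the saved snapshot by hand
git rebase "${UPSTREAM_REF}"
echo "Verifying commits after sync..."
git cherry -v "${UPSTREAM_REF}" HEAD > "${POST_FILE}"
# Count only '+' lines: '-' commits were already upstream, and rebase skips those by design
PRE_COUNT=$(grep -c '^+' "${PRE_FILE}" || true)
POST_COUNT=$(grep -c '^+' "${POST_FILE}" || true)
if [ "${POST_COUNT}" -lt "${PRE_COUNT}" ]; then
  echo "WARNING: Commit count dropped from ${PRE_COUNT} to ${POST_COUNT}. Review the diff:"
  # diff exits 1 when the files differ; keep going so the explicit exit below runs
  diff "${PRE_FILE}" "${POST_FILE}" || true
  exit 1
fi
echo "Sync complete. No commits dropped (${POST_COUNT} unique to your branch)."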
Run it as bash sync-upstream.sh upstream main. If the commit count drops, the script exits non-zero and prints the diff so you can see exactly what changed.
Common Pitfalls
Trusting a clean merge as proof of correctness
A merge with zero conflicts is not a guarantee that your content survived intact. Upstream may have rewritten the same section in a way that looks like it includes your change but does not, or replaced it outright. Always check that the merge's diff against your pre-merge branch (git diff HEAD^1 HEAD) touches only what you expect.
Squash-merging your own feature branch before syncing
If you squash your feature branch into a single commit and then rebase onto upstream, git cherry may not recognise the squashed commit as equivalent to the original commits. You can end up in a state where Git cannot tell what is yours and what is upstream's. Prefer to squash only after the sync is confirmed clean.
Forgetting to update the upstream remote URL
When an upstream project moves (organisation rename, domain change), your remote URL goes stale silently. You keep fetching from an old mirror and never see the real upstream commits. Run git remote -v periodically and compare against the canonical repository URL.
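A check along these lines can run before each sync. It is a sketch: both URLs are hypothetical placeholders, and the throwaway repository exists only so the example runs on its own. git remote get-url prints a remote's configured URL, and git remote set-url changes it.

```shell
#!/usr/bin/env bash
# Compare the configured upstream URL with the canonical one (both URLs hypothetical).
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git remote add upstream https://old-mirror.example.com/acme/project.git

EXPECTED_URL="https://git.example.com/acme/project.git"
actual_url=$(git remote get-url upstream)

if [ "$actual_url" != "$EXPECTED_URL" ]; then
  echo "upstream is stale: ${actual_url} (expected ${EXPECTED_URL})"
  git remote set-url upstream "$EXPECTED_URL"
fi
fixed_url=$(git remote get-url upstream)
echo "upstream now points at: ${fixed_url}"
```

In a real project you might only print the warning and leave the fix to a human, since a mismatch can also mean the expected URL itself is out of date.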
Relying on GitHub's "Sync fork" button without auditing
The platform's one-click sync is convenient, but it merges upstream into your default branch without any pre-merge snapshot. Use it only on branches that contain no original work, or always pull the result locally and run git cherry afterward.
Wrapping Up
Silent data loss in forked repos is almost always preventable with a small amount of process discipline. Here are the concrete actions to take right now:
- Run git cherry -v upstream/main HEAD before every upstream sync and save the output. Compare it after the sync completes.
- Switch to rebase-based syncing (git rebase upstream/main) for feature branches that are not yet public. It surfaces conflicts where they belong instead of hiding them in a merge commit.
- Read the diff of every merge commit that touches files you own before pushing, comparing against its first parent (git diff HEAD^1 HEAD). A merge commit that modifies your files is always worth a second look.
- Add the sync script above (or an equivalent) to your project tooling so the audit runs automatically and your team does not rely on memory.
- Check your reflog immediately if you suspect a loss. By default, entries for commits that are no longer reachable expire after 30 days (reachable ones after 90), and the sooner you act the simpler the cherry-pick will be.