Every pull request, every merge conflict, every git log -p output — they all rely on text diffing. Yet most developers treat diff output as a black box: green lines are additions, red lines are deletions, move on. Understanding how diff algorithms actually work makes you faster at code review, better at resolving merge conflicts, and more effective at structuring commits that are easy for your team to review.
What a Diff Actually Computes
A diff algorithm takes two sequences of text (usually lines) and computes the minimum edit script: the smallest set of insertions and deletions that transforms the first text into the second. This is equivalent to the Longest Common Subsequence (LCS) problem, because the lines in the LCS are exactly the lines the diff leaves untouched. The problem has been studied in computer science since the 1970s.
Consider two simple files:
# File A              # File B
function greet()      function greet()
  puts "hello"          puts "hello"
  puts "world"          puts "goodbye"
end                   end
The diff identifies that line 3 changed from puts "world" to puts "goodbye". In unified diff format, this appears as a deletion of the old line followed by an insertion of the new one. But the algorithm does not actually understand "change" — it only understands insert and delete. A "changed" line is simply a delete-then-insert at the same position. This distinction matters when you are reading complex diffs where multiple adjacent lines change simultaneously.
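The insert-and-delete view can be made concrete with a minimal sketch of an LCS-based diff. It uses the textbook dynamic-programming table rather than the algorithm any real tool ships, and the function name lcs_diff is made up for this illustration.

```python
def lcs_diff(a, b):
    """Return a list of (op, line) pairs, where op is ' ', '-' or '+'."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top-left, emitting keeps, deletes and inserts.
    script, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            script.append(("-", a[i]))  # deleting a[i] loses nothing
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    script.extend(("-", line) for line in a[i:])
    script.extend(("+", line) for line in b[j:])
    return script


file_a = ["function greet()", '  puts "hello"', '  puts "world"', "end"]
file_b = ["function greet()", '  puts "hello"', '  puts "goodbye"', "end"]
for op, line in lcs_diff(file_a, file_b):
    print(op + line)
```

The "changed" line shows up as one '-' entry followed by one '+' entry, which is all the edit script can express.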
The Myers Diff Algorithm
The default algorithm behind git diff, GNU diff, and most programming libraries is Eugene Myers' 1986 algorithm. It finds the shortest edit script — the minimum number of insertions and deletions — using a greedy approach that explores an edit graph.
Think of the edit graph as a grid where the x-axis represents lines in the original file and the y-axis represents lines in the modified file. Moving right means deleting a line from the original. Moving down means inserting a line from the new version. Moving diagonally means a line is unchanged (a match). The algorithm finds the path from the top-left corner to the bottom-right corner that uses the most diagonal moves — maximising matching lines and minimising edits.
The time complexity is O(N * D) where N is the total length of both files and D is the size of the minimum edit script. For files that are mostly similar (small D), this is nearly linear. For completely different files, it degrades toward O(N²). In practice, code review diffs are usually small changes to large files, so Myers performs extremely well.
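As a rough sketch of the greedy search, the function below computes only D, the size of the shortest edit script, by tracking the furthest-reaching point on each diagonal of the edit graph. It follows the basic form of the algorithm and leaves out the linear-space refinement and the heuristics that production tools add; the name myers_edit_distance is chosen for this example.

```python
def myers_edit_distance(a, b):
    """Return D, the minimum number of insertions plus deletions."""
    n, m = len(a), len(b)
    # furthest[k] = largest x reached so far on diagonal k, where k = x - y
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]      # move down: insert a line from b
            else:
                x = furthest[k - 1] + 1  # move right: delete a line from a
            y = x - k
            # Follow the diagonal for free while lines match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


print(myers_edit_distance("ABCABBA", "CBABAC"))
```

The outer loop runs D + 1 times and each pass touches at most D + 1 diagonals, which is why the cost stays small when the two inputs are nearly identical.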
When Myers Gets It Wrong: The Patience Diff Alternative
Myers optimises for the shortest edit script, but the shortest diff is not always the most readable. Consider this refactoring where a function is moved:
# Before              # After
def validate()        def process()
  check_input()         run_pipeline()
end                   end
def process()         def validate()
  run_pipeline()        check_input()
end                   end
Myers might match the end keywords across the wrong functions, producing a confusing diff that appears to change the bodies of both functions rather than simply showing that they swapped positions. The output would be technically correct (minimum edits) but cognitively expensive to review.
Patience diff (available in Git via git diff --patience) takes a different approach. It first identifies unique lines that appear exactly once in both files — these become anchor points. Then it builds the diff around those anchors. Because function signatures and unique comments tend to be unique lines, patience diff produces more readable output for structural changes, function reordering, and block movements.
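The anchor-finding step can be sketched as follows, assuming the usual description of patience diff: collect lines that are unique in both files, then keep the longest run of them that appears in the same order on both sides. The helper name patience_anchors is invented here, and a full implementation would go on to diff the gaps between anchors recursively.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(a, b):
    """Return (index_in_a, index_in_b) pairs for the chosen anchor lines."""
    count_a, count_b = Counter(a), Counter(b)
    pos_in_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Lines unique in both files, ordered by their position in a.
    pairs = [(i, pos_in_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_in_b]

    # Longest increasing subsequence of the b-positions (patience sorting).
    tail_vals, tail_idx = [], []   # best tail for each subsequence length
    prev = [None] * len(pairs)     # back-pointers for reconstruction
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(tail_vals, j)
        prev[idx] = tail_idx[k - 1] if k > 0 else None
        if k == len(tail_vals):
            tail_vals.append(j)
            tail_idx.append(idx)
        else:
            tail_vals[k] = j
            tail_idx[k] = idx

    anchors = []
    idx = tail_idx[-1] if tail_idx else None
    while idx is not None:
        anchors.append(pairs[idx])
        idx = prev[idx]
    return anchors[::-1]


before = ["def validate()", "check_input()", "end",
          "def process()", "run_pipeline()", "end"]
after = ["def process()", "run_pipeline()", "end",
         "def validate()", "check_input()", "end"]
print(patience_anchors(before, after))
```

For the swapped functions above, the repeated end lines are never candidates, so the anchors always keep one function intact as a unit.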
Git also offers the --histogram algorithm, which extends patience diff so that it can also anchor on lines that occur only a few times, not just lines that are unique. It is often faster than patience as well. You can set it globally with git config --global diff.algorithm histogram.
Reading Unified Diff Format
The unified diff format is what you see in pull requests, git diff output, and patch files. Here is a complete example with annotations:
--- a/src/auth.js                          ← Original file path
+++ b/src/auth.js                          ← Modified file path
@@ -12,4 +12,7 @@ function login(user) {   ← Hunk header
   const token = generateToken();          ← Context line (unchanged)
   const expiry = Date.now();              ← Context line
-  return { token, expiry };               ← Deleted line
+  const refreshToken = uuid();            ← Added line
+  return {                                ← Added line
+    token, expiry, refreshToken           ← Added line
+  };                                      ← Added line
 }                                         ← Context line
The hunk header @@ -12,4 +12,7 @@ tells you: the original side starts at line 12 and the hunk covers 4 of its lines (3 context lines plus 1 deletion); the modified side starts at line 12 and covers 7 lines (3 context lines plus 4 additions). The text after the second @@ is a Git convenience that shows the nearest preceding line that looks like a function definition, usually the enclosing function, which is very useful for navigating large diffs.
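Reading a hunk header is mechanical enough to sketch in code. The snippet below is a small illustration, not a full patch parser: the names Hunk, parse_hunk_header and count_hunk_lines are made up for this example, and the sample header is hypothetical.

```python
import re
from typing import NamedTuple, Optional

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


class Hunk(NamedTuple):
    old_start: int
    old_count: int
    new_start: int
    new_count: int
    section: str  # text after the second @@, often a function signature


def parse_hunk_header(line: str) -> Optional[Hunk]:
    match = HUNK_RE.match(line)
    if match is None:
        return None
    old_start, old_count, new_start, new_count, section = match.groups()
    # When a count is omitted, it defaults to 1.
    return Hunk(int(old_start), int(old_count or 1),
                int(new_start), int(new_count or 1), section)


def count_hunk_lines(body):
    """Count how many old-side and new-side lines a hunk body covers."""
    lines = [line for line in body if not line.startswith("\\")]
    old = sum(1 for line in lines if not line.startswith("+"))
    new = sum(1 for line in lines if not line.startswith("-"))
    return old, new


header = parse_hunk_header("@@ -1,2 +1,3 @@ def main():")
body = [" import os", "+import sys", " import re"]
print(header, count_hunk_lines(body))
```

Comparing the counts in the header against the counts in the body is a quick sanity check when a patch has been edited by hand.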
Context lines (prefixed with a single space instead of + or -) are unchanged lines shown for reference, typically 3 lines above and below each change. You can adjust this with git diff -U5 for 5 lines of context or -U0 for no context at all.
Diff Strategies for Better Code Reviews
Review Commit by Commit, Not the Full PR
A well-structured pull request tells a story through its commits. Reviewing the full diff shows you the destination but not the journey. Reviewing each commit separately shows the reasoning behind each change. In GitHub, click the "Commits" tab on a PR to review this way. On the command line, git log --oneline main..feature-branch lists the commits that are on the feature branch but not on main, and git show <hash> shows each one.
Use Word-Level Diffing for Prose and Config
Line-level diffs are perfect for code but terrible for prose, configuration files with long lines, or minified content. Git supports word-level diffing with git diff --word-diff, which highlights individual changed words within a line rather than flagging the entire line as modified. For JSON and YAML config files, this makes small value changes immediately visible instead of forcing you to scan long lines character by character.
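A word-level diff can be approximated by diffing tokens instead of lines. The sketch below uses Python's difflib.SequenceMatcher, which uses its own matching heuristic rather than Git's algorithm, and borrows the [-removed-] and {+added+} markers that git diff --word-diff prints in its plain mode. The function name word_diff is chosen for this example.

```python
import difflib
import re


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-word-] and added words as {+word+}."""
    # Split into alternating runs of whitespace and non-whitespace so the
    # original spacing survives when the tokens are joined back together.
    a = re.findall(r"\s+|\S+", old)
    b = re.findall(r"\s+|\S+", new)
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


print(word_diff("timeout: 30", "timeout: 60"))
```

A line-level diff would flag the whole line here; the token-level version isolates the one value that changed.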
Ignore Whitespace When It Doesn't Matter
Reformatting commits (running Prettier, changing indentation, normalising line endings) produce enormous diffs that contain zero logical changes. Use git diff -w to ignore all whitespace when comparing lines, or git diff --ignore-space-change (-b) to ignore only changes in the amount of whitespace: a line that gains whitespace where it previously had none still shows up as changed. In GitHub PRs, the "Hide whitespace changes" toggle does the same thing as -w.
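The difference between the two options is easiest to see on single lines. This is a rough approximation of the comparison rules, written for illustration; the two function names are invented, and Git's real implementation handles more cases.

```python
import re


def same_ignoring_all_space(a: str, b: str) -> bool:
    """Roughly -w: remove every whitespace character before comparing."""
    return re.sub(r"\s+", "", a) == re.sub(r"\s+", "", b)


def same_ignoring_space_change(a: str, b: str) -> bool:
    """Roughly -b: treat any run of whitespace as one space and
    ignore whitespace at the end of the line."""
    def squeeze(s: str) -> str:
        return re.sub(r"\s+", " ", s).rstrip()
    return squeeze(a) == squeeze(b)


print(same_ignoring_all_space("a=b", "a = b"))         # whitespace added from nothing
print(same_ignoring_space_change("a=b", "a = b"))      # still counts as a change
print(same_ignoring_space_change("a  =  b", "a = b"))  # only the amount changed
```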
Detect Moved Code with --color-moved
Git 2.15 introduced --color-moved, which shows code that was moved from one location to another in colours distinct from ordinary additions and deletions (some modes also dim the uninteresting parts of a moved block). This is invaluable for refactoring reviews where functions or blocks are reorganised without modification. Enable it by default: git config --global diff.colorMoved default.
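The underlying idea, that a moved line is one the diff reports as both deleted and added, can be sketched in a few lines. This is a simplification: Git's detection works on blocks and applies its own thresholds, and the name moved_lines is made up for this example.

```python
import difflib
from collections import Counter


def moved_lines(old, new):
    """Return non-blank lines that a line diff both deletes and adds."""
    removed, added = Counter(), Counter()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.update(line for line in old[i1:i2] if line.strip())
            added.update(line for line in new[j1:j2] if line.strip())
    # Multiset intersection: a line counts as moved once per matching pair.
    return sorted((removed & added).elements())


old = ["def load():", "    return read()",
       "def save(data):", "    check(data)", "    write(data)"]
new = ["def save(data):", "    check(data)", "    write(data)",
       "def load():", "    return read()"]
print(moved_lines(old, new))
```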
Semantic Diff: Beyond Line Matching
Traditional diff tools treat source code as plain text. They have no understanding of syntax, so a moved function looks identical to a deleted-and-recreated function. Semantic diff tools parse the abstract syntax tree (AST) of the code and compare structural elements rather than text lines.
Tools like Difftastic understand the syntax of dozens of languages. They can show that a function was moved without modification, that a variable was renamed consistently, or that an if block was wrapped around existing code — all things that produce noisy line-level diffs but are structurally simple changes.
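To give a flavour of structural comparison, here is a small sketch using Python's built-in ast module. It only handles Python source and is not how Difftastic works internally; the function names are invented for this illustration.

```python
import ast


def same_structure(src_a: str, src_b: str) -> bool:
    """True if both sources parse to the same syntax tree.

    Layout, comments and redundant parentheses do not appear in the tree,
    so they cannot cause a difference.
    """
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


def moved_unchanged_functions(src_a: str, src_b: str):
    """Names of top-level functions whose position changed but whose
    definition is structurally identical."""
    def index(src):
        return {node.name: (pos, ast.dump(node))
                for pos, node in enumerate(ast.parse(src).body)
                if isinstance(node, ast.FunctionDef)}

    a, b = index(src_a), index(src_b)
    return sorted(name for name in a.keys() & b.keys()
                  if a[name][1] == b[name][1] and a[name][0] != b[name][0])


before = "def validate():\n    check_input()\n\ndef process():\n    run_pipeline()\n"
after = "def process():\n    run_pipeline()\n\ndef validate():\n    check_input()\n"
print(same_structure("x = (1 +  2)  # sum\n", "x = 1 + 2\n"))
print(moved_unchanged_functions(before, after))
```

A line diff of before and after reports changes in both function bodies, while the tree comparison reports two functions that moved and nothing else.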
You can configure Git to use an external diff tool: git config --global diff.external difft (for Difftastic) or git difftool for one-off comparisons. The trade-off is speed — AST parsing is slower than line-level comparison, so semantic diff tools may feel sluggish on very large changesets.
Diffing in Different Contexts
Pull Request Reviews
GitHub, GitLab, and Bitbucket all render diffs with syntax highlighting, inline commenting, and file-level navigation. Tips for effective PR review: mark files as viewed once you have reviewed them (GitHub collapses them and remembers this across sessions), learn the keyboard shortcuts (press ? on GitHub to list the ones available on the current page), and leave summary comments on the PR rather than scattering feedback across individual lines when the issue is architectural.
Merge Conflict Resolution
Merge conflicts are really three-way diffs: the common ancestor (base), your version (ours), and the incoming version (theirs). Understanding this helps enormously. Use git config merge.conflictStyle zdiff3 (Git 2.35 or later; use diff3 on older versions) to show all three versions in conflict markers instead of just two. The base version tells you what was there before either side made changes, which often makes the correct resolution obvious.
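The rule a three-way merge applies to each region can be sketched directly. This assumes the three versions have already been aligned into corresponding chunks, which is the hard part a real merge tool does first; merge_chunk and conflict_markers are hypothetical names.

```python
def merge_chunk(base, ours, theirs):
    """Resolve one aligned chunk. Returns (lines, conflicted)."""
    if ours == theirs:
        return ours, False    # both sides agree (or neither changed)
    if ours == base:
        return theirs, False  # only their side changed
    if theirs == base:
        return ours, False    # only our side changed
    return None, True         # both changed it differently: conflict


def conflict_markers(base, ours, theirs):
    """Render a conflict in the three-version (diff3-like) layout."""
    return (["<<<<<<< ours"] + ours +
            ["||||||| base"] + base +
            ["======="] + theirs +
            [">>>>>>> theirs"])


base = ["timeout = 30"]
ours = ["timeout = 60"]
theirs = ["timeout = 45"]
merged, conflicted = merge_chunk(base, ours, theirs)
if conflicted:
    print("\n".join(conflict_markers(base, ours, theirs)))
```

Without the base section, a reader cannot tell which side actually changed the line; with it, the first three branches of merge_chunk can be checked by eye.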
Database Migrations
Schema diffs are a specialised relative of text diffing where the output is a migration script. Tools like sqldiff, Flyway, and Alembic compare two database schemas and generate ALTER TABLE statements. The goal is the same as for a text diff, an edit script that turns one state into the other, but tables and columns are typically matched by name rather than by position, and the "edit operations" are SQL DDL statements rather than line insertions and deletions.
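A toy version of a schema diff for a single table might look like this. It is only an illustration: the function name schema_diff and the sample columns are invented, the ALTER COLUMN ... TYPE form is PostgreSQL-style syntax, and real tools also handle constraints, indexes, and ordering between statements.

```python
def schema_diff(table, old_cols, new_cols):
    """Return DDL statements that turn old_cols into new_cols.

    Both arguments map column name -> SQL type.
    """
    statements = []
    for name in sorted(new_cols.keys() - old_cols.keys()):
        statements.append(
            f"ALTER TABLE {table} ADD COLUMN {name} {new_cols[name]};")
    for name in sorted(old_cols.keys() - new_cols.keys()):
        statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    for name in sorted(old_cols.keys() & new_cols.keys()):
        if old_cols[name] != new_cols[name]:
            statements.append(
                f"ALTER TABLE {table} ALTER COLUMN {name} TYPE {new_cols[name]};")
    return statements


old = {"id": "integer", "email": "varchar(100)", "nickname": "text"}
new = {"id": "integer", "email": "varchar(255)", "created_at": "timestamp"}
for statement in schema_diff("users", old, new):
    print(statement)
```

Like a line diff, this sketch has no notion of a rename: a renamed column appears as a DROP plus an ADD, which would discard data if applied blindly.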
Configuration Drift Detection
Infrastructure-as-code tools (Terraform, Ansible, Puppet) use diffing to show planned changes before applying them. terraform plan is essentially a diff between your desired state (config files) and the current state (cloud resources). Reading these diffs accurately prevents catastrophic infrastructure changes.
Performance Characteristics of Diff Algorithms
| Algorithm | Time Complexity | Best For | Available In |
|---|---|---|---|
| Myers | O(N × D) | Shortest edit script, typical code changes | Git (default), GNU diff |
| Patience | O(N log N + D²) | Structural changes, function reordering | git diff --patience |
| Histogram | O(N × D) optimised | General purpose, often faster than Myers | git diff --histogram |
| Hunt-McIlroy | O((R + N) log N), where R is the number of matching line pairs | Original Unix diff, historical interest | Legacy systems |
| Semantic/AST | Varies by tool; generally slower than line-based diffing | Language-aware structural comparison | Difftastic, GumTree |
Common Diff Mistakes in Code Review
Reviewing reformatted code as logic changes. When a commit mixes formatting changes with logic changes, the diff becomes nearly unreadable. Encourage your team to separate formatting commits from logic commits. Better yet, enforce formatting via pre-commit hooks so reformatting diffs never reach pull requests at all.
Missing context in large diffs. If a pull request touches 40 files, there is a strong temptation to skim. But the bugs hide in the files you skip. If you cannot review a PR thoroughly in 30 minutes, it is too large. Break it up. Research from SmartBear found that review effectiveness drops dramatically after 400 lines of diff, and anything over 200 lines should be reviewed in focused sessions.
Anchoring on the diff instead of the result. Diffs show you what changed, not what the code looks like now. After reviewing the diff, open the full file at the target commit and read the affected functions in context. A change that looks fine in isolation might be inconsistent with surrounding code.
Ignoring test diffs. Test code is production code — it documents expected behaviour and catches regressions. Review test diffs as carefully as implementation diffs. If the logic changed but no tests changed, that is a red flag worth commenting on.
Automating Diff Analysis
Several tools can augment human code review by analysing diffs programmatically. Linters that run on changed files only (via git diff --name-only) catch style issues before review. Coverage diff tools show whether new code is covered by tests. Complexity analysis on changed functions flags regressions in code maintainability.
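Most of these automations start by working out which lines a diff added, for example to intersect them with a coverage report. The sketch below extracts added line numbers per file from unified diff text. It assumes Git-style a/ and b/ path prefixes, and the function name added_lines and the sample diff are made up for this illustration.

```python
import re

HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text: str):
    """Map each file path to the new-side line numbers the diff adds."""
    added = {}
    path = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_no = 0               # current line number on the new side
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                added.setdefault(path, []).append(new_no)
                new_no += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif not line.startswith("\\"):  # skip "\ No newline at end of file"
                new_no += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            path = line[4:]
            if path.startswith("b/"):
                path = path[2:]
            continue
        match = HUNK_RE.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_no = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return added


sample = "\n".join([
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -1,3 +1,4 @@",
    " import os",
    "+import sys",
    " import re",
    " def main():",
    "@@ -10,2 +11,3 @@ def main():",
    "     run()",
    "+    log()",
    "     return 0",
])
print(added_lines(sample))
```

Counting down the line totals from each hunk header, rather than looking for the next header, keeps added lines that happen to start with ++ or deleted lines that start with -- from being mistaken for file headers.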
For quick one-off comparisons during development — checking two versions of a config file, comparing API responses, or verifying text transformations — a browser-based diff tool is faster than setting up a full Git workflow. Our text diff tool runs entirely in your browser with syntax-highlighted output and no file uploads.