Overview
What this concept solves
Union-Find — also called Disjoint Set Union, or DSU — is the data structure that answers two questions: 'are these two things in the same group?' and 'merge these two groups.' Both in essentially constant time.
The structure is a forest. Each element is a node pointing to a parent; following parents up to the root identifies an element's set. Two operations: find(x) walks up to the root; union(x, y) finds both roots and points one at the other. Naïve, these are O(n). With two cheap optimisations — union by rank (smaller tree hangs under bigger one) and path compression (every node visited during find is rewired directly to the root) — the amortised cost per operation drops to α(n), the inverse Ackermann function, which is ≤ 4 for any input you can fit in physical memory.
DSU isn't a graph algorithm on its own. It's a primitive that other graph algorithms lean on. Kruskal's MST uses it to test 'does adding this edge form a cycle?' in α(n) per edge. Connected-components labelling, dynamic connectivity under edge insertions, percolation simulations, and image segmentation all sit on top of DSU.
Mechanics
How it works
The structure
- Two arrays:
parent[i](initiallyi— every node is its own root) andrank[i](initially 0 — every tree is one node tall). - `find(x)`: follow
parent[x]up untilparent[x] == x. That's the root. The root identifies the set. - `union(x, y)`: find both roots. If they're the same, x and y are already in the same set. Otherwise, attach the smaller-rank root under the bigger-rank root.
- Are x and y in the same set? Check
find(x) == find(y).
The two optimisations
- Union by rank — when merging two trees, the shorter tree hangs under the taller one. This keeps trees from getting too tall: max rank after n unions is O(log n).
- Path compression — during
find(x), after walking up to the root, rewire every node on the path to point directly at the root. Nextfind()on any of those nodes is one step instead of log n. - Together: the amortised cost per operation is O(α(n)) — the inverse Ackermann function — which is ≤ 4 for every input you'll ever process. Effectively constant.
- Union by size is an alternative to union by rank — attach the smaller tree (by count of nodes) under the bigger. Same asymptotic guarantee.
Why α(n) ≈ constant
The inverse Ackermann function α(n) grows so slowly that α(2^65536) ≤ 4. There is no practical input on which it exceeds 4. So although DSU isn't strictly O(1) per operation, it's the closest a non-trivial data structure gets — and the proof (by Tarjan, 1975) is one of the most celebrated results in data-structure analysis.
DSU doesn't support split
There's no efficient way to break a set back apart. DSU is for problems where you only ever merge — Kruskal, connected components under edge additions, percolation. If you need to support edge deletion, you need link-cut trees (O(log n) per operation) or the more elaborate Euler-tour tree.
Interactive prototype
Run it. Break it. Tune it.
Sandboxed simulation embedded right in the page. No setup, no install.
About this simulation
Eight nodes, each its own set at the start. Use Union to merge two sets via union-by-rank; Find highlights the path from a node up to its root and then flattens it with path compression. Play runs a scripted sequence of merges so you can watch the forest collapse.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Union-by-rank, step by step
Click Union on the default sequence. Watch the rank labels: small-rank trees hang under big-rank trees. The tree depth grows logarithmically — even after merging all 8 nodes, the longest path to the root is only ~3.
Watch path compression flatten the tree
After a few unions, click Find on a leaf. The path from that leaf to the root gets highlighted, then every node on the path gets rewired directly to the root. Future find() calls on any of them are one step.
Try the all-merge scenario
Press Play to run a scripted 7-union sequence that ends with all 8 nodes in one set. Notice the components counter dropping from 8 → 1, and the Operations counter ticking up. Total work for 7 unions: ~10 array writes.
Verify two nodes are connected
After the merges, click Find on two nodes that should be in the same set. The roots match: connected. Now use the Reset button and try find() before any unions — every node is its own root, and the answer is not connected. That's the cycle-check Kruskal relies on.
In practice
When to use it — and what you give up
When it's the right tool
- Kruskal's MST — the cycle-detection step. Adding an edge forms a cycle iff its endpoints are already in the same component.
find(u) == find(v)decides it in α(n). - Dynamic connected components — answer 'are u and v in the same component?' as edges are added over time.
- Percolation simulations — random-edge addition until the top and bottom of a grid are in the same component (classic physics simulation, popularised by Princeton's Algorithms MOOC).
- Image segmentation (e.g. Felzenszwalb-Huttenlocher) — merge adjacent pixels whose colour difference is small; DSU tracks the resulting regions.
- Offline LCA queries — Tarjan's offline LCA algorithm uses DSU to answer LCA queries during a DFS in α(n) per query.
- Equivalence classes — anywhere you have an equivalence relation and need to discover the classes (e.g. type unification in a compiler's Hindley-Milner inference).
When to reach for something else
- You need to delete edges — DSU has no efficient split. Use a link-cut tree for O(log n) per op or Euler-tour trees.
- You need shortest paths within a component — DSU only tracks membership. Use BFS / Dijkstra alongside.
- Static graph, one connected-components query — a single DFS / BFS sweep is simpler and O(V + E).
Pros
- Near-constant per operation — α(n) ≤ 4 in practice. Essentially the fastest possible structure for incremental connectivity.
- Trivially small — two arrays, ~20 lines of code with both optimisations.
- Cache-friendly — the arrays are contiguous and
findis a tight pointer chase that the prefetcher handles well. - Composable — Kruskal's MST, percolation, and segmentation all sit on top with no modification.
- Easy to extend — track per-set sizes, sums, or auxiliary statistics by maintaining them at root nodes.
Cons
- No split — DSU is merge-only. Edge deletion needs link-cut trees.
- No per-set enumeration — knowing the root of a set doesn't give you its members without an extra pass.
- Path compression mutates on read —
find()is not a pure query; in a concurrent setting you need synchronization. - Amortised, not worst-case — a single operation can be slow; the bound holds across many.
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
// Union-Find with union-by-rank + path compression.
// Both operations amortise to O(α(n)) — effectively constant.
class DSU {
parent: number[];
rank: number[];
constructor(n: number) {
this.parent = Array.from({ length: n }, (_, i) => i);
this.rank = new Array(n).fill(0);
}
find(x: number): number {
if (this.parent[x] !== x) {
// Path compression: point x straight at the root.
this.parent[x] = this.find(this.parent[x]);
}
return this.parent[x];
}
union(x: number, y: number): boolean {
const rx = this.find(x);
const ry = this.find(y);
if (rx === ry) return false; // already in the same set
// Union by rank: hang the shorter tree under the taller one.
if (this.rank[rx] < this.rank[ry]) this.parent[rx] = ry;
else if (this.rank[rx] > this.rank[ry]) this.parent[ry] = rx;
else { this.parent[ry] = rx; this.rank[rx]++; }
return true;
}
connected(x: number, y: number): boolean {
return this.find(x) === this.find(y);
}
}References & further reading
6 sources- Paperdl.acm.org
Robert Tarjan — Efficiency of a good but not linear set union algorithm (1975)
The proof that union-by-rank + path compression gives the inverse Ackermann bound. One of the foundational papers in data-structure analysis.
- Bookmitpress.mit.edu
CLRS — Introduction to Algorithms, §21 (Data structures for disjoint sets)
Textbook treatment with all the proofs, plus the linked-list and tree-based representations.
- Articlealgs4.cs.princeton.edu
Sedgewick & Wayne — Algorithms, §1.5 (Case study: Union-Find)
Princeton's online course covers the percolation application end-to-end. Excellent for building intuition.
- Papercs.brown.edu
Felzenszwalb & Huttenlocher — Efficient graph-based image segmentation (2004)
A widely-used image segmentation algorithm built on DSU. Shows how DSU drops cleanly into a domain very different from graph theory.
- Bookcses.fi
Competitive Programming Handbook — DSU section
Antti Laaksonen's free book. Compact DSU section with extensions for size tracking and weighted DSU.
- Articleen.wikipedia.org
Wikipedia — Disjoint-set data structure
Solid reference covering all the variants — by-size vs by-rank, halving, splitting — and the link-cut tree extension for supporting deletion.
Knowledge check
Did the prototype land?
Quick questions, answers revealed on submit. No scoring saved.
question 01 / 03
Without union-by-rank or path compression, what is the worst-case time of find(x) after n unions?
question 02 / 03
What does path compression during find(x) do?
question 03 / 03
Why is DSU the right data structure for Kruskal's MST cycle check?
0/3 answered
Was this concept helpful?
Tell us what worked, or what to improve. We read every note.