Trees
The branching data structure under databases, filesystems, autocompletes, and priority queues. Nine trees, from the plain binary search tree through the self-balancing workhorses to the disk-friendly B-trees and the range-query structures competitive programmers swear by.
What is Trees?
The 60-second primer
A tree is a graph with no cycles and exactly one path between any two nodes — that constraint buys you something powerful: a natural hierarchy. Every tree has a root, children branch from it, and the structure repeats recursively until you hit leaves. That branching shape is why trees show up everywhere: every time you need to search a sorted set, index a database, autocomplete a word, or answer 'what's the sum of this range?', a tree is almost certainly doing the work.
The nine trees here span three distinct contracts. The BST is the foundation — put smaller keys left, larger right, and you get O(log n) search on average. AVL and red-black trees fix the worst-case problem by keeping the tree balanced, guaranteeing O(log n) at all times; the red-black tree is the one you'll find in every production standard library. The heap abandons the search property entirely and optimises for one thing: finding the minimum or maximum in O(1). The trie does something completely different — it keys on characters, not comparisons, making prefix queries blazing fast. Finally, B-trees and B+ trees go wide instead of deep to match the block size of a disk, and segment and Fenwick trees are specialised range-query machines used everywhere from competitive programming to databases.
You do not need to implement all nine from scratch. You need to know what each one promises and what it costs to keep that promise. O(log n) is the whole game — and it only holds when the tree stays balanced. Once that's clear, everything else is a matter of which invariant you're willing to enforce.
Where this shows up
- Database and filesystem indexes — every B+ tree index in PostgreSQL, MySQL (InnoDB), and SQLite lets the engine jump to a row in O(log n) disk reads instead of scanning the whole table; ext4 and NTFS directory structures are B-trees too.
- Language standard libraries —
std::mapandstd::setin C++,TreeMapandTreeSetin Java, and Python'ssortedcontainersare all backed by red-black trees; you get O(log n) insert, delete, and lookup with in-order iteration for free. - Operating system schedulers — the Linux kernel's Completely Fair Scheduler (CFS) stores runnable tasks in a red-black tree keyed by virtual runtime; the scheduler always picks the leftmost node in O(log n).
- Priority queues, Dijkstra, and task schedulers — binary heaps power Python's
heapq, Java'sPriorityQueue, and every implementation of Dijkstra's algorithm; OS job queues and event loops use the same structure. - Autocomplete, spell-check, and IP routing — tries back the autocomplete in your IDE and search bar; spell-checkers do prefix and edit-distance search over a trie; internet routers do longest-prefix matching on a trie of CIDR blocks.
- Competitive programming and analytics range queries — segment trees answer 'sum / min / max over range [l, r]' with point updates in O(log n); Fenwick trees (BITs) do the same for prefix sums with half the code and half the memory.
- Compilers and expression evaluation — parse trees and abstract syntax trees are the canonical internal representation of every compiler and interpreter; expression evaluation, constant folding, and code generation all walk a tree.
Balanced vs unbalanced: why O(log n) is the whole game
A perfect binary tree of n nodes has height ⌊log₂ n⌋ — that's why balanced trees give O(log n) operations. But insert keys in sorted order into a plain BST and the tree degrades into a linked list: height becomes n, and every search is O(n). Balance is not optional — it's the difference between a data structure and a slow list. AVL trees enforce a strict height difference of at most 1 between subtrees. Red-black trees enforce a looser colour invariant that guarantees height ≤ 2 log₂(n + 1). The trade-off: AVL trees rebalance more eagerly (better for read-heavy workloads); red-black trees rebalance lazily (better for write-heavy workloads, which is why they dominate in practice).
Side-by-side
How they compare
The same concepts, on the same axes. Use this as a map; the individual pages are the territory.
| Tree | Self-balancing? | Search / Insert | Ordered traversal? | Best for |
|---|---|---|---|---|
01Binary Search Tree | No | O(log n) avg / O(n) worst | Yes (in-order) | Teaching the search property; small data sets where balance is unlikely to degrade. |
02AVL Tree | Yes — strict height balance | O(log n) guaranteed | Yes (in-order) | Read-heavy workloads where lookup speed matters more than insert overhead. |
03Red-Black Tree | Yes — colour invariant | O(log n) guaranteed | Yes (in-order) | General-purpose ordered maps and sets; the default in most standard libraries. |
04Binary Heap | N/A (not a search tree) | Search O(n) / Insert O(log n) | No | Priority queues, Dijkstra's algorithm, and any 'give me the smallest/largest fast' workload. |
05Trie (Prefix Tree) | N/A | O(L) by key length | Yes (lexicographic) | Autocomplete, spell-check, IP routing, and any query keyed on string prefixes. |
06B-Tree | Yes — wide, multi-key nodes | O(log n) disk reads | Yes (in-order) | Filesystem metadata, database indexes where every level is a disk read. |
07B+ Tree | Yes — data only in leaves | O(log n) disk reads | Yes — leaves linked for range scans | Database range queries; the dominant structure behind SQL table indexes. |
08Segment Tree | N/A (static structure) | O(log n) query + update | No | Range sum / min / max / GCD with point or range updates; competitive programming and analytics. |
09Fenwick Tree (BIT) | N/A (static structure) | O(log n) prefix query + update | No | Prefix sums with point updates; simpler and faster constant than segment trees for sums only. |
- Self-balancing?
- No
- Search / Insert
- O(log n) avg / O(n) worst
- Ordered traversal?
Yes (in-order)- Best for
- Teaching the search property; small data sets where balance is unlikely to degrade.
- Self-balancing?
- Yes — strict height balance
- Search / Insert
- O(log n) guaranteed
- Ordered traversal?
Yes (in-order)- Best for
- Read-heavy workloads where lookup speed matters more than insert overhead.
- Self-balancing?
- Yes — colour invariant
- Search / Insert
- O(log n) guaranteed
- Ordered traversal?
Yes (in-order)- Best for
- General-purpose ordered maps and sets; the default in most standard libraries.
- Self-balancing?
- N/A (not a search tree)
- Search / Insert
- Search O(n) / Insert O(log n)
- Ordered traversal?
No- Best for
- Priority queues, Dijkstra's algorithm, and any 'give me the smallest/largest fast' workload.
- Self-balancing?
- N/A
- Search / Insert
- O(L) by key length
- Ordered traversal?
Yes (lexicographic)- Best for
- Autocomplete, spell-check, IP routing, and any query keyed on string prefixes.
- Self-balancing?
- Yes — wide, multi-key nodes
- Search / Insert
- O(log n) disk reads
- Ordered traversal?
Yes (in-order)- Best for
- Filesystem metadata, database indexes where every level is a disk read.
- Self-balancing?
- Yes — data only in leaves
- Search / Insert
- O(log n) disk reads
- Ordered traversal?
Yes — leaves linked for range scans- Best for
- Database range queries; the dominant structure behind SQL table indexes.
- Self-balancing?
- N/A (static structure)
- Search / Insert
- O(log n) query + update
- Ordered traversal?
No- Best for
- Range sum / min / max / GCD with point or range updates; competitive programming and analytics.
- Self-balancing?
- N/A (static structure)
- Search / Insert
- O(log n) prefix query + update
- Ordered traversal?
No- Best for
- Prefix sums with point updates; simpler and faster constant than segment trees for sums only.
Decision guide
Which one should you use?
A practical tour of when each algorithm wins.
How to pick
- Need an ordered map or set with O(log n) insert, delete, lookup, and in-order iteration → red-black tree. It's already in your standard library (
std::map,TreeMap,SortedDict). Reach for it first. - Read-heavy workload where the tightest possible height matters → AVL tree. AVL enforces strict height balance, so lookups hit fewer levels — at the cost of more rotations on insert. Worth it only when reads dominate heavily.
- Need the minimum or maximum element in O(1), and you'll be inserting and removing frequently → binary heap. It's an array, it's cache-friendly, and it powers every priority queue and heap-based shortest-path algorithm.
- Keys are strings and queries are prefix-based → trie. Autocomplete, spell-check, IP longest-prefix matching, and phone-book style lookups all fit naturally. Each lookup is O(L) on key length, independent of how many keys are stored.
- Data lives on disk or in a block-addressable store → B+ tree. Its wide fan-out means the tree is only 3–4 levels deep even for billions of rows; every level is a disk read, so minimising height is everything. If you need range scans, B+ trees beat B-trees because the leaves form a linked list.
- You need range aggregate queries (sum, min, max, GCD) with updates → segment tree. It supports both point updates and range updates with lazy propagation in O(log n), and it can hold any associative aggregate.
- You only need prefix sums with point updates and every nanosecond counts → Fenwick tree (BIT). It's half the code and half the memory of a segment tree, with better cache behaviour. If you find yourself needing range min/max or range updates, switch to a segment tree.
The shape of the problem tells you the tree
Ask three questions: Is the key a string or a prefix? → trie. Is the data on disk? → B+ tree. Do you need range aggregates? → segment or Fenwick tree. If none of those apply and you need an ordered collection, red-black tree. If you only need the extreme element fast, heap. The BST is for learning; the others are for shipping.
Concepts in this track
9 concepts, in order
Each links to a concept page with its own explanation, prototype, and quiz.
Binary Search Tree
Smaller keys go left, larger keys go right. Every search, insert, and delete is one walk down the tree — fast when balanced, a linked list when not.
AVL Tree
A BST that refuses to lean. Track every node's balance factor and rotate the instant it hits ±2 — guaranteeing O(log n) forever.
Red-Black Tree
Looser balancing through five color rules. Fewer rotations than AVL — which is why it's the tree inside your language's standard map and the Linux kernel.
Binary Heap
A complete binary tree where every parent beats its children. Stored in a plain array, it makes the smallest (or largest) item O(1) to peek and O(log n) to pull.
Trie (Prefix Tree)
A tree keyed by characters, not whole values. Shared prefixes share a path — the structure behind autocomplete, spell-check, and IP routing tables.
B-Tree
A wide, shallow tree where each node holds many keys. Built so one disk read fetches a whole node — the index structure under most databases and filesystems.
B+ Tree
A B-tree that keeps all real data in the leaves and links those leaves in a chain — so range scans walk a list instead of re-descending the tree.
Segment Tree
A tree of ranges over an array. Answer 'sum/min/max of indices l..r' and update any element, both in O(log n) — the competitive-programmer's range engine.
Fenwick Tree (BIT)
Prefix sums with updates in O(log n) using nothing but an array and a bit trick. Smaller and simpler than a segment tree when all you need is sums.
Related tracks
If this one clicks, try these next.
Sorting
Order a list — ten ways. From the textbook swap-adjacent sorts you write in a single loop, to the divide-and-conquer giants, to the non-comparison tricks that beat O(n log n), to the hybrid your language's sort() actually uses.
Graph Algorithms
How computers reason about networks of things — roads, friends, packets, dependencies. Ten algorithms, from the two traversals every other algorithm is built on, to weighted shortest paths, minimum spanning trees, and the heuristic search behind every modern pathfinder.
Bloom & Cuckoo Filters
Three probabilistic set-membership structures that answer 'have I seen this before?' in a few bytes per item. From the classic bit-array Bloom filter to the counting variant that can delete, to the cuckoo filter that does it all with a smaller memory footprint.