Overview
What this concept solves
A B-tree is a balanced multi-way search tree designed around one constraint: one node must fit in one disk block. On a spinning hard drive or an SSD, the cost of seeking to a page and reading it into RAM dwarfs arithmetic by a factor of 10,000 or more. Binary search trees minimize comparisons — but comparisons are cheap. What's expensive is the number of pages you must fetch. A node with one key forces you to visit O(log₂ n) pages. A node with 200 keys keeps the same sorted order but collapses the tree to O(log₂₀₀ n) pages — fewer by a factor of 7–8 at database scale.
The mechanics flow from a single parameter: *order m**, also called the branching factor. A node can hold at most m − 1 keys and at most m children. Every internal node (except the root) must hold at least ⌈m/2⌉ − 1 keys — this lower bound is what keeps the tree balanced after deletions, preventing any branch from becoming sparse. All leaves sit at the same depth. The result is a wide, shallow* tree: height grows as log_m(n), so a real database index with m ≈ 200 reaches every one of a billion records in four or five page reads.
Rudolf Bayer and Edward McCreight invented the B-tree at Boeing Research Labs in 1970 (published 1972) specifically to manage large sorted files on block-addressed storage. The core insight has never become obsolete. Every major relational database — PostgreSQL, MySQL, Oracle, SQL Server, SQLite — uses B-trees (or the leaf-linked variant, B+ trees) as the default index structure. Filesystems (NTFS, HFS+, ext4's htree) use them for directory lookups. The data structure is now over 50 years old and remains the dominant on-disk search structure because the disk-read bottleneck it was designed to solve is as real today as it was in 1970.
Mechanics
How it works
Searching a B-tree
- Start at the root node. Scan its keys left-to-right (or binary-search them) to find the first key ≥ your target.
- If the target equals that key, you're done — the record is here (or a pointer to it is).
- If the target is less than that key, follow the child pointer to the left of it. If you've passed all keys, follow the rightmost child pointer.
- Repeat at the child node. Each level eliminates all but one subtree.
- If you reach a leaf and the key is absent, the target is not in the tree.
Insertion and split-on-overflow
- Descend to the correct leaf using the same search logic — follow child pointers at each level until you hit a leaf.
- Insert the new key into the leaf in sorted order.
- If the leaf now has m keys (one too many), split it: the middle key is promoted to the parent; the remaining keys are divided left and right into two new sibling nodes.
- The parent has gained a key. If the parent is now also full, split it too — the middle key of the parent promotes to its parent. This can cascade upward.
- If the root splits, a new root is created holding just the single promoted key and two children. This is the only way a B-tree grows taller — trees always grow at the root, never at the leaves.
- All leaves remain at the same depth after every insertion.
Why fewer levels = fewer disk seeks
Each level of the tree is one page fetch. A binary search tree over 1 million records has depth ~20; an order-200 B-tree has depth ~3. That's the difference between 20 disk reads and 3. On spinning storage at 10 ms per seek, 20 seeks costs 200 ms; 3 seeks costs 30 ms — a 6× wall-clock difference on a single lookup, compounding across every query a database processes.
The minimum-keys rule
Every non-root internal node must hold at least ⌈m/2⌉ − 1 keys. For order 4, that's 1 key minimum, 3 keys maximum. This half-full guarantee bounds the tree's height from both directions: nodes can't be too sparse (tree stays shallow) and can't overflow (splits keep capacity bounded). The root is exempt — it can hold as few as 1 key and 2 children, which is what it looks like right after a root split.
Interactive prototype
Run it. Break it. Tune it.
Sandboxed simulation embedded right in the page. No setup, no install.
About this simulation
An order-4 B-tree — each node holds up to 3 keys and up to 4 children. Type a number and press Insert, or hit Random to pick one, then watch the tree respond. When a node fills up it flashes red, the middle key is promoted to the parent, and the node splits in two — try Sequential 10→80 to see this cascade repeatedly and the root itself split. Use Search to trace a lookup from root to leaf; the stat bar tracks total keys, current height, live nodes, and splits performed. The Speed slider controls animation pace.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Watch repeated splits and root growth
Press Sequential 10→80 and let it run at default speed. Count the red flashes — each one is a node overflowing and splitting. Notice that once the root splits, the tree gains a new level: height jumps from 1 to 2, then later from 2 to 3. In an order-4 tree, the root first splits when the 4th key is inserted; subsequent root splits take progressively longer because more leaf splits must cascade up first. Try pausing with the Speed slider near zero to catch each promotion in slow motion.
Explore the mixed-insert pattern
Hit Reset then press Mixed inserts. This sequence interleaves small and large values so splits land at different internal nodes rather than cascading straight up the right spine. Watch how the tree's shape differs from the sequential case — the key distribution is more even, splits are more spread out, and the height grows more slowly. When the demo finishes, compare the splits counter to the sequential run: same number of insertions, but fewer root splits because internal nodes absorb more promotions before filling.
Search and count node visits
After running either demo, type a key that you saw inserted (e.g. 45) into the input and press Search. Watch the prototype highlight each node it visits, root → internal → leaf. Count them — that number is the height of the tree, and it equals the number of disk-page reads a real database would issue. Now search for a key that was never inserted (e.g. 99). The traversal still visits the same number of nodes before concluding the key is absent: B-tree search cost is always O(height), hit or miss.
Free play — break it yourself
Hit Reset and insert keys in a hand-crafted order. Try to force a root split with as few insertions as possible (hint: 4 insertions suffice in order 4). Then insert a key that lands in the same leaf as an existing key and confirm the split still produces a valid sorted tree. Finally, use Random repeatedly and watch whether the tree ever becomes unbalanced — it won't, and that's the point. The B-tree's invariants guarantee balance regardless of insertion order, which is exactly the guarantee an unbalanced BST cannot give.
In practice
When to use it — and what you give up
When a B-tree is the right structure
- Database indexes — any SQL
CREATE INDEXdefaults to a B-tree. Range queries (WHERE age BETWEEN 20 AND 30), equality lookups, andORDER BYall use the sorted leaf layer efficiently. - Filesystem directories — large directories (thousands of entries) in NTFS, HFS+, and ext4's
dir_indexfeature are B-trees over filename keys, giving O(log n) lookup instead of O(n) linear scan. - Key-value stores on disk — RocksDB, LevelDB, and WiredTiger all use variants of B-trees or LSM trees where the sorted-page property is central.
- Any sorted dataset larger than RAM — if the data fits in memory, a red-black tree or skip list is simpler; once the dataset exceeds RAM, the page-aligned B-tree wins on I/O.
- Ordered iteration — B-trees (especially B+ trees with linked leaves) support efficient
SELECT ... ORDER BYand range scans without a full sort.
When to reach for something else
- In-memory data, small dataset — a red-black tree (used in
std::map, JavaTreeMap) or AVL tree has lower per-operation constant overhead when every node is already in cache. - Hash-only lookups, no ranges needed — a hash table gives O(1) exact-match lookup vs O(log_m n) for B-tree; use it when you never need range queries or sorted order.
- Write-heavy workloads needing maximum throughput — LSM trees (Log-Structured Merge trees, as in RocksDB) batch writes in memory and flush sequentially, avoiding random-write amplification that B-trees can suffer on SSDs.
- Multidimensional spatial queries — a B-tree indexes one dimension. Spatial queries (
points within a rectangle) need an R-tree, k-d tree, or GiST index instead.
Pros
- Logarithmic height on the branching factor — O(log_m n) pages per lookup; at m = 200, a billion-record index is only 4–5 levels deep.
- Guaranteed balance — every leaf is at the same depth after every operation. No rebalancing heuristics, no worst-case degeneration like an unbalanced BST.
- Efficient range scans — keys within a node are sorted; in a B+ tree, leaves are linked so a range query becomes one root-to-leaf descent then a sequential leaf scan.
- Tunable to hardware — setting the order so each node fills exactly one disk block (typically 4 KB or 16 KB) aligns I/O to the storage unit, eliminating partial-page waste.
- Half-century of production hardening — the structure and its deletion/merge algorithms are implemented, tested, and understood in every major database engine.
Cons
- Split cascades add write amplification — a single insert can cause splits at every level up to the root, writing O(log_m n) pages instead of one.
- Complex implementation — split, promote, merge-on-underflow, and borrow-from-sibling cover many edge cases. A correct, concurrent B-tree (with page latching) is a weeks-long engineering effort.
- Poor locality for random writes on SSDs — SSD pages must be erased in large blocks; random in-place overwrites during splits cause write amplification at the firmware level, which LSM trees avoid.
- Space overhead — the minimum-keys rule means pages are at most half-empty after a sequence of deletions, wasting up to 50 % of allocated space until a vacuum/reorganize pass.
- Not cache-friendly for tiny datasets — for data that fits in L2/L3 cache, a sorted array with binary search often outperforms a B-tree due to lower pointer-chasing overhead.
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
// Minimal B-tree: search + insert-with-split. Order is configurable.
// "Order m" means: max m children, max m-1 keys, min ⌈m/2⌉-1 keys (non-root).
const ORDER = 4; // prototype uses order 4 (max 3 keys/node)
const MAX_KEYS = ORDER - 1; // 3
const MIN_KEYS = Math.ceil(ORDER / 2) - 1; // 1
interface BNode {
keys: number[];
children: BNode[];
isLeaf: boolean;
}
function makeNode(isLeaf: boolean): BNode {
return { keys: [], children: [], isLeaf };
}
// --- Search ---
// Returns true if key exists anywhere in the subtree rooted at node.
function search(node: BNode, key: number): boolean {
let i = 0;
while (i < node.keys.length && key > node.keys[i]) i++;
if (i < node.keys.length && key === node.keys[i]) return true; // found
if (node.isLeaf) return false; // not here
return search(node.children[i], key); // descend
}
// --- Insert (non-full root entry point) ---
let root: BNode = makeNode(true);
function insert(key: number): void {
if (root.keys.length === MAX_KEYS) {
// Root is full — split it and grow the tree upward.
const oldRoot = root;
root = makeNode(false);
root.children.push(oldRoot);
splitChild(root, 0); // split the only child (oldRoot)
}
insertNonFull(root, key);
}
// Precondition: node has room for at least one more key.
function insertNonFull(node: BNode, key: number): void {
let i = node.keys.length - 1;
if (node.isLeaf) {
node.keys.push(0); // make room
while (i >= 0 && key < node.keys[i]) {
node.keys[i + 1] = node.keys[i];
i--;
}
node.keys[i + 1] = key;
} else {
while (i >= 0 && key < node.keys[i]) i--;
i++;
if (node.children[i].keys.length === MAX_KEYS) {
splitChild(node, i);
if (key > node.keys[i]) i++;
}
insertNonFull(node.children[i], key);
}
}
// Split the i-th child of parent (which must be full).
// Middle key is promoted into parent; two half-nodes remain.
function splitChild(parent: BNode, i: number): void {
const full = parent.children[i];
const mid = Math.floor(MAX_KEYS / 2); // index of the promoted key
const right = makeNode(full.isLeaf);
right.keys = full.keys.splice(mid + 1); // right half
const promoted = full.keys.splice(mid)[0]; // middle key
if (!full.isLeaf) {
right.children = full.children.splice(mid + 1);
}
parent.keys.splice(i, 0, promoted); // promote
parent.children.splice(i + 1, 0, right); // link right sibling
}References & further reading
7 sources- Paperlink.springer.com
Bayer & McCreight — Organization and Maintenance of Large Ordered Indexes (1972)
The original paper in Acta Informatica that introduced B-trees. Dense and mathematical, but the problem statement on pages 1–3 explains the disk-block motivation as clearly as anything written since.
- Paperdl.acm.org
Comer — The Ubiquitous B-Tree, ACM Computing Surveys (1979)
The definitive survey: search, insert, delete, B+ tree variant, and concurrency — all in one readable 23-page paper. Still the best single reference for understanding B-tree algorithms fully.
- Bookmitpress.mit.edu
CLRS — Introduction to Algorithms, Chapter 18: B-Trees
Textbook-rigorous treatment of search, insert, and delete with loop invariants and worked examples. Chapter 18 is self-contained and the pseudocode maps cleanly to any language.
- Articleen.wikipedia.org
Wikipedia — B-tree
Good structural overview with diagrams for split and merge operations, a comparison of B-tree variants (B+ tree, B* tree), and a survey of real-world uses in databases and filesystems.
- Articleuse-the-index-luke.com
Use The Index, Luke — The Anatomy of an Index
A practitioner's guide to how database B-tree indexes actually work under SQL queries — covers leaf-node doubly-linked lists, clustered vs non-clustered indexes, and why
LIKE '%foo'can't use a B-tree. - Docssqlite.org
SQLite — Database File Format (B-tree pages)
The SQLite file format spec documents exactly how table and index B-trees are laid out on disk: page headers, cell arrays, overflow pages. Reading this makes the one-node-per-page design concrete.
- Docspostgresql.org
PostgreSQL docs — Index Types: B-Tree
Documents which operators PostgreSQL's B-tree indexes accelerate, when the planner chooses them, and the storage parameters (
fillfactor) that control how full pages are kept to reduce split frequency.
Knowledge check
Did the prototype land?
Quick questions, answers revealed on submit. No scoring saved.
question 01 / 03
In an order-4 B-tree, a leaf node currently holds the keys [10, 20, 30]. You insert 25. What happens?
question 02 / 03
A B-tree with order m over n records guarantees a search visits at most how many nodes (disk pages)?
question 03 / 03
B-trees always grow upward — the root splits rather than leaves being pushed down. What consequence does this have?
0/3 answered
Was this concept helpful?
Tell us what worked, or what to improve. We read every note.