Advanced11 min readlive prototype

Timsort

Find runs already in order, pad short ones with insertion sort, then merge runs in stack order. The hybrid your language's sort() actually ships.

Overview

What this concept solves

Timsort is what your language's sort() actually does. Python, Java (for objects), JavaScript V8, Rust's slice::sort, Android, Swift — all of them. Tim Peters designed it in 2002 for CPython after a frustrated weekend looking at how badly classical sorts handle real-world data. Real data is rarely random: server logs are mostly in timestamp order with a few late arrivals; merged feeds are sorted runs interleaved; even shuffled playlists have patterns. Timsort exploits those patterns to run in O(n) on the best cases while staying O(n log n) on the worst.

The algorithm is merge sort with two upgrades. First, it doesn't blindly split in half — it scans the input for natural runs (stretches already in ascending order) and uses those as the merge starting point. Already-sorted input is one giant run; you skip the merge phase entirely and finish in O(n). Second, when runs are too short (< minrun ≈ 32), Timsort extends them with binary insertion sort because insertion is the fastest sort on tiny inputs.

There's a third subtlety: Timsort doesn't merge runs in input order. It pushes runs onto a stack, then triggers merges when the top three runs satisfy specific size invariants (the 'runs' rules). This is what keeps the total merge work O(n log n) regardless of how the runs are distributed — and is also the source of a 2015 paper that found Timsort had a latent stack-overflow bug in its merge invariants. (Java and Python both patched it.) Real engineering, real bugs, real impact.

Mechanics

How it works

Phase 1 — find and extend runs

  1. Scan left to right. Find the longest ascending stretch starting at the current position (a run).
  2. If the run's length is less than minrun (typically 32, computed from n), extend it: take the next minrun - len elements and binary-insertion-sort them into the existing run.
  3. Push the resulting run (start, length) onto the run stack.
  4. Continue scanning until the input is consumed.

Phase 2 — stack-driven merging

  1. After each new run is pushed, check the top three runs A, B, C (A is on top). Enforce the invariants |C| > |B| + |A| and |B| > |A|.
  2. If either is violated, merge the smaller of (A, C) with B and repeat until the invariants hold.
  3. Once the input is fully scanned, merge whatever runs remain on the stack from top to bottom.
  4. Use galloping mode during merges — when one side is consistently winning, binary-search ahead to skip O(log) comparisons at a time.

Best case is O(n) — sorted input runs in one pass

If the input is already sorted, Phase 1 finds one run of length n and pushes it. There's nothing to merge. Total work: one scan, O(n). Pure merge sort would still split-and-merge O(n log n). That's why your language's sort() feels instant on already-sorted data — Timsort detects it.

Galloping

During a merge, if one side wins many comparisons in a row, Timsort assumes that side is going to keep winning and uses binary search to skip ahead to where the comparison flips. On highly-clumped data (like merging two sorted runs that don't interleave much) galloping cuts comparisons from O(n) to O(log n) per chunk — sometimes orders of magnitude faster on real workloads.

The 2015 invariant bug

Stijn de Gouw and colleagues formally verified Timsort and discovered the stack-invariant check was missing a case — a pathological input could leave the run stack too deep, eventually overflowing. Python and Java both shipped patches. A solid reminder that complex, optimised algorithms can hide bugs no test suite finds.

Interactive prototype

Run it. Break it. Tune it.

Sandboxed simulation embedded right in the page. No setup, no install.

About this simulation

Phase 1 finds natural ascending runs in the data (each colour band is one run); short runs get extended with insertion sort. Phase 2 merges adjacent runs until one sorted run remains. Press Play to watch the full sort, Step / Back to walk each step, Runny data to seed an array with pre-existing runs, Shuffle for fully-random data.

Hands-on

Try these on your own

Open the prototype above, run each experiment, predict the answer, then verify.

try 01

Run a fully-random sort

Press Shuffle to get a random 12-element list, then Play. Watch Phase 1 detect runs — each colour band is one ascending run found in the data. Short runs get extended with insertion (purple). Then Phase 2 merges adjacent runs until one sorted run remains. Runs found shows how many natural runs the data contained.

try 02

See the best case on runny data

Press Runny data to seed an input with three obvious ascending stretches [22,38,55, 18,29,41,60, 25,33, 70,82,95]. Step through Phase 1: Timsort finds the runs without doing any extra work. The merge phase is fast — only a few merges. Compare this to Comparisons on random data: runny data finishes in roughly half the operations.

try 03

Watch the merge stack collapse

During Phase 2, each step merges two adjacent runs into one. The total number of merges equals the number of runs minus one. On the runny-data seed (3 runs), expect 2 merges. On random data of length 12, expect closer to 6 merges. Merges done counts them live.

try 04

Step into a merge

Use Step during a merge. The orange band marks the range being merged. Each step compares the fronts of the two runs and picks the smaller. After the merge, the orange band becomes one sorted green span — that's one merged run living in the run stack.

In practice

When to use it — and what you give up

When it's the right tool

  • Anything where you'd otherwise use a comparison sort — Timsort is strictly better than mergesort and tied or better than quicksort on most real-world data.
  • Data with existing partial order — log files, merged feeds, append-only data with occasional re-sorts.
  • You need a stable sort with O(n log n) worst-case guarantees — Timsort gives you both.
  • Sorting records by multiple keys — stability lets you sort by primary then secondary in two passes.

When to reach for something else

  • You're sorting primitive ints at huge scale — Java's Arrays.sort uses dual-pivot quicksort for primitives because the cache wins outweigh stability. For 10⁹ integers, radix beats both.
  • Memory is extremely tight — Timsort needs O(n) auxiliary space; heap sort doesn't.
  • You need to teach divide-and-conquer cleanly — Timsort is several optimisations stacked on top of merge sort. Teach merge first.
  • You're sorting integers in a known small range — counting sort wins.

Pros

  • O(n) best case on already-sorted or reverse-sorted data — beats every other comparison sort on common inputs.
  • O(n log n) worst case — no adversarial input.
  • Stable — equal elements keep their order.
  • Optimised for real-world data — runs, galloping, the merge-stack invariants all target patterns that random-data benchmarks miss.
  • Battle-tested at planetary scale — Python, Java, V8, Android, Rust all ship Timsort.

Cons

  • O(n) extra memory for the merge buffer — same as merge sort.
  • Complex to implement correctly — the run-stack invariants are subtle (and were buggy until 2015).
  • Slower than quicksort on uniformly random data — by a small constant factor, because the run detection isn't free.
  • Not the right teaching algorithm — its strength is in the optimisations, which require understanding merge sort first.

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

timsort.ts
// A simplified Timsort sketch: find runs, extend short ones, merge.
// Real Timsort adds galloping and the full merge-stack invariants.
const MINRUN = 32;

function timsort<T>(a: T[]): T[] {
  const n = a.length;
  // Phase 1: detect runs, extending short ones with insertion sort.
  const runs: Array<[number, number]> = []; // [start, length] pairs.
  let i = 0;
  while (i < n) {
    let runEnd = i + 1;
    while (runEnd < n && a[runEnd] >= a[runEnd - 1]) runEnd++;
    let runLen = runEnd - i;
    if (runLen < MINRUN) {
      const extendTo = Math.min(i + MINRUN, n);
      insertionSortRange(a, i, extendTo);
      runLen = extendTo - i;
    }
    runs.push([i, runLen]);
    i += runLen;
  }
  // Phase 2: merge runs in order (simplified — real Timsort uses a stack).
  while (runs.length > 1) {
    const [aStart, aLen] = runs.shift()!;
    const [, bLen] = runs.shift()!;
    mergeInPlace(a, aStart, aStart + aLen, aStart + aLen + bLen);
    runs.unshift([aStart, aLen + bLen]);
  }
  return a;
}

function insertionSortRange<T>(a: T[], lo: number, hi: number): void {
  for (let i = lo + 1; i < hi; i++) {
    const key = a[i];
    let j = i - 1;
    while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
    a[j + 1] = key;
  }
}

function mergeInPlace<T>(a: T[], lo: number, mid: number, hi: number): void {
  const left = a.slice(lo, mid), right = a.slice(mid, hi);
  let i = 0, j = 0, k = lo;
  while (i < left.length && j < right.length) {
    a[k++] = left[i] <= right[j] ? left[i++] : right[j++];
  }
  while (i < left.length) a[k++] = left[i++];
  while (j < right.length) a[k++] = right[j++];
}

timsort([5, 21, 7, 23, 19]); // → [5, 7, 19, 21, 23]

References & further reading

6 sources

Knowledge check

Did the prototype land?

Quick questions, answers revealed on submit. No scoring saved.

question 01 / 03

What's Timsort's best-case time complexity, and on what input?

question 02 / 03

Why does Timsort use insertion sort to extend short runs instead of just merging them as-is?

question 03 / 03

Why does Timsort merge runs in stack order rather than left-to-right?

0/3 answered

Was this concept helpful?

Tell us what worked, or what to improve. We read every note.