Intermediate

Consistent Hashing

Map keys to servers so that adding or removing a server moves as few keys as possible. Five methods, from the classic hash ring to the table-based hashing inside modern network load balancers.

distributedscalingpatterns

What is Consistent Hashing?

The 60-second primer

Consistent hashing is a way to map keys to servers so that adding or removing a server moves as few keys as possible. You have a set of keys — cache entries, user sessions, database rows — and a pool of servers to spread them over. The job is to decide which server owns each key, and to keep that decision stable when the pool changes.

The obvious approach is server = hash(key) % N. It works perfectly until N changes. Add one server — go from 4 to 5 — and the modulo result flips for almost every key. Nearly your entire dataset has to move at once: caches go cold, sessions get lost, and the system thrashes. The whole point of consistent hashing is to avoid that cliff.

The classic trick, from Karger's 1997 paper, is to place both servers and keys on a circle — a hash ring. A key belongs to the first server you meet walking clockwise. Now adding or removing a server only disturbs the slice of the ring next to it; on average just K/N keys move instead of all of them. Everything in this topic is a variation on that one idea: keep placement stable, share load evenly, and look it up fast.

Where you actually need it

  • Distributed caches — Memcached and CDN edge nodes use it so that resizing the fleet doesn't flush every cache at once. This was the original motivating use case.
  • Sharded databases & key-value stores — Dynamo, Cassandra, Riak, and ScyllaDB partition data across nodes with a hash ring so a node joining or leaving only reshuffles its neighbours' data.
  • Sticky load balancing — pin a client or session to a backend by hashing its identity, without a shared session store. (This is the bridge from the Load Balancing topic's IP Hash.)
  • Network load balancers — Google's Maglev and similar systems hash connections to backends so packets in one connection keep landing on the same server even as backends come and go.
  • Any time rehashing is expensive — if moving a key means copying data, warming a cache, or migrating a partition, you want the minimum movement when membership changes.

The one number that matters

With naive hash % N, changing the server count moves ~100% of keys. With consistent hashing, it moves about 1/N of them. That single property — minimal disruption on membership change — is why every large distributed store reaches for it.

Side-by-side

How they compare

The same concepts, on the same axes. Use this as a map; the individual pages are the territory.

01Vanilla Ring
Core idea
Servers & keys on a circle; walk clockwise
Load balance
Uneven — random gaps cause hot spots
Lookup cost
O(log N) binary search
Best for
Understanding the core idea
02Virtual Nodes
Core idea
Each server placed at V points on the ring
Load balance
Even — smooths out as V grows
Lookup cost
O(log(N·V))
Best for
Real ring-based stores (Dynamo, Cassandra)
03Rendezvous (HRW)
Core idea
Score each server for the key, pick the highest
Load balance
Even, no virtual nodes needed
Lookup cost
O(N) — score every server
Best for
Small clusters, easy weighting
04Jump Hash
Core idea
A formula maps key + count → bucket
Load balance
Near-perfect, zero memory
Lookup cost
O(log N), no table
Best for
Numbered shards that grow at the tail
05Maglev
Core idea
Precompute a lookup table of backends
Load balance
Very even by construction
Lookup cost
O(1) table lookup
Best for
Fast network load balancers

Decision guide

Which one should you use?

A practical tour of when each algorithm wins.

Decision guide

  • Learning the concept, or a simple ringVanilla Ring. It's the foundation; understand it before anything else.
  • A production ring-based storeVirtual Nodes. The same ring, but each server sits at many points so load is even and capacity is tunable. This is what Dynamo, Cassandra, and Riak actually ship.
  • A small number of servers, and you want weighting without the bookkeeping of vnodesRendezvous (HRW). It scores every server per key, so it's O(N), but it's simple and balances beautifully.
  • Shards numbered 0…N-1 that only ever grow at the endJump Hash. No memory, no table, near-perfect balance — but you can't remove an arbitrary node, only the last one.
  • A high-throughput network load balancerMaglev. It builds a lookup table up front for O(1) routing and minimal disruption when backends change.

Start at the ring

Almost every method here is the vanilla ring with one problem fixed. Virtual nodes fix uneven load; rendezvous removes the ring; jump hash removes the memory; Maglev makes lookup O(1). Learn the ring first and the rest fall into place.

Related tracks

If this one clicks, try these next.