Advanced12 min readlive prototype

Cuckoo Filter

Store short fingerprints in two candidate buckets and bump existing entries to their other home on collision. Delete-supporting and usually smaller than Bloom.

Overview

What this concept solves

A cuckoo filter keeps the spirit of a Bloom filter — small, probabilistic, one-sided error — but throws out the bit array. Instead, it stores short fingerprints (typically 8–16 bits each) in a hash table of buckets, where each item has two candidate buckets. On a collision, an existing fingerprint is kicked out to its other home, like a cuckoo chick evicting its nestmate. That single trick gives you safe deletion and a smaller filter than Bloom at the same false-positive rate.

Why is this such a big deal? The 2014 Fan/Andersen/Kaminsky paper showed that for any FPR below ~3%, a cuckoo filter beats a Bloom filter on every dimension that matters: less space per item, faster lookups (only two cache lines touched), no extra memory for delete support, and a clean removal that doesn't leak error over time. In most new designs that need a probabilistic membership filter, cuckoo is now the default — Bloom is reserved for the cases where you literally never delete and want the simplest possible structure.

There is one catch: a cuckoo filter can refuse an insert when it gets very full, because the kick chain has nowhere to land. Bloom and counting Bloom never refuse — they just gradually get worse. This is the engineering trade you accept for the smaller footprint: you size for ~95% load factor, monitor occupancy, and either accept the (rare) insert failure or resize to a bigger filter.

Mechanics

How it works

The structure

  • A table of buckets, each holding a small fixed number of slots (typically 4 in production; the prototype uses 2 for clarity).
  • A short fingerprint function fp(x) that hashes the full item down to f bits (commonly 8–16). The fingerprint, not the item, is what gets stored.
  • Two candidate buckets per item, computed with a partial-key cuckoo hashing trick: i₁ = hash(x) mod m and i₂ = i₁ ⊕ hash(fp(x)). The XOR is the magic — it means you can compute i₁ from i₂ and the fingerprint alone, without needing the original item. That's what enables deletion.

Adding an item

  1. Compute fp = fingerprint(x), i₁ = hash(x) mod m, i₂ = i₁ ⊕ hash(fp) mod m.
  2. If bucket i₁ has a free slot, drop the fingerprint in. Done.
  3. Else if bucket i₂ has a free slot, drop it there. Done.
  4. Otherwise pick a random slot in i₁ or i₂, kick the existing fingerprint out, and try to relocate the kicked fingerprint to its alternate bucket. Repeat until something lands or you hit a kick limit (commonly 500).
  5. Hitting the kick limit means the table is too full — the insert fails and you must rebuild larger. In practice this happens above ~95% load.

Checking an item

  1. Compute fp, i₁, i₂.
  2. Look in bucket i₁ and bucket i₂. If the fingerprint is in either, return maybe-yes — another item may share the same fingerprint and bucket (the false-positive case).
  3. If neither bucket contains the fingerprint, return definitely-no.

Removing an item

  1. Compute fp, i₁, i₂.
  2. If fp is in bucket i₁ or i₂, simply delete it from that slot. Clean removal — no counters, no shared-bit ambiguity.
  3. Caveat: like any filter with false positives, you may accidentally delete a fingerprint that 'belonged' to a different item. The same one-sided error that allows false positives means deletion has a tiny probability of removing the wrong entry, so cuckoo filters assume you only ever remove items you know you inserted (which is the same caveat counting Bloom has, just rarely stated).

Why the XOR alternate is so clever

Standard cuckoo hashing needs to rehash the full item to find an alternate location. That's a problem for a filter, because we threw the item away and only kept the fingerprint. The XOR trick — i₂ = i₁ ⊕ hash(fp) — is self-inverting: knowing the bucket you're in and the fingerprint you're holding, you can compute its other home without ever seeing the original. That's what makes both delete and kick-chain rebalancing possible inside a filter.

Inserts can fail

Unlike Bloom and counting Bloom, a cuckoo filter can refuse to accept a new item when it gets very full — the kick chain runs to its limit without finding a free slot. The standard remedy is to size for the target load (≈95% with 4-slot buckets) and resize before you saturate. The flip side is that lookups stay fast and the FPR stays exactly what you tuned, instead of slowly degrading like a Bloom filter does as it fills.

Interactive prototype

Run it. Break it. Tune it.

Sandboxed simulation embedded right in the page. No setup, no install.

About this simulation

Eight buckets of two slots each. Every word gets a short fingerprint and two candidate buckets — try bucket A, fall back to bucket B, then start kicking. Step through Add & check, Backup bucket, or Cuckoo eviction — or open Free play and watch the kick chain animate when both homes are full.

Hands-on

Try these on your own

Open the prototype above, run each experiment, predict the answer, then verify.

try 01

Walk Add & check

Open Add & check and step through. apple gets a fingerprint and two candidate buckets — bucket A is empty, so it goes there. Then a check for apple finds its fingerprint in bucket A (maybe-yes), and a check for banana finds neither bucket holds its fingerprint (definitely-no).

try 02

Backup bucket — the cheap fallback

Switch to Backup bucket. Two words happen to map to the same primary bucket; the second one finds bucket A full but bucket B free, and lands there. This is the no-kick case — cuckoo's first move is always 'is there room in either home?' Kicks only fire when both are full.

try 03

Cuckoo eviction — watch the kick chain

Step through Cuckoo eviction. The filter is already loaded; the new word's two buckets are both full. Watch a victim fingerprint get kicked out to its alternate bucket, which displaces another fingerprint, which goes to its alternate, until something lands in a free slot. The chain animates one step at a time so you can follow each move.

try 04

Free play — add, remove, repeat

Open Free play. Add half a dozen short words and watch the buckets fill. Try Remove on one — the fingerprint disappears cleanly from whichever bucket holds it. Keep adding past ~14 items and you'll eventually see the 'too full' message: the kick chain ran to its limit. Reset to clear and try again.

In practice

When to use it — and what you give up

When it's the right tool

  • You need delete-supporting membership and want the smallest filter you can get — cuckoo beats counting Bloom on memory at every realistic FPR.
  • You need predictable lookup latency — only two bucket reads, both cache lines you can prefetch in parallel. Bloom touches k = 7+ cells; cuckoo touches 2.
  • Your set churns — keys come and go often, and you don't want to rebuild the filter periodically to reclaim space.
  • You're building a new system in 2026 and just need a membership filter that works for both inserts and deletes — cuckoo is the modern default.

When something else is better

  • You only ever add → a [Bloom filter](/topics/bloom-filters/bloom) is smaller and simpler. Don't pay for delete support you don't use.
  • You can't tolerate insert failures → counting Bloom never refuses an insert; it just degrades. For a denylist that must accept new entries forever, that may matter.
  • You need very high FPRs (> 3%) → in that regime Bloom can actually be smaller per item; cuckoo's advantage kicks in at low FPRs (the common case).
  • You need exact membership → use a hash set; the probabilistic-filter family is the wrong tool.

Pros

  • Supports clean delete — pull the fingerprint out of a slot; no shared-counter accounting.
  • Smaller than Bloom at realistic FPRs — typical configurations use ~7 bits per item vs Bloom's ~9.6 bits at 1% FPR.
  • Only two cache lines per lookup — Bloom touches k of them. Critical for high-QPS use.
  • FPR stays constant as the filter fills, instead of degrading like Bloom does.
  • Production-mature — used in CockroachDB, ScyllaDB, libsegfault, the dnscrypt-proxy denylist, and many CDN caches.

Cons

  • Inserts can fail once the table is very full — requires a resize path and good monitoring.
  • More implementation complexity — kick logic, alternate-bucket XOR, randomised eviction, retry limits. Bloom is half the code.
  • Fingerprint collisions still cause false positives — at the same FPR the math is different from Bloom but the kind of error is the same.
  • Worst-case insert is O(kick limit) — most inserts are O(1), but a near-full filter can require hundreds of kicks for one insert.

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

cuckoo-filter.ts
// Cuckoo filter — fingerprints, two candidate buckets, kick chain.
type Fingerprint = number;            // typically 8–16 bits

class CuckooFilter {
  private readonly buckets: Fingerprint[][];

  constructor(
    private readonly numBuckets: number,
    private readonly slotsPerBucket = 4,
    private readonly maxKicks = 500,
  ) {
    this.buckets = Array.from({ length: numBuckets }, () => []);
  }

  add(item: string): boolean {
    const fp = this.fingerprint(item);
    const i1 = this.hash(item) % this.numBuckets;
    const i2 = (i1 ^ this.hash(String(fp))) % this.numBuckets;

    if (this.tryPlace(i1, fp)) return true;
    if (this.tryPlace(i2, fp)) return true;

    // Both buckets full — start the cuckoo dance.
    let idx = Math.random() < 0.5 ? i1 : i2;
    let cur: Fingerprint = fp;
    for (let n = 0; n < this.maxKicks; n++) {
      const slot = Math.floor(Math.random() * this.slotsPerBucket);
      const evicted = this.buckets[idx][slot];
      this.buckets[idx][slot] = cur;
      cur = evicted;
      idx = (idx ^ this.hash(String(cur))) % this.numBuckets;
      if (this.tryPlace(idx, cur)) return true;
    }
    return false;        // filter is too full — caller should resize
  }

  has(item: string): boolean {
    const fp = this.fingerprint(item);
    const i1 = this.hash(item) % this.numBuckets;
    const i2 = (i1 ^ this.hash(String(fp))) % this.numBuckets;
    return this.buckets[i1].includes(fp) || this.buckets[i2].includes(fp);
  }

  remove(item: string): boolean {
    const fp = this.fingerprint(item);
    const i1 = this.hash(item) % this.numBuckets;
    const i2 = (i1 ^ this.hash(String(fp))) % this.numBuckets;
    return this.removeFp(i1, fp) || this.removeFp(i2, fp);
  }

  private tryPlace(idx: number, fp: Fingerprint): boolean {
    if (this.buckets[idx].length < this.slotsPerBucket) {
      this.buckets[idx].push(fp);
      return true;
    }
    return false;
  }
  private removeFp(idx: number, fp: Fingerprint): boolean {
    const i = this.buckets[idx].indexOf(fp);
    if (i === -1) return false;
    this.buckets[idx].splice(i, 1);
    return true;
  }
  private fingerprint(x: string): Fingerprint { return hash(x, 'fp') & 0xff; }
  private hash(x: string, seed = ''): number { return murmur3(x + seed); }
}

References & further reading

6 sources

Knowledge check

Did the prototype land?

Quick questions, answers revealed on submit. No scoring saved.

question 01 / 03

What does a cuckoo filter actually store in each bucket slot?

question 02 / 03

Why is the XOR trick `i₂ = i₁ ⊕ hash(fingerprint)` essential to cuckoo filters?

question 03 / 03

Compared to a Bloom filter at the same false-positive rate, what's a cuckoo filter's main advantage besides supporting deletion?

0/3 answered

Was this concept helpful?

Tell us what worked, or what to improve. We read every note.