Intermediate

Bloom & Cuckoo Filters

Three probabilistic set-membership structures that answer 'have I seen this before?' in a few bytes per item. From the classic bit-array Bloom filter to the counting variant that can delete, to the cuckoo filter that does it all with a smaller memory footprint.

data-structuresmemoryprobabilistic

Start with Bloom Filter

What is Bloom & Cuckoo Filters?

The 60-second primer

A probabilistic filter answers 'have I seen this before?' in a handful of bytes per item. A hash set works too, but it has to store every key — fine for thousands, painful for billions. A Bloom filter, a counting Bloom filter, or a cuckoo filter give up certainty in one direction and get back orders of magnitude less memory. You'll see them in browsers (Safari's bad-URL list), databases (Cassandra/HBase/RocksDB skip disk reads), CDNs (cached-or-not), and every CRDB-style system that has to ask 'do I have this row?' over the network.

The trick is the kind of error they allow. All three are one-sided: a positive answer is maybe, but a negative answer is definitely not. That asymmetry is the whole product. You use the filter to skip an expensive lookup that would have failed; the rare false-positive just means you pay for the lookup anyway. You never miss a real hit.

The three variants here build on each other. Bloom is the original — bits, hashes, beautifully small, but you can never delete. Counting Bloom turns each bit into a small counter so you can delete by decrementing, at the cost of more memory. Cuckoo is the modern alternative: stores short fingerprints in two candidate buckets and uses a 'cuckoo' eviction to make room — it deletes safely and uses less space than Bloom at the same false-positive rate.

Where this shows up

Database storage engines — Cassandra, HBase, LevelDB, RocksDB, ScyllaDB attach a Bloom filter to every SSTable so a key lookup can skip files that definitely don't hold it instead of opening every file on disk.
CDNs and caches — 'do I have this object?' before a multi-hop fetch. A 0.1%-false-positive filter the size of a thumbnail can rule out a million URLs.
Web browsers — Chrome's Safe Browsing and old Safari builds shipped a Bloom filter of malicious URLs locally; only the maybe-hits actually call the remote checker.
Crypto wallets & Bitcoin SPV — wallets register a Bloom filter of their addresses so the node sends back only matching transactions without learning the wallet's full set.
Networking & dedup — TCP/IP duplicate-packet detection, CDN URL deduplication, web-crawler 'already-visited' set, Kafka exactly-once log compaction. Anywhere a 'have I seen this?' check fronts an expensive op.
Privacy-preserving sketches — sending a filter is cheaper and leaks less than sending the raw set, which is why so many distributed systems use them at the protocol level.

Why give up exactness?

A hash set storing 1 billion 64-byte URLs needs ~60 GB. A Bloom filter with the same membership question, tuned for a 1% false-positive rate, needs ~1.2 GB — fifty times less. You trade rare extra lookups for fitting the whole thing in RAM. For most 'do I have this?' questions, that's the right trade.

Side-by-side

How they compare

The same concepts, on the same axes. Use this as a map; the individual pages are the territory.

Filter	Delete?	Bits per item (1% FPR)	False-positive shape	Use when
01Bloom Filter	No	~9.6 bits	`Grows with fill; can't shrink`	Append-only sets — crawl frontier, SSTable summaries, denylists.
02Counting Bloom	Yes (decrement)	~38 bits (4-bit counters)	`Same shape as Bloom`	Membership with churn — sessions, sliding-window dedup, cache admission.
03Cuckoo Filter	Yes (clean remove)	~7 bits (typical)	`Stable; insert can fail when very full`	Modern default — smaller than Bloom, supports delete, predictable lookups.

01Bloom Filter

Delete?: No
Bits per item (1% FPR): ~9.6 bits
False-positive shape: Grows with fill; can't shrink
Use when: Append-only sets — crawl frontier, SSTable summaries, denylists.

02Counting Bloom

Delete?: Yes (decrement)
Bits per item (1% FPR): ~38 bits (4-bit counters)
False-positive shape: Same shape as Bloom
Use when: Membership with churn — sessions, sliding-window dedup, cache admission.

03Cuckoo Filter

Delete?: Yes (clean remove)
Bits per item (1% FPR): ~7 bits (typical)
False-positive shape: Stable; insert can fail when very full
Use when: Modern default — smaller than Bloom, supports delete, predictable lookups.

Decision guide

Which one should you use?

A practical tour of when each algorithm wins.

How to pick

You only ever add (a crawl frontier, an immutable SSTable summary, a denylist that you rebuild instead of patching) → Bloom filter. Smallest and simplest; deletion isn't worth paying for if you don't need it.
*You need to add and remove, and you don't want to think about 'is the filter full?' edge cases → Counting Bloom*. Mature, easy to reason about, costs ~4× the memory.
*You need delete and the smallest filter at a given false-positive rate, and you can handle the rare insertion failure → Cuckoo filter*. Slightly more code, noticeably less RAM, faster lookups (only two cache lines touched).
You're picking a default in 2026 → Cuckoo. The Fan/Andersen/Kaminsky paper changed the landscape; for delete-supporting filters it's almost always the better engineering choice.

Always pair it with the source of truth

A filter never answers a query — it only lets you skip the work for the definite-misses. Behind every Bloom/cuckoo filter in production there is a real store (a hash table, a B-tree, an SSTable on disk) that confirms the maybe-hits. Designing without that backstop is how you turn a 1% false-positive rate into a 1% wrong-result rate, which is a bug.

Concepts in this track

3 concepts, in order

Each links to a concept page with its own explanation, prototype, and quiz.

Bloom Filter

A bit array plus a few hash functions: definitely-not or maybe-yes membership, in a fraction of the memory a hash set needs.

Beginner10 mintry it

Counting Bloom Filter

Each bit becomes a small counter so items can be removed by decrementing — at roughly 4× the memory of a vanilla Bloom filter.

Intermediate10 mintry it

Cuckoo Filter

Store short fingerprints in two candidate buckets and bump existing entries to their other home on collision. Delete-supporting and usually smaller than Bloom.

Advanced12 mintry it

Related tracks

If this one clicks, try these next.

Trees

The branching data structure under databases, filesystems, autocompletes, and priority queues. Nine trees, from the plain binary search tree through the self-balancing workhorses to the disk-friendly B-trees and the range-query structures competitive programmers swear by.

Intermediate9 concepts · 95 min

data-structuressearchfundamentals

Graph Theory

Before the algorithms, the shapes. Eight kinds of graph — directed and undirected, weighted and not, complete, bipartite, cyclic, and the all-important DAG — and how each property decides what you're allowed to do with the graph.

Beginner8 concepts · 70 min

graphsfundamentalsdata-structures

Memory Allocation

Before garbage collection ever runs, something has to hand out the memory. Six allocators — four ways to pick a hole, plus the two structured schemes real kernels actually ship.

Intermediate6 concepts · 70 min

memoryruntimeperformance