Rate Limiting
Control request throughput so a noisy client cannot starve everyone else. Compare the five canonical algorithms side-by-side.
What is Rate Limiting?
The 60-second primer
Rate limiting is the practice of capping how many requests a client can make to a system in a given time window. Every API at scale uses it. It's the difference between a service that politely tells abusive clients to slow down and one that falls over when traffic spikes.
The contract is simple: every incoming request is checked against a quota. If the client is under the limit, the request goes through. If not, it's rejected — typically with an HTTP 429 Too Many Requests response and a header telling the client when to try again.
The interesting part isn't the contract — it's how you decide whether the limit was hit. There are five canonical algorithms, and each one trades off a different property: bursts vs. smoothness, memory vs. precision, simplicity vs. boundary correctness. Pick the wrong one and you either invite abuse or alienate your real users.
Why every API at scale needs it
- Protect against abuse — brute-force login attempts, credential stuffing, scrapers hammering endpoints.
- Fairness across clients — one noisy customer shouldn't starve the rest. Rate limits enforce a per-client share.
- Cost control — when each request costs you money (a downstream API call, an LLM token, a database query), an unbounded client is unbounded spend.
- Protect downstream systems — your database has a connection limit. Your payment processor has a TPS cap. Rate limiting at the edge stops chaos from reaching them.
- Predictable capacity planning — if every client is bounded, your total load is bounded.
Where it lives
Rate limiters almost always sit at the edge — API gateway, reverse proxy, or service mesh — keyed on something like API key, user ID, or IP address. Putting it deeper in the stack means the work is already happening; putting it at the edge means the bad request never costs you anything.
Side-by-side
How they compare
The same concepts, on the same axes. Use this as a map; the individual pages are the territory.
| Algorithm | Bursts | Precision | Memory | Best for |
|---|---|---|---|---|
01Token Bucket | Allowed (up to capacity) | Exact | O(1) — 2 counters | Public APIs with friendly bursts |
02Leaky Bucket | Smoothed away | Exact | O(B) — bounded queue | Protecting slow downstreams |
03Fixed Window Counter | Up to 2× at boundaries | Coarse | O(1) — 1 counter | Hourly/daily quotas, simplicity |
04Sliding Window Counter | Smoothed | Approximate (~1% error) | O(1) — 2 counters | Edge networks at scale |
05Sliding Window Log | Smoothed | Exact (timestamp-level) | O(N) — every timestamp | Auth, low-rate security limits |
- Bursts
- Allowed (up to capacity)
- Precision
- Exact
- Memory
O(1) — 2 counters- Best for
- Public APIs with friendly bursts
- Bursts
- Smoothed away
- Precision
- Exact
- Memory
O(B) — bounded queue- Best for
- Protecting slow downstreams
- Bursts
- Up to 2× at boundaries
- Precision
- Coarse
- Memory
O(1) — 1 counter- Best for
- Hourly/daily quotas, simplicity
- Bursts
- Smoothed
- Precision
- Approximate (~1% error)
- Memory
O(1) — 2 counters- Best for
- Edge networks at scale
- Bursts
- Smoothed
- Precision
- Exact (timestamp-level)
- Memory
O(N) — every timestamp- Best for
- Auth, low-rate security limits
Decision guide
Which one should you use?
A practical tour of when each algorithm wins.
Decision guide
- You want bursts but a sustained cap → Token Bucket. The default choice for almost any public API.
- You must protect a slow downstream from any spike → Leaky Bucket. Output is constant by construction.
- You need a simple per-hour or per-day quota → Fixed Window Counter. One Redis
INCRper request. - You operate at edge scale and need smoothness without growing memory → Sliding Window Counter.
- Precision matters more than throughput — login attempts, password resets, financial caps → Sliding Window Log.
When in doubt, start with Token Bucket
It's the most-deployed algorithm on the public internet for a reason. Easy to implement, easy to communicate ("X req/s with bursts up to Y"), and friendly to real user traffic.
Concepts in this track
5 concepts, in order
Each links to a concept page with its own explanation, prototype, and quiz.
Token Bucket Algorithm
A refilling bucket of tokens lets bursts through, but caps the sustained rate.
Leaky Bucket Algorithm
Requests drain at a constant rate, regardless of how fast they arrive.
Fixed Window Counter
Count requests per discrete time window. Simple, but suffers boundary spikes.
Sliding Window Counter
Smooth out boundary effects by tracking a rolling, weighted view.
Sliding Window Log
Precise but memory-heavy: keep a log of every request timestamp.
Related tracks
If this one clicks, try these next.
Cache Write Policies
Three ways to handle a write when you have a cache in front of the store. Each policy is a different bet about durability, throughput, and how stale your data is allowed to get.
Cache Eviction
When the cache fills up, something has to go — and which one you pick decides your hit rate. Ten classic policies, side-by-side.
Garbage Collection
How a runtime reclaims memory you stopped using — without you ever calling free(). Eight algorithms, from the counter on every object to the collectors that run alongside your program.