Intermediate

Rate Limiting

Control request throughput so a noisy client cannot starve everyone else. Compare the five canonical algorithms side-by-side.

distributed · api · throughput

What is Rate Limiting?

The 60-second primer

Rate limiting is the practice of capping how many requests a client can make to a system in a given time window. Nearly every API at scale uses it. It's the difference between a service that politely tells abusive clients to slow down and one that falls over when traffic spikes.

The contract is simple: every incoming request is checked against a quota. If the client is under the limit, the request goes through. If not, it's rejected, typically with an HTTP 429 Too Many Requests response and a Retry-After header telling the client when to try again.
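
The contract can be sketched in a few lines. This is a minimal illustration, not any framework's API: `Response`, `check_request`, and its parameters are hypothetical names, and the quota bookkeeping (how `used` is counted) is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    # Hypothetical response type, for illustration only.
    status: int
    headers: dict = field(default_factory=dict)


def check_request(used: int, limit: int, seconds_until_reset: int) -> Response:
    """Let the request through if the client is under quota, else reject it."""
    if used < limit:
        return Response(status=200)
    # Over quota: 429 Too Many Requests, plus a hint about when to retry.
    return Response(status=429, headers={"Retry-After": str(seconds_until_reset)})
```

A well-behaved client reads `Retry-After` and waits that many seconds before sending the next request.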

The interesting part isn't the contract — it's how you decide whether the limit was hit. There are five canonical algorithms, and each one trades off a different property: bursts vs. smoothness, memory vs. precision, simplicity vs. boundary correctness. Pick the wrong one and you either invite abuse or alienate your real users.

Why every API at scale needs it

Protect against abuse — brute-force login attempts, credential stuffing, scrapers hammering endpoints.
Fairness across clients — one noisy customer shouldn't starve the rest. Rate limits enforce a per-client share.
Cost control — when each request costs you money (a downstream API call, an LLM token, a database query), an unbounded client is unbounded spend.
Protect downstream systems — your database has a connection limit. Your payment processor has a TPS cap. Rate limiting at the edge stops chaos from reaching them.
Predictable capacity planning — if every client is bounded, your total load is bounded.

Where it lives

Rate limiters usually sit at the edge (API gateway, reverse proxy, or service mesh), keyed on something like API key, user ID, or IP address. Putting the limiter deeper in the stack means much of the work has already been done by the time a request is rejected; putting it at the edge means a rejected request costs little more than the limit check itself.
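
As a rough sketch of the keying step, the function below picks the most specific identity a request carries. The name `rate_limit_key`, the key prefixes, and the precedence order (API key, then user ID, then IP) are assumptions made for illustration; a real gateway may choose differently.

```python
from typing import Optional


def rate_limit_key(api_key: Optional[str] = None,
                   user_id: Optional[str] = None,
                   ip: Optional[str] = None) -> str:
    """Return the bucket key for a request, preferring the most specific identity."""
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    if ip:
        # IP is the weakest identity: many users can share one address.
        return f"ip:{ip}"
    raise ValueError("request carries no identity to rate-limit on")
```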

Side-by-side

How they compare

The same concepts, on the same axes. Use this as a map; the individual pages are the territory.

#	Algorithm	Bursts	Precision	Memory	Best for
01	Token Bucket	Allowed (up to capacity)	Exact	O(1), 2 counters	Public APIs with friendly bursts
02	Leaky Bucket	Smoothed away	Exact	O(B), bounded queue	Protecting slow downstreams
03	Fixed Window Counter	Up to 2× at boundaries	Coarse	O(1), 1 counter	Hourly/daily quotas, simplicity
04	Sliding Window Counter	Smoothed	Approximate (~1% error)	O(1), 2 counters	Edge networks at scale
05	Sliding Window Log	Smoothed	Exact (timestamp-level)	O(N), every timestamp	Auth, low-rate security limits

Decision guide

Which one should you use?

A practical tour of when each algorithm wins.

You want bursts but a sustained cap → Token Bucket. The default choice for almost any public API.
You must protect a slow downstream from any spike → Leaky Bucket. Output is constant by construction.
You need a simple per-hour or per-day quota → Fixed Window Counter. One Redis INCR per request, plus an expiry on the key so old windows clean themselves up.
You operate at edge scale and need smoothness without growing memory → Sliding Window Counter.
Precision matters more than throughput — login attempts, password resets, financial caps → Sliding Window Log.

When in doubt, start with Token Bucket

It's one of the most widely deployed algorithms on the public internet for a reason. It is easy to implement, easy to communicate ("X req/s with bursts up to Y"), and friendly to real user traffic.

Concepts in this track

5 concepts, in order

Each links to a concept page with its own explanation, prototype, and quiz.

Token Bucket Algorithm

A refilling bucket of tokens lets bursts through, but caps the sustained rate.

Intermediate · 8 min
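
A minimal single-process sketch of the idea, assuming the caller supplies the current time in seconds. The class and parameter names are illustrative, and a production limiter would also need locking or shared storage.

```python
class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last_refill = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Lazily add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = max(self.last_refill, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `capacity=3` and `refill_rate=1.0`, three back-to-back requests pass, the fourth is rejected, and one more is admitted for each second that goes by.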

Leaky Bucket Algorithm

Requests drain at a constant rate, regardless of how fast they arrive.

Intermediate · 7 min
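
One way to sketch this is the queue-based variant: requests wait in a bounded queue and are released at a fixed interval. The names below are illustrative, time is passed in explicitly, and the caller is assumed to invoke `drain` periodically.

```python
from collections import deque


class LeakyBucket:
    """Bounded FIFO queue that releases at most `leak_rate` items per second."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity
        self.interval = 1.0 / leak_rate  # seconds between releases
        self.queue = deque()
        self.last_leak = 0.0

    def offer(self, item, now: float) -> bool:
        """Queue the item, or reject it if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # Restart the clock so idle time does not turn into a burst.
            self.last_leak = now
        self.queue.append(item)
        return True

    def drain(self, now: float) -> list:
        """Release the items that are due by `now`, one per interval."""
        due = int((now - self.last_leak) // self.interval)
        released = []
        while due > 0 and self.queue:
            released.append(self.queue.popleft())
            self.last_leak += self.interval
            due -= 1
        return released
```

However fast items are offered, `drain` never hands back more than one item per interval, which is what keeps the downstream load flat.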

Fixed Window Counter

Count requests per discrete time window. Simple, but suffers boundary spikes.

Beginner · 5 min
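
The sketch below is an in-memory stand-in for the counter (names are illustrative, and time is passed in explicitly). It also makes the boundary problem easy to reproduce.

```python
class FixedWindowCounter:
    """Allow at most `limit` requests per key in each fixed window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = {}  # key -> (window index, count in that window)

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        stored_window, count = self.counts.get(key, (window, 0))
        if stored_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False
        self.counts[key] = (window, count + 1)
        return True
```

With `limit=5` and a 60-second window, a client can send five requests at t=59 and five more at t=60. All ten are accepted within about one second, which is the 2× boundary spike.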

Sliding Window Counter

Smooth out boundary effects by tracking a rolling, weighted view.

Intermediate · 8 min
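
A common way to build the weighted view is to keep the previous and current window counts and scale the previous one by how much of it still overlaps the rolling window. The sketch below assumes that formulation; names are illustrative and time is passed in explicitly.

```python
class SlidingWindowCounter:
    """Estimate the rolling count from two fixed-window counters."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # Roll forward. If a whole window was skipped, the previous one is empty.
            adjacent = window == self.current_window + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_window = window
        # Fraction of the current window that has already elapsed.
        elapsed = (now % self.window_seconds) / self.window_seconds
        estimated = self.previous_count * (1.0 - elapsed) + self.current_count
        if estimated + 1 > self.limit:
            return False
        self.current_count += 1
        return True
```

With `limit=5` and a 60-second window, five requests at t=59 use up the quota, and a request at t=60 is rejected because the previous window still carries full weight. By t=90 half of that weight has decayed, so two more requests fit.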

Sliding Window Log

Precise but memory-heavy: keep a log of every request timestamp.

Advanced · 9 min
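
A minimal sketch for a single client, with illustrative names and time passed in explicitly. Memory grows with the number of requests inside the window, which is the cost of exactness.

```python
from collections import deque


class SlidingWindowLog:
    """Allow at most `limit` requests in any rolling `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the rolling window.
        cutoff = now - self.window_seconds
        while self.log and self.log[0] <= cutoff:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

Only accepted requests are logged here. Logging rejected ones as well is a stricter variant that penalizes clients who keep retrying.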

Related tracks

If this one clicks, try these next.

Load Balancing

Run more than one server and something has to decide which one handles each request. Nine algorithms, from a blind counter to capacity-and-load-aware routing — built up one signal at a time.

Intermediate · 9 concepts · 75 min

distributed · scaling · throughput

Consistent Hashing

Map keys to servers so that adding or removing a server moves as few keys as possible. Five methods, from the classic hash ring to the table-based hashing inside modern network load balancers.

Intermediate · 5 concepts · 55 min

distributed · scaling · patterns

Circuit Breaker

When a downstream service is failing, stop hammering it — fail fast instead. Six variants, from the state machine itself to the trip-condition tweaks that production resilience libraries actually ship.

Intermediate · 6 concepts · 70 min

distributed · resilience · patterns