Intermediate · 8 min read · live prototype

Token Bucket Algorithm

A refilling bucket of tokens lets bursts through, but caps the sustained rate.

Overview

What this concept solves

The token bucket is one of the most widely used rate-limiting algorithms. Stripe has publicly described using it for API request limits, AWS API Gateway documents its throttling as a token bucket, and it is a common choice in gateways and service meshes. The reason: it allows short bursts while still enforcing a strict long-term rate, and it is cheap to implement.

The mental model is exactly what the name suggests. A bucket holds tokens. A faucet drips new tokens in at a constant rate. Every incoming request tries to grab one token. If the bucket has one, the request goes through. If it's empty, the request is rejected (or queued).

Mechanics

How it works

Two numbers, that's it

A token bucket is described by two parameters:

  • Capacity (B) — the maximum number of tokens the bucket can hold. This is your burst budget.
  • Refill rate (r) — how many tokens are added per second. This is your sustained throughput.

Each arriving request costs one token (or sometimes more — for example, a heavy endpoint might cost 5 tokens). The algorithm is:

  1. On each request, first refill: tokens = min(B, tokens + elapsed × r).
  2. If tokens ≥ cost, subtract cost and allow the request.
  3. Otherwise, reject the request (or queue it, depending on the policy).
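The three steps above can be sketched as a single pure function. This is a minimal illustration rather than a production limiter: the names `BucketState`, `StepResult`, and `step` are hypothetical, and the current time is passed in as `nowMs` (an assumption made here) so that the trace is reproducible.

```typescript
// Hypothetical names for illustration; time is an argument so the trace is deterministic.
interface BucketState {
  tokens: number;       // current token count (may be fractional)
  lastRefillMs: number; // timestamp of the last refill, in milliseconds
}

interface StepResult {
  state: BucketState;
  allowed: boolean;
}

function step(
  state: BucketState,
  nowMs: number,
  capacity: number,        // B
  refillPerSecond: number, // r
  cost = 1,
): StepResult {
  // Step 1: refill lazily from the elapsed time, capped at capacity.
  const now = Math.max(nowMs, state.lastRefillMs); // ignore a clock that moved backwards
  const elapsedSec = (now - state.lastRefillMs) / 1000;
  const refilled = Math.min(capacity, state.tokens + elapsedSec * refillPerSecond);

  // Step 2: enough tokens, so spend them and allow the request.
  if (refilled >= cost) {
    return { state: { tokens: refilled - cost, lastRefillMs: now }, allowed: true };
  }
  // Step 3: not enough tokens, so reject; the refill is still recorded.
  return { state: { tokens: refilled, lastRefillMs: now }, allowed: false };
}

// Trace with B = 2 and r = 1 token/s, starting from a full bucket at t = 0.
let s: BucketState = { tokens: 2, lastRefillMs: 0 };
const trace: boolean[] = [];
for (const t of [0, 0, 0, 1000]) {
  const result = step(s, t, 2, 1);
  s = result.state;
  trace.push(result.allowed);
}
console.log(trace); // [ true, true, false, true ]
```

The third request at t = 0 is rejected because the burst budget of 2 is spent; one second later a single token has dripped back in, so the fourth request is allowed.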

Why the lazy refill?

Notice that the refill is computed on demand from the time elapsed since the last update; there is no background timer ticking tokens in. This is what makes a token bucket O(1) in both time and memory per client: you only need to store tokens and lastRefill. The small state is also practical to keep in a shared store, although concurrent updates then need care.
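To make the per-client storage concrete, here is a small sketch that keeps exactly two numbers per client. The names `ClientEntry`, `allow`, and the client IDs are hypothetical, and an in-process `Map` stands in for whatever store a real deployment would use.

```typescript
// Hypothetical sketch: one entry per client, two numbers each, no timers.
interface ClientEntry {
  tokens: number;
  lastRefillMs: number;
}

const CAPACITY = 10;         // B
const REFILL_PER_SECOND = 1; // r
const buckets = new Map<string, ClientEntry>();

function allow(clientId: string, nowMs: number): boolean {
  // A client seen for the first time starts with a full bucket.
  const entry = buckets.get(clientId) ?? { tokens: CAPACITY, lastRefillMs: nowMs };

  // Everything that "dripped in" while the client was idle is
  // reconstructed here from the elapsed time.
  const now = Math.max(nowMs, entry.lastRefillMs);
  const elapsedSec = (now - entry.lastRefillMs) / 1000;
  entry.tokens = Math.min(CAPACITY, entry.tokens + elapsedSec * REFILL_PER_SECOND);
  entry.lastRefillMs = now;

  const allowed = entry.tokens >= 1;
  if (allowed) entry.tokens -= 1;
  buckets.set(clientId, entry);
  return allowed;
}

// Client "a" sends 12 requests at t = 0; client "b" is unaffected.
let allowedForA = 0;
for (let i = 0; i < 12; i++) {
  if (allow("a", 0)) allowedForA++;
}
const bAllowed = allow("b", 0);

// One hour later, a single lookup restores "a" to a full bucket before spending one token.
const aAfterIdle = allow("a", 3600 * 1000);
console.log(allowedForA, bAllowed, aAfterIdle, buckets.get("a")?.tokens); // 10 true true 9
```

One consequence of the lazy refill: a client that has been idle for at least B / r seconds is indistinguishable from a new client, so its entry can be evicted without changing behaviour.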

Bursts vs. sustained rate

The bucket capacity B controls how big a burst can be. The refill rate r controls the sustained rate. These two are independent — you can pick generous bursts with a tight long-run rate, or vice versa.
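The interplay of the two parameters can be written as a bound. As a worked sketch, assume every request costs one token and write N(T) for the number of requests allowed in any window of T seconds (a symbol introduced here, not part of the definition above):

```latex
N(T) \le B + rT
\qquad\Longrightarrow\qquad
\frac{N(T)}{T} \le r + \frac{B}{T} \longrightarrow r \quad \text{as } T \to \infty
```

With B = 10 and r = 1 token per second, a 60-second window admits at most 10 + 1 × 60 = 70 requests. Over an hour the ceiling is 3610 requests, an average of about 1.003 per second: the burst budget matters over short windows and washes out over long ones.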

Interactive prototype

Run it. Break it. Tune it.

Sandboxed simulation embedded right in the page. No setup, no install.

About this simulation

A bucket of size 10 refills at one token per second. Each request consumes a token. Hit 'Burst of 10' to drain it instantly — then watch the bucket recover at the steady refill rate.

Hands-on

Try these on your own

Open the prototype above, run each experiment, predict the answer, then verify.

try 01

Drain it on purpose

Click 'Burst of 10' once. Watch the bucket empty and the next clicks get rejected. Time how long until you can send one again — that's the refill rate doing its job.

try 02

The 'burst then idle' trick

Send a burst, wait 10 seconds without clicking, then burst again. The bucket refilled fully while you were idle, so you get a fresh burst budget: 20 requests in roughly 10 seconds, about twice the refill rate over that window.

try 03

Predict the steady state

If the bucket starts full and you click 'Send request' once per second forever, what does the token count converge to? Try it for 30 seconds and confirm.

In practice

When to use it — and what you give up

When to reach for it

  • Public APIs where occasional bursts are user-friendly but sustained abuse must be capped.
  • Per-user or per-API-key quotas at the edge — gateway, reverse proxy, or service mesh.
  • Anywhere you want a simple two-number contract: "X requests/sec, with bursts up to Y".

Real-world example

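Stripe has publicly described using a token bucket for its API request rate limiter, and AWS API Gateway documents its throttling in the same terms: a steady-state rate plus a burst size. Exact limits vary by provider, account, and endpoint, so check the current documentation rather than relying on fixed numbers.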

Pros

  • Allows short bursts — feels good for real user traffic that is bursty by nature.
  • Two numbers per client (token count and last-refill timestamp): tiny in memory and straightforward to keep in Redis or a similar store.
  • Lazy O(1) refill with no background scheduler.
  • Easy to tune: capacity = burst budget, refill = sustained rate.

Cons

  • Brief over-rate windows are intentional — if downstream cannot handle a burst, this is the wrong tool.
  • In distributed setups, contention on the shared counter requires careful design (Redis Lua scripts, sharding, or sloppy approximations).
  • Heterogeneous request costs need careful pricing or one expensive call can drain the bucket.
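The last point can be made concrete with a small cost table. This sketch is illustrative only: the endpoints, the prices, and the helpers `unservableEndpoints` and `burstSize` are all hypothetical. It checks one specific pitfall: a request whose cost exceeds the capacity can never be admitted, because the bucket never holds that many tokens.

```typescript
// Hypothetical endpoints and prices, chosen only for illustration.
interface EndpointCost {
  endpoint: string;
  cost: number; // tokens charged per call
}

const CAPACITY = 20; // B, assumed for this example

const costs: EndpointCost[] = [
  { endpoint: "GET /items", cost: 1 },
  { endpoint: "POST /search", cost: 5 },
  { endpoint: "POST /export", cost: 25 }, // deliberately mispriced: exceeds CAPACITY
];

// Endpoints that could never be served, however long the client waits.
function unservableEndpoints(table: EndpointCost[], capacity: number): string[] {
  return table.filter((e) => e.cost > capacity).map((e) => e.endpoint);
}

// How many back-to-back calls at a given cost a full bucket can absorb.
function burstSize(cost: number, capacity: number): number {
  return Math.floor(capacity / cost);
}

const unservable = unservableEndpoints(costs, CAPACITY);
const searchBurst = burstSize(5, CAPACITY);
console.log(unservable, searchBurst); // [ 'POST /export' ] 4
```

Running a check like this when the limiter is configured catches mispriced endpoints early, and `burstSize` shows how quickly an expensive call eats the burst budget: four searches drain a bucket that would absorb twenty cheap reads.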

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

token-bucket.ts
// A minimal token bucket. Lazy refill — no timer needed.
class TokenBucket {
  constructor(
    private capacity: number,        // max burst budget
    private refillPerSecond: number, // sustained rate
    private tokens = capacity,
    private lastRefill = Date.now(),
  ) {}

  tryConsume(cost = 1): boolean {
    this.refill();
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }

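  private refill() {
    // Date.now() is wall-clock time and can jump backwards (e.g. after a
    // clock sync); never let `now` precede the last refill.
    const now = Math.max(Date.now(), this.lastRefill);
    const elapsed = (now - this.lastRefill) / 1000; // seconds
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSecond,
    );
    this.lastRefill = now;
  }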
}

// Usage: 100 req/s sustained, bursts up to 25
const bucket = new TokenBucket(25, 100);
if (bucket.tryConsume()) {
  // ... handle the request
} else {
  // ... return 429 Too Many Requests
}
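When a request is rejected, the same two parameters tell the client how long to wait. The helper below is a sketch and not part of the class above: `retryAfterSeconds` is a hypothetical name, and it assumes the caller can read the current token count. It rounds up because the HTTP `Retry-After` header, in its delay form, carries a whole number of seconds.

```typescript
// Hypothetical helper: whole seconds until `cost` tokens are available again.
function retryAfterSeconds(
  tokens: number,          // tokens currently in the bucket
  cost: number,            // tokens the rejected request needs
  refillPerSecond: number, // r
): number {
  if (tokens >= cost) return 0; // nothing to wait for
  const deficit = cost - tokens;
  // Round up so the client never retries before the tokens exist.
  return Math.ceil(deficit / refillPerSecond);
}

console.log(retryAfterSeconds(0, 1, 1));    // 1  (empty bucket, 1 token/s)
console.log(retryAfterSeconds(0.25, 5, 2)); // 3  (needs 4.75 more tokens at 2/s)
console.log(retryAfterSeconds(3, 1, 1));    // 0  (tokens already available)
```

The value is only a hint: other requests from the same client may spend the refilled tokens first, so a retry can still be rejected.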

Knowledge check

Did the prototype land?

Quick questions, answers revealed on submit. No scoring saved.

question 01 / 03

Which parameter of a token bucket controls how big a sudden burst can be?

question 02 / 03

Why is refill computed on each request instead of by a background timer?

question 03 / 03

A token bucket has capacity 100 and refill rate 10/sec. What is the maximum number of requests that can be served in 1 minute, starting from a full bucket?
