Overview
What this concept solves
Count-Based is the simplest trip rule there is. Keep the results of the last N calls in a fixed-size buffer. Every time you add a new result, drop the oldest. Compute the failure rate over that buffer. If it crosses your threshold (say, 50%), trip the breaker.
That is the whole algorithm. No timestamps, no decay function, no latency tracking. Just a ring of booleans and a percentage. It is the default sliding-window type in Resilience4j, and close relatives show up elsewhere (Polly's basic breaker counts consecutive failures). Start with it unless you have a reason not to.
The fixed-size window is also its blind spot. If traffic is bursty — five calls in a minute, then nothing — the breaker may stay in whatever state it was the last time someone called it, ignoring fresh evidence simply because no new calls arrived. The Time-Based variant fixes that.
Mechanics
How it works
The trip rule, step by step
- Maintain a ring buffer of the last N call results (true = ok, false = fail). On every new call, push to the back and drop from the front if full.
- Compute the failure rate: count of fails ÷ count of entries. (Most implementations also require a configurable minimum number of samples before they will trip, for example half the window, so a single failure in a near-empty window doesn't immediately open you.)
- If the rate is at or above the threshold and you have enough samples, trip → OPEN.
- While OPEN, every call is rejected immediately for the configured cooldown.
- After the cooldown, OPEN → HALF-OPEN: a small number of trial calls are allowed. If the failure rate over the trials stays below the threshold (e.g. all succeed), close. Otherwise, OPEN again.
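The first three steps can be sketched with a true ring buffer that keeps a running failure count, so recording a call and reading the rate are both constant-time. This is a minimal illustration, not any library's implementation; `OutcomeRing` and `shouldTrip` are hypothetical names introduced here.

```typescript
// Fixed-size ring of call outcomes with a running failure count.
// Hypothetical helper for illustration only.
class OutcomeRing {
  private slots: boolean[];   // true = ok, false = fail
  private size: number;
  private next = 0;           // index of the slot to overwrite next
  private filled = 0;         // how many slots hold real data (<= size)
  private failures = 0;       // running count of failed entries

  constructor(size: number) {
    this.size = size;
    this.slots = new Array<boolean>(size).fill(true);
  }

  record(ok: boolean): void {
    if (this.filled === this.size) {
      // Ring is full: the slot about to be overwritten is the oldest entry.
      if (!this.slots[this.next]) this.failures--;
    } else {
      this.filled++;
    }
    this.slots[this.next] = ok;
    if (!ok) this.failures++;
    this.next = (this.next + 1) % this.size;
  }

  count(): number {
    return this.filled;
  }

  failureRate(): number {
    return this.filled === 0 ? 0 : this.failures / this.filled;
  }
}

// Trip only when there are enough samples AND the rate is at or above the threshold.
function shouldTrip(ring: OutcomeRing, threshold: number, minCalls: number): boolean {
  return ring.count() >= minCalls && ring.failureRate() >= threshold;
}
```

The `minCalls` guard is what keeps one failure out of two calls from opening the breaker on a cold start.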
Why a window — not a streak?
A consecutive-failure rule ('5 fails in a row') is intuitive but brittle: a single intervening success resets the streak, even if you're really failing 80% of the time. A windowed rate reflects the overall failure rate no matter how failures and successes interleave. Real failures don't ask permission to alternate with successes.
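A small sketch makes the difference concrete. It feeds the same call history to both rules: 80% failures, with every fifth call succeeding. The function names are made up for this example, and the windowed rule only evaluates full windows to keep the code short.

```typescript
// Rule A: trip after `limit` consecutive failures.
function streakTrips(results: boolean[], limit: number): boolean {
  let streak = 0;
  for (const ok of results) {
    streak = ok ? 0 : streak + 1; // any success resets the streak
    if (streak >= limit) return true;
  }
  return false;
}

// Rule B: trip when the failure rate over the last `size` results
// is at or above `threshold`. Only full windows are evaluated.
function windowTrips(results: boolean[], size: number, threshold: number): boolean {
  for (let end = size; end <= results.length; end++) {
    const window = results.slice(end - size, end);
    const fails = window.filter(ok => !ok).length;
    if (fails / size >= threshold) return true;
  }
  return false;
}

// Four failures, then one success, repeated four times (true = ok).
const pattern: boolean[] = [];
for (let i = 0; i < 20; i++) pattern.push(i % 5 === 4);

console.log(streakTrips(pattern, 5));       // false: the streak never exceeds 4
console.log(windowTrips(pattern, 10, 0.5)); // true: the first full window is 80% failures
```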
The bursty-traffic blind spot
Suppose your window is 20 and traffic is sparse: one call every few minutes. The breaker only learns when a call happens, so a downstream that recovers between calls is invisible to it. The window still holds whatever the last 20 calls looked like, even if they are ten minutes old. If those calls left the failure rate just under the threshold (say 45%), a single fresh failure tips a mostly stale window over the line and trips the breaker. If your traffic is bursty, prefer Time-Based, which ages calls out by clock time.
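To see the staleness in numbers, the sketch below computes a failure rate two ways over the same timestamped samples: once by count, once by age. The `Sample` type and both functions are assumptions made for this illustration, and the age filter is only a rough stand-in for a real Time-Based window.

```typescript
interface Sample {
  ok: boolean;
  at: number; // timestamp in milliseconds
}

// Count-based: rate over the last `size` samples, however old they are.
function countBasedRate(samples: Sample[], size: number): number {
  const recent = samples.slice(-size);
  if (recent.length === 0) return 0;
  return recent.filter(s => !s.ok).length / recent.length;
}

// Age-based: only samples newer than `maxAgeMs` are considered.
function timeBasedRate(samples: Sample[], now: number, maxAgeMs: number): number {
  const fresh = samples.filter(s => now - s.at <= maxAgeMs);
  if (fresh.length === 0) return 0;
  return fresh.filter(s => !s.ok).length / fresh.length;
}

// 20 calls in the first 20 seconds: 11 successes, then 9 failures (45%).
const samples: Sample[] = [];
for (let i = 0; i < 20; i++) samples.push({ ok: i < 11, at: i * 1000 });

const tenMinutesLater = 10 * 60 * 1000;
console.log(countBasedRate(samples, 20));                    // 0.45, still remembered
console.log(timeBasedRate(samples, tenMinutesLater, 60000)); // 0, everything has aged out

// One fresh failure evicts the oldest success and pushes the stale window to 50%.
samples.push({ ok: false, at: tenMinutesLater });
console.log(countBasedRate(samples, 20));                    // 0.5
```

With a 50% threshold, that single new failure would trip a count-based breaker on evidence that is mostly ten minutes old.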
Interactive prototype
Run it. Break it. Tune it.
Sandboxed simulation embedded right in the page. No setup, no install.
About this simulation
A sliding window of the last N calls. Click healthy call or failing call to fill it; the bar tracks the failure rate, the dotted line marks the threshold. Cross it and the breaker trips to OPEN — every later call is rejected instantly until the cooldown elapses and Half-Open trial calls decide whether to close again. Flip on Auto traffic to watch a steady stream of requests do it on their own.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Watch the window fill and trip
Press healthy call four times, then failing call five times. Watch the bar climb from green to red and cross the threshold line — the breaker badge flips to open. Note the rejected count starts ticking only after it tripped.
The cooldown and recovery
After tripping, wait for the cooldown — the badge flips to half-open. Click healthy call the configured number of trial times (default 3). All trials pass → closed, window reset. Now make those trials fail and watch it slam back to open.
Auto traffic with a sick downstream
Tick Auto traffic and drag Downstream failures above the threshold. Watch it trip on its own. Now drag failures back to 0 and let the trial calls in Half-Open close it. This is what production traffic looks like through the breaker's eyes.
Push the knobs
Widen the Window size to 16: it takes longer to trip, but the rate is less noisy because each call moves it by a smaller step. Shrink it to 4: it's twitchy and trips on small bursts. Raise the Failure threshold to 80%: the breaker becomes much more tolerant. Each knob is a tradeoff between sensitivity and stability.
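The arithmetic behind the knobs is short enough to write down. Starting from a full window of successes, each failure evicts one success, so after k failures the rate is k ÷ N. The sketch below assumes the breaker trips when the rate is at or above the threshold and that any minimum-call requirement is already met; `failuresToTrip` is a hypothetical helper, not part of the prototype.

```typescript
// Consecutive failures needed to trip, starting from a full window of successes.
// Uses an integer percentage to avoid floating-point surprises in the ceiling.
function failuresToTrip(windowSize: number, thresholdPercent: number): number {
  return Math.ceil((thresholdPercent * windowSize) / 100);
}

const settings = [
  { size: 4, pct: 50 },
  { size: 16, pct: 50 },
  { size: 16, pct: 80 },
];

for (const s of settings) {
  console.log(`window=${s.size} threshold=${s.pct}% -> ${failuresToTrip(s.size, s.pct)} failures`);
}
// window=4 threshold=50% -> 2 failures
// window=16 threshold=50% -> 8 failures
// window=16 threshold=80% -> 13 failures
```

A window of 4 opens after two bad calls, which is why it feels twitchy; a window of 16 at 80% needs thirteen.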
In practice
When to use it — and what you give up
When to reach for Count-Based
- Steady, predictable traffic — a service handling dozens or hundreds of calls per second. The window always reflects roughly the last second or two of behaviour.
- You want the simplest possible breaker — fewer knobs to tune, easier to reason about, easier to debug. Count-Based is the 'no surprises' default.
- Memory is a real constraint — N booleans is the smallest serious breaker you can build. No timestamps, no counters per second.
- You're prototyping — ship Count-Based first, then upgrade to Time-Based or Error-Percentage once you see the actual traffic shape.
- Avoid when traffic is bursty or very low volume — both starve the window of fresh data. Use Time-Based or Error-Percentage instead.
Library defaults
Resilience4j's default sliding-window type is COUNT_BASED with N=100 and a 50% failure-rate threshold. Polly's basic circuit-breaker counts consecutive failures (a degenerate count-based with the simplest possible rule). Hystrix tracked a time-based rolling window split into buckets (by default 10 seconds in ten 1-second buckets), so it kept per-bucket counts internally while its public API talked about time.
Pros
- Tiny memory — N bits (or N booleans) and a pointer.
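- O(1) per call if you keep a running failure count: overwrite the oldest slot, adjust the count, divide. (Recounting the window on every call, as the reference implementation below does, is O(N), which is fine for small N.)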
- Predictable: the trip rule uses no clocks and no decay, so there are no surprises from time-of-day effects. Only the cooldown reads the clock.
- Easy to explain — 'we trip if the last 20 calls were ≥ 50% failures' is a sentence a non-engineer can understand.
Cons
- Insensitive to traffic shape — bursty or low-volume traffic can leave stale data in the window.
- No self-recovery in quiet periods — without new calls, the rate stays whatever it was last measured.
- Threshold tuning is workload-specific — what counts as 'too many' depends on your normal error rate, which N alone doesn't capture.
- Can trip on a tiny sample unless you enforce a minimum-call count — one fail of two calls is 50%.
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
// Count-based circuit breaker: trip when the failure rate over the last N calls
// reaches the threshold. The simplest sliding window there is.
class CountBasedBreaker {
  private state: "CLOSED" | "OPEN" | "HALF" = "CLOSED";
  private window: boolean[] = []; // true = ok, false = fail
  private trials: boolean[] = []; // outcomes of HALF-OPEN trial calls
  private openUntil = 0;          // timestamp (ms) at which the cooldown ends

  constructor(
    private size = 10,
    private threshold = 0.5, // 50%
    private cooldownMs = 5000,
    private trialCount = 3,
    private minCalls = 5, // don't trip on tiny samples
  ) {}

  async call<T>(work: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (Date.now() >= this.openUntil) {
        // Cooldown elapsed: move to HALF-OPEN and let trial calls through.
        // Note: this sketch does not cap how many trial calls run concurrently.
        this.state = "HALF";
        this.trials = [];
      } else {
        throw new Error("circuit open");
      }
    }
    try {
      const result = await work();
      this.record(true);
      return result;
    } catch (err) {
      this.record(false);
      throw err;
    }
  }

  private record(ok: boolean) {
    // A call that started before the trip may finish while OPEN. Ignore it,
    // otherwise it could re-trip the breaker and extend the cooldown.
    if (this.state === "OPEN") return;

    if (this.state === "HALF") {
      this.trials.push(ok);
      if (this.trials.length >= this.trialCount) {
        // Enough trials collected: apply the same threshold to the trial set.
        const fails = this.trials.filter(x => !x).length;
        if (fails / this.trials.length >= this.threshold) this.trip();
        else this.close();
      }
      return;
    }

    // CLOSED: slide the window, then check the rate once there are enough samples.
    this.window.push(ok);
    if (this.window.length > this.size) this.window.shift();
    if (this.window.length >= this.minCalls) {
      const fails = this.window.filter(x => !x).length;
      if (fails / this.window.length >= this.threshold) this.trip();
    }
  }

  private trip() {
    this.state = "OPEN";
    this.openUntil = Date.now() + this.cooldownMs;
    this.trials = [];
  }

  private close() {
    // Start again with an empty window so old failures don't count twice.
    this.state = "CLOSED";
    this.window = [];
    this.trials = [];
  }
}
References & further reading
- Resilience4j: CircuitBreaker sliding window (Docs, resilience4j.readme.io). The canonical modern implementation. The COUNT_BASED type is exactly this concept; the docs explain how it composes with the half-open phase.
- Microsoft .NET: Circuit Breaker pattern (Docs, learn.microsoft.com). Azure architecture-centre write-up with sequence diagrams for the trip and half-open transitions. Good complement to the algorithm above.
- Polly: Advanced circuit-breaker (count-based) (Docs, pollydocs.org). .NET resilience library. The 'advanced' variant counts failures and successes in a sampling window: same idea, .NET vocabulary.
- Hystrix: How it works (Docs, github.com). Netflix's original (archived) implementation. Worth reading for the bucketed-window optimisation that turns 'last 10 seconds' into ten 1-second counters, which is useful background for the next concept.
- Martin Fowler: Circuit Breaker (Article, martinfowler.com). The short article with the canonical state-diagram drawing. Reads as a count-based rule by default.
Knowledge check
Did the prototype land?
question 01 / 03
What exactly is the 'window' in count-based?
question 02 / 03
Why do most implementations require a minimum number of calls before they will trip?
question 03 / 03
If your service handles roughly 1 request per minute, why is count-based a poor choice?