Overview
What this concept solves
Time-Based is Count-Based's twin, with one critical change: the window is sized in seconds, not call count. Every call is stamped with its arrival time. The breaker only looks at calls newer than now - windowSeconds. Old calls don't have to wait for new ones to push them out — they age out by themselves.
Implementation is almost always bucketed: instead of storing every call's timestamp, the breaker maintains an array of one-second counters (or a similar fine grain) and rotates them. Hystrix did this with ten 1-second buckets; Resilience4j's TIME_BASED window does the same. The bucketing keeps the bookkeeping O(1) per call regardless of traffic volume.
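The rotating array of counters can be sketched as a fixed-size ring, where each second maps to a slot and a slot is wiped when a later second reuses it. This is a minimal illustration under assumptions, not the code of Hystrix or Resilience4j: `RollingCounters`, `record` and `totals` are hypothetical names, and time is passed in as whole seconds so the behaviour is easy to follow.

```typescript
type Bucket = { sec: number; ok: number; fail: number };

class RollingCounters {
  private readonly slots: Bucket[];

  constructor(private readonly windowSec: number) {
    // One slot per second of window; sec = -1 marks a slot that was never used.
    this.slots = Array.from({ length: windowSec }, () => ({ sec: -1, ok: 0, fail: 0 }));
  }

  // Record one call result at nowSec (a non-negative whole second). O(1) per call.
  record(ok: boolean, nowSec: number): void {
    const slot = this.slots[nowSec % this.windowSec];
    if (slot.sec !== nowSec) {
      // The slot still holds counts from an earlier lap: reuse it for this second.
      slot.sec = nowSec;
      slot.ok = 0;
      slot.fail = 0;
    }
    if (ok) slot.ok++; else slot.fail++;
  }

  // Sum the slots still inside the window. Cost depends on windowSec, not on traffic.
  totals(nowSec: number): { ok: number; fail: number } {
    const oldest = nowSec - this.windowSec + 1;
    let ok = 0, fail = 0;
    for (const s of this.slots) {
      if (s.sec >= oldest && s.sec <= nowSec) { ok += s.ok; fail += s.fail; }
    }
    return { ok, fail };
  }
}
```

Because `totals` checks each slot's second against the current time, stale slots stop counting even if no new call ever overwrites them.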
The payoff is self-recovery during quiet periods. If your service was failing badly but stops getting traffic for a few seconds, those failed calls drop out of the window and the failure rate falls back on its own, with no probe traffic needed to flush them. A breaker that has already tripped still goes through its cooldown and HALF-OPEN trial, just as in Count-Based. The price is slightly more bookkeeping per call.
Mechanics
How it works
The trip rule
- Maintain a window of the last W seconds of call results — typically as W one-second buckets, each holding (ok count, fail count).
- On every call, increment the bucket for floor(now / 1s) and let older buckets age out as time advances.
- Compute the failure rate across all buckets currently inside the window.
- If the rate is at or above the threshold and there are enough samples, trip.
- OPEN and HALF-OPEN behave exactly like Count-Based: cooldown, then a small number of trial calls decide.
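The trip decision in the list above can be written as a small pure function over the buckets currently inside the window. This is a sketch: `shouldTrip` and `Counts` are hypothetical names, and "enough samples" is modelled as a simple `minCalls` floor.

```typescript
type Counts = { ok: number; fail: number };

// Returns true when the window's failure rate justifies tripping the breaker.
function shouldTrip(buckets: Counts[], threshold: number, minCalls: number): boolean {
  let ok = 0, fail = 0;
  for (const b of buckets) { ok += b.ok; fail += b.fail; }
  const total = ok + fail;
  if (total < minCalls) return false;   // not enough samples to judge
  return fail / total >= threshold;     // "at or above" the threshold
}
```

Keeping the rule separate from the bucket storage makes it easy to test the threshold and sample-size behaviour on their own.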
Why bucketed and not exact timestamps?
Storing every timestamp is O(traffic) memory. Bucketing trades a tiny precision loss for fixed memory: with W=10 seconds at 1-second granularity, you keep exactly 10 counter pairs no matter whether you handle 1 or 100 000 requests per second. The granularity sets your worst-case error: a 1s bucket can be off by ±1 second around the trip moment, which almost never matters.
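To make the memory argument concrete, here is a sketch of the exact-timestamp alternative next to the bucket arithmetic. `ExactWindow` and `bucketCount` are hypothetical names used only for this comparison, and timestamps are passed in as milliseconds.

```typescript
// Exact approach: one entry per call, so memory grows with traffic.
class ExactWindow {
  private calls: Array<{ atMs: number; ok: boolean }> = [];

  constructor(private readonly windowMs: number) {}

  record(ok: boolean, nowMs: number): void {
    this.calls.push({ atMs: nowMs, ok });
    this.evict(nowMs);
  }

  // Drop entries that are windowMs old or older.
  private evict(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    while (this.calls.length > 0 && this.calls[0].atMs <= cutoff) this.calls.shift();
  }

  size(): number {
    return this.calls.length;
  }
}

// Bucketed approach: entries needed = window / granularity, whatever the traffic.
function bucketCount(windowMs: number, granularityMs: number): number {
  return Math.ceil(windowMs / granularityMs);
}
```

A thousand calls inside one second leave the exact window holding a thousand entries, while `bucketCount(10000, 1000)` stays at 10 regardless of load.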
Bursty traffic — fixed
Imagine a service that handles 100 requests in one second, then nothing for a minute, then another 100. A Count-Based window of 20 calls would still be 'looking at' that first burst a minute later — its view of 'now' is stale. Time-Based with a 10-second window would have aged the burst out completely. That is the bug this variant exists to fix.
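The stale-view problem can be reproduced in a few lines. The sketch below keeps a plain per-call history for clarity rather than buckets, and it assumes the first burst consisted entirely of failures, which the scenario above does not specify. All names are hypothetical.

```typescript
type Call = { atSec: number; ok: boolean };

// Count-based view: the last `size` calls, however old they are.
function countBasedFailureRate(history: Call[], size: number): number {
  const recent = history.slice(-size);
  if (recent.length === 0) return 0;
  return recent.filter(c => !c.ok).length / recent.length;
}

// Time-based view: only calls from the last `windowSec` seconds.
function timeBasedFailureRate(history: Call[], nowSec: number, windowSec: number): number {
  const recent = history.filter(c => c.atSec > nowSec - windowSec && c.atSec <= nowSec);
  if (recent.length === 0) return 0;
  return recent.filter(c => !c.ok).length / recent.length;
}

// Assumed scenario: a burst of 100 failing calls at t = 0 s, then silence.
const burst: Call[] = Array.from({ length: 100 }, () => ({ atSec: 0, ok: false }));
```

A minute later, `countBasedFailureRate(burst, 20)` still reports a window full of failures, while `timeBasedFailureRate(burst, 60, 10)` sees an empty window.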
Interactive prototype
Run it. Break it. Tune it.
Sandboxed simulation embedded right in the page. No setup, no install.
About this simulation
A time-based window — failures aren't counted by call number, but by clock. Each second has a bar; the current second is on the right. Spike the failure rate to trip it, then turn the failures off and stop clicking — the red bars march left and fall off the edge, and the breaker recovers on its own without any new calls.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Watch the bars age out
Click failing call five times in quick succession — all five go into the rightmost (current-second) bar. Now don't click anything for ten seconds. Watch the red bars march left as new (empty) seconds slide in, and finally fall off the left edge. The failure rate drifts back to 0 without you touching the controls.
Spike, trip, and self-recover
Tick Auto traffic and drag Downstream failures to 80%. The breaker trips. Now drag failures to 0 and untick auto traffic. Don't click anything. Within a few seconds the red bars age out and the rate falls below the threshold, but the breaker doesn't auto-close while OPEN. It waits out its cooldown, moves to HALF-OPEN, and the next few clicks decide whether it closes.
Compare to Count-Based
Open the Count-Based prototype in another tab. In both, click failing call five times and stop. In Count-Based, those 5 fails stay in the window until enough later calls push them out, possibly minutes later. In Time-Based, they're gone in 10 seconds whether or not you make more calls. That is the entire difference between the two algorithms.
Widen the window
Set Window length to 16s. The breaker now needs sustained failure across 16 seconds to trip — a 1-second blip won't do it. Drop the window to 4s and the breaker becomes twitchy: a single bad second can dominate. The window length is your noise filter.
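The noise-filter effect can also be checked outside the prototype. The sketch below computes the failure rate over the trailing seconds of a per-second series; the traffic numbers are invented for illustration and every name is hypothetical.

```typescript
type SecondCounts = { ok: number; fail: number };

// Failure rate over the trailing `windowSec` entries of a per-second series
// (the last element is the current second).
function trailingFailureRate(series: SecondCounts[], windowSec: number): number {
  let ok = 0, fail = 0;
  for (const s of series.slice(-windowSec)) { ok += s.ok; fail += s.fail; }
  const total = ok + fail;
  return total === 0 ? 0 : fail / total;
}

// Invented data: 15 healthy seconds (2 OK calls each), then one bad second (10 failures).
const series: SecondCounts[] = [
  ...Array.from({ length: 15 }, () => ({ ok: 2, fail: 0 })),
  { ok: 0, fail: 10 },
];

const wideRate = trailingFailureRate(series, 16);   // 10 failures out of 40 calls
const narrowRate = trailingFailureRate(series, 4);  // 10 failures out of 16 calls
```

With a 50% threshold, the 16-second view (25%) rides out the blip while the 4-second view (62.5%) would trip on it.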
In practice
When to use it — and what you give up
When to prefer Time-Based
- Bursty or sparse traffic — cron jobs, batch syncs, low-RPS endpoints. Counting calls leaves stale data; counting seconds is honest about how old the evidence is.
- You want quiet-period self-recovery — if requests stop entirely, the breaker should un-trip itself without you scripting probes.
- Mixed traffic patterns — your service handles big bursts and quiet patches in the same hour. A time-windowed view is the same shape whether you got 5 or 5 000 calls in the last second.
- Capacity planning matters — 'last 10 seconds' is a duration you can put on a dashboard alongside latency. 'Last 100 calls' isn't.
Watch the bucket granularity
With 1-second buckets, the window is technically anywhere from W-1 to W seconds wide depending on where you are in the current second. That's normally fine. If you need sub-second precision (latency-sensitive edge proxies), drop to 100ms buckets — at the cost of 10× the memory.
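The arithmetic behind that range is short enough to spell out. In this sketch, `bucketIndex` and `coveredMs` are hypothetical helpers, and the window is assumed to consist of the current, partly elapsed bucket plus the full buckets before it.

```typescript
// Index of the bucket a timestamp falls into, for a given granularity.
function bucketIndex(nowMs: number, granularityMs: number): number {
  return Math.floor(nowMs / granularityMs);
}

// Time span actually covered by a window of `buckets` buckets: the full
// (buckets - 1) older buckets plus however much of the current one has elapsed.
function coveredMs(nowMs: number, buckets: number, granularityMs: number): number {
  const elapsedInCurrent = nowMs % granularityMs;
  return (buckets - 1) * granularityMs + elapsedInCurrent;
}
```

With ten 1-second buckets the covered span swings between 9000 ms and just under 10000 ms; with a hundred 100 ms buckets it stays between 9900 ms and just under 10000 ms, at ten times the number of counters.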
Pros
- Self-recovers in quiet periods — failed calls age out by wall-clock, no probes required.
- Honest under bursty traffic — a sudden burst doesn't anchor the window forever.
- Operationally legible — 'we trip if 50% of the last 10 seconds were failures' is a dashboard sentence.
- Still O(1) per call — bucketing keeps memory and CPU fixed regardless of throughput.
Cons
- Slightly more bookkeeping than Count-Based — you have to manage buckets and time advancement.
- Bucket granularity is a tuning knob — too coarse and you lose precision; too fine and you spend more memory.
- Clock jumps can lie to you: if the OS clock is stepped (NTP correction, VM pause), buckets can be wrong for a moment. Reading time from a monotonic clock, where the platform offers one, dodges this.
- Still vulnerable at very low volumes — pair with a minimum-call gate (the Error-Percentage variant) if traffic is tiny.
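One way to keep clock steps out of the bucket arithmetic is to inject the time source instead of calling the wall clock directly. This is a sketch, assuming a runtime where the standard `performance.now()` is available as a global (browsers and recent Node.js); `Clock`, `monotonicClock` and `BucketClock` are hypothetical names and are not part of the reference implementation.

```typescript
// A clock is just a function returning elapsed milliseconds.
type Clock = () => number;

// performance.now() does not move backwards when the wall clock is adjusted.
const monotonicClock: Clock = () => performance.now();

class BucketClock {
  constructor(
    private readonly clock: Clock = monotonicClock,
    private readonly granularityMs = 1000,
  ) {}

  // Bucket number for "now", according to the injected clock.
  currentBucket(): number {
    return Math.floor(this.clock() / this.granularityMs);
  }
}
```

Injecting the clock also makes the breaker testable: a test can pass a function that returns a controlled value and step time forward by hand.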
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
// Time-based circuit breaker: trip on the failure rate over the last W seconds.
// Implemented with one bucket per second: per-call cost depends on the window
// length, not on traffic volume.
// Note: Date.now() is wall-clock time; swap in a monotonic source if clock jumps matter.
class TimeBasedBreaker {
  private state: "CLOSED" | "OPEN" | "HALF" = "CLOSED";
  // Oldest bucket first; at most windowSec entries survive prune().
  private buckets: Array<{ ok: number; fail: number; sec: number }> = [];
  private trials: boolean[] = []; // results of HALF-OPEN trial calls
  private openUntil = 0;          // epoch ms at which OPEN may move to HALF-OPEN

  constructor(
    private windowSec = 10,
    private threshold = 0.5,
    private cooldownMs = 5000,
    private trialCount = 3,
    private minCalls = 5,
  ) {}

  async call<T>(work: () => Promise<T>): Promise<T> {
    // Cooldown elapsed: start letting trial calls through.
    if (this.state === "OPEN" && Date.now() >= this.openUntil) {
      this.state = "HALF";
      this.trials = [];
    }
    if (this.state === "OPEN") throw new Error("circuit open");
    try {
      const result = await work();
      this.record(true);
      return result;
    } catch (err) {
      this.record(false);
      throw err;
    }
  }

  private record(ok: boolean) {
    // A call admitted before the trip can finish after it. Ignore its result so
    // it cannot re-trip the breaker and push the cooldown out.
    if (this.state === "OPEN") return;

    if (this.state === "HALF") {
      // Trial phase: collect trialCount results, then decide.
      this.trials.push(ok);
      if (this.trials.length >= this.trialCount) {
        const fails = this.trials.filter(x => !x).length;
        if (fails / this.trials.length >= this.threshold) this.trip();
        else this.close();
      }
      return;
    }

    // CLOSED: count the result in the bucket for the current second.
    const sec = Math.floor(Date.now() / 1000);
    let b = this.buckets[this.buckets.length - 1];
    if (!b || b.sec !== sec) {
      b = { ok: 0, fail: 0, sec };
      this.buckets.push(b);
    }
    if (ok) b.ok++; else b.fail++;
    this.prune(sec);
    this.maybeTrip();
  }

  // Drop buckets that have aged out of the window.
  private prune(now: number) {
    const lo = now - this.windowSec + 1;
    while (this.buckets.length && this.buckets[0].sec < lo) this.buckets.shift();
  }

  // Trip when there are enough samples and the failure rate is at or above the threshold.
  private maybeTrip() {
    let ok = 0, fail = 0;
    for (const b of this.buckets) { ok += b.ok; fail += b.fail; }
    const total = ok + fail;
    if (total >= this.minCalls && fail / total >= this.threshold) this.trip();
  }

  private trip() {
    this.state = "OPEN";
    this.openUntil = Date.now() + this.cooldownMs;
    this.trials = [];
  }

  // Start from a clean window so old failures cannot re-trip immediately.
  private close() {
    this.state = "CLOSED";
    this.buckets = [];
    this.trials = [];
  }
}
References & further reading
5 sources
- Docs · resilience4j.readme.io
Resilience4j: TIME_BASED sliding window
The reference implementation. The TIME_BASED type is exactly this: bucketed per second, with the same trip rule and half-open phase.
- Docs · github.com
Hystrix: Metrics and Health Counts
Netflix's original. Walks through the 10-bucket × 1-second rolling-window structure that almost every modern library inherited.
- Talk · youtube.com
Adrian Cockcroft: Lessons from Netflix
Why Netflix needed time-based windows for their bursty cross-region traffic. The same talk that popularised the circuit-breaker-as-default-pattern idea.
- Docs · envoyproxy.io
Envoy: outlier detection
Envoy's per-host ejection uses a time-windowed view of consecutive 5xx and gateway failures, the same time-based-windowing idea at the proxy layer.
- Book · samnewman.io
Sam Newman: Building Microservices, Ch. 12
The chapter on resilience covers why time-windowing matters in service meshes and how it composes with retries and timeouts.
Knowledge check
Did the prototype land?
Quick questions, answers revealed on submit. No scoring saved.
question 01 / 03
What's the key difference between Time-Based and Count-Based?
question 02 / 03
Why are most implementations bucketed instead of storing per-call timestamps?
question 03 / 03
Which workload makes Time-Based clearly better than Count-Based?