Time-Based — Circuit Breaker

Overview

What this concept solves

Time-Based is Count-Based's twin, with one critical change: the window is sized in seconds, not call count. Every call is stamped with its arrival time. The breaker only looks at calls newer than now - windowSeconds. Old calls don't have to wait for new ones to push them out — they age out by themselves.

Implementation is almost always bucketed: instead of storing every call's timestamp, the breaker maintains an array of one-second counters (or a similar fine grain) and rotates them. Hystrix did this with ten 1-second buckets; Resilience4j's TIME_BASED window does the same. The bucketing keeps the bookkeeping O(1) per call regardless of traffic volume.
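The rotation can be sketched as a fixed-size ring indexed by second. This is a minimal illustration under assumed names (`ring`, `record`, `totals`); it is not the layout used by Hystrix or Resilience4j, and it assumes non-negative second values such as Unix time.

```go
package main

// bucket holds the outcome counts for one second.
type bucket struct {
	sec      int64 // which second this slot currently represents
	ok, fail int
}

// ring is a fixed-size circular array of one-second buckets.
// Hypothetical type for illustration only.
type ring struct {
	slots []bucket
}

func newRing(windowSec int) *ring {
	r := &ring{slots: make([]bucket, windowSec)}
	for i := range r.slots {
		r.slots[i].sec = -1 // mark as never used
	}
	return r
}

// record adds one call result to the bucket for second sec.
// A slot left over from an older second is reset before reuse, which is how
// old data rotates out without touching any other slot.
func (r *ring) record(sec int64, ok bool) {
	b := &r.slots[sec%int64(len(r.slots))]
	if b.sec != sec {
		*b = bucket{sec: sec}
	}
	if ok {
		b.ok++
	} else {
		b.fail++
	}
}

// totals sums the buckets that still fall inside the window ending at sec.
func (r *ring) totals(sec int64) (ok, fail int) {
	lo := sec - int64(len(r.slots)) + 1
	for _, b := range r.slots {
		if b.sec >= lo && b.sec <= sec {
			ok += b.ok
			fail += b.fail
		}
	}
	return ok, fail
}
```

Recording touches exactly one slot, and reading touches W slots, so neither cost depends on how many calls arrived.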

The payoff is self-recovery during quiet periods. If your service was failing badly but stops getting traffic for a few seconds, those failed calls quietly drop out of the window and the failure rate falls back on its own, with no new calls needed to flush them. A breaker that has already tripped still goes through its cooldown and HALF-OPEN trials before closing. The price is slightly more bookkeeping per call.

Mechanics

How it works

The trip rule

Maintain a window of the last W seconds of call results — typically as W one-second buckets, each holding (ok count, fail count).
On every call, increment the bucket for floor(now / 1s) and let older buckets age out as time advances.
Compute the failure rate across all buckets currently inside the window.
If the rate is at or above the threshold and there are enough samples, trip.
OPEN and HALF-OPEN behave exactly like Count-Based: cooldown, then a small number of trial calls decide.
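The CLOSED-state steps above can be sketched as a single decision function. The names `secondStats` and `shouldTrip` are hypothetical, and the window is taken to be the W whole seconds ending at `now`.

```go
package main

// secondStats is one bucket: results recorded during a single second.
type secondStats struct {
	sec      int64
	ok, fail int
}

// shouldTrip applies the trip rule to the buckets inside the window
// [now-windowSec+1, now]. threshold is a fraction (0.5 means 50%), and
// minCalls is the minimum sample size before the rate is trusted.
func shouldTrip(buckets []secondStats, now, windowSec int64, threshold float64, minCalls int) bool {
	lo := now - windowSec + 1
	ok, fail := 0, 0
	for _, b := range buckets {
		if b.sec < lo || b.sec > now {
			continue // aged out, or stamped in the future after a clock jump
		}
		ok += b.ok
		fail += b.fail
	}
	total := ok + fail
	if total < minCalls {
		return false // not enough evidence yet
	}
	return float64(fail)/float64(total) >= threshold
}
```

Checking the sample count before the rate keeps a handful of early failures from tripping the breaker on thin evidence.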

Why bucketed and not exact timestamps?

Storing every timestamp is O(traffic) memory. Bucketing trades a tiny precision loss for fixed memory: with W=10 seconds at 1-second granularity, you keep exactly 10 counter pairs no matter whether you handle 1 or 100 000 requests per second. The granularity sets your worst-case error: with 1-second buckets the window edge can be off by up to one second around the trip moment, which almost never matters.

Bursty traffic — fixed

Imagine a service that handles 100 requests in one second, then nothing for a minute, then another 100. A Count-Based window of 20 calls would still be 'looking at' that first burst a minute later — its view of 'now' is stale. Time-Based with a 10-second window would have aged the burst out completely. That is the bug this variant exists to fix.
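The scenario can be reproduced with a small simulation. This is an illustrative sketch, not a breaker implementation: the helper names (`countBasedRate`, `timeBasedRate`, `burst`) are made up, and time is modelled as plain second numbers.

```go
package main

// call is one recorded result with the second it arrived in.
type call struct {
	sec int64
	ok  bool
}

// countBasedRate returns the failure rate over the last n calls, however old.
func countBasedRate(history []call, n int) float64 {
	if len(history) > n {
		history = history[len(history)-n:]
	}
	return failureRate(history)
}

// timeBasedRate returns the failure rate over calls newer than now-windowSec.
func timeBasedRate(history []call, now, windowSec int64) float64 {
	var recent []call
	for _, c := range history {
		if c.sec > now-windowSec {
			recent = append(recent, c)
		}
	}
	return failureRate(recent)
}

// failureRate is fails/total, or 0 when there are no calls to judge.
func failureRate(calls []call) float64 {
	if len(calls) == 0 {
		return 0
	}
	fails := 0
	for _, c := range calls {
		if !c.ok {
			fails++
		}
	}
	return float64(fails) / float64(len(calls))
}

// burst builds n failed calls that all arrived during second sec.
func burst(sec int64, n int) []call {
	out := make([]call, n)
	for i := range out {
		out[i] = call{sec: sec, ok: false}
	}
	return out
}
```

With `h := burst(0, 100)`, `countBasedRate(h, 20)` still reports 1.0 a minute later, while `timeBasedRate(h, 60, 10)` reports 0 because every call has aged out.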

Interactive prototype

Run it. Break it. Tune it.

Sandboxed simulation embedded right in the page. No setup, no install.


About this simulation

A time-based window — failures aren't counted by call number, but by clock. Each second has a bar; the current second is on the right. Spike the failure rate to trip it, then turn the failures off and stop clicking — the red bars march left and fall off the edge, and the breaker recovers on its own without any new calls.

Hands-on

Try these on your own

Open the prototype above, run each experiment, predict the answer, then verify.

try 01

Watch the bars age out

Click failing call five times in quick succession — all five go into the rightmost (current-second) bar. Now don't click anything for ten seconds. Watch the red bars march left as new (empty) seconds slide in, and finally fall off the left edge. The failure rate drifts back to 0 without you touching the controls.

try 02

Spike, trip, and self-recover

Tick Auto traffic and drag Downstream failures to 80%. The breaker trips. Now drag failures to 0 and untick auto traffic. Don't click anything. Within a few seconds the red bars age out and the rate falls below the threshold, but the breaker doesn't auto-close while OPEN. It waits out its cooldown, moves to HALF-OPEN, and the next few calls decide.

try 03

Compare to Count-Based

Open the Count-Based prototype in another tab. In both, click failing call five times and stop. In Count-Based, those 5 fails stay in the window until enough later calls push them out, possibly minutes later. In Time-Based, they're gone in 10 seconds whether or not you make more calls. That is the entire difference between the two algorithms.

try 04

Widen the window

Set Window length to 16s. The breaker now needs sustained failure across 16 seconds to trip — a 1-second blip won't do it. Drop the window to 4s and the breaker becomes twitchy: a single bad second can dominate. The window length is your noise filter.
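A little arithmetic shows the filtering effect. The sketch below assumes steady traffic with a single fully failing second at the end; `windowRate` and `steadyWithBlip` are hypothetical helpers, not part of the prototype.

```go
package main

// windowRate returns the failure rate over the last w entries of perSecond,
// where perSecond[i] holds the (ok, fail) counts for second i.
func windowRate(perSecond [][2]int, w int) float64 {
	if w > len(perSecond) {
		w = len(perSecond)
	}
	ok, fail := 0, 0
	for _, s := range perSecond[len(perSecond)-w:] {
		ok += s[0]
		fail += s[1]
	}
	if ok+fail == 0 {
		return 0
	}
	return float64(fail) / float64(ok+fail)
}

// steadyWithBlip builds n seconds (n >= 1) of traffic at rps calls per
// second in which only the final second fails.
func steadyWithBlip(n, rps int) [][2]int {
	out := make([][2]int, n)
	for i := range out {
		out[i] = [2]int{rps, 0}
	}
	out[n-1] = [2]int{0, rps}
	return out
}
```

For 16 seconds at 10 calls per second, the same bad second is 10 of 160 calls (6.25%) in a 16s window and 10 of 40 calls (25%) in a 4s window. When the bad second carries more calls than its neighbours, the short window crosses the threshold sooner still.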

In practice

When to use it — and what you give up

When to prefer Time-Based

Bursty or sparse traffic — cron jobs, batch syncs, low-RPS endpoints. Counting calls leaves stale data; counting seconds is honest about how old the evidence is.
You want quiet-period self-recovery: if requests stop entirely, stale failures should expire on their own, without you scripting probes to flush them.
Mixed traffic patterns — your service handles big bursts and quiet patches in the same hour. A time-windowed view is the same shape whether you got 5 or 5 000 calls in the last second.
Capacity planning matters — 'last 10 seconds' is a duration you can put on a dashboard alongside latency. 'Last 100 calls' isn't.

Watch the bucket granularity

With 1-second buckets, the window is technically anywhere from W-1 to W seconds wide depending on where you are in the current second. That's normally fine. If you need sub-second precision (latency-sensitive edge proxies), drop to 100ms buckets — at the cost of 10× the memory.

Pros

Self-recovers in quiet periods — failed calls age out by wall-clock, no probes required.
Honest under bursty traffic — a sudden burst doesn't anchor the window forever.
Operationally legible — 'we trip if 50% of the last 10 seconds were failures' is a dashboard sentence.
Still O(1) per call — bucketing keeps memory and CPU fixed regardless of throughput.

Cons

Slightly more bookkeeping than Count-Based — you have to manage buckets and time advancement.
Bucket granularity is a tuning knob — too coarse and you lose precision; too fine and you spend more memory.
Clock jumps can lie to you: if the OS clock jumps (NTP correction, VM pause), buckets can be wrong for a moment. Reading time from a monotonic clock dodges this.
Still vulnerable at very low volumes — pair with a minimum-call gate (the Error-Percentage variant) if traffic is tiny.
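One way to sidestep wall-clock jumps in Go is to number buckets by elapsed time rather than by Unix time. The `monoClock` type below is a hypothetical sketch; it relies on the documented behaviour that `time.Now` carries a monotonic reading and that `time.Since` uses it.

```go
package main

import "time"

// monoClock derives bucket numbers from elapsed time since start.
// time.Now records a monotonic reading alongside the wall clock, and
// time.Since uses that reading, so wall-clock adjustments do not affect it.
type monoClock struct {
	start time.Time
}

func newMonoClock() *monoClock {
	return &monoClock{start: time.Now()}
}

// second returns whole seconds elapsed since the clock was created.
// Unlike time.Now().Unix(), this value does not move backwards when the
// wall clock is corrected.
func (c *monoClock) second() int64 {
	return int64(time.Since(c.start) / time.Second)
}
```

The bucket numbers no longer match calendar seconds, which is fine for a breaker: only the distance between buckets matters.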

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

time_based.go

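// Time-based circuit breaker: trip on the failure rate over the last W seconds.
// Implemented with one bucket per second, so memory and per-call work are
// bounded by W regardless of traffic.
// Minimal sketch: not safe for concurrent use (callers must add their own
// locking), and it reads the wall clock, so a clock jump can briefly skew
// the buckets.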
package breaker

import (
	"errors"
	"time"
)

var ErrOpen = errors.New("circuit open")

type bucket struct {
	ok, fail int
	sec      int64
}

type TimeBasedBreaker struct {
	state     string // "CLOSED" | "OPEN" | "HALF"
	buckets   []bucket
	trials    []bool
	openUntil time.Time

	windowSec  int64
	threshold  float64
	cooldown   time.Duration
	trialCount int
	minCalls   int
}

func NewTimeBasedBreaker() *TimeBasedBreaker {
	return &TimeBasedBreaker{
		state:      "CLOSED",
		windowSec:  10,
		threshold:  0.5,
		cooldown:   5 * time.Second,
		trialCount: 3,
		minCalls:   5,
	}
}

func (cb *TimeBasedBreaker) Call(work func() error) error {
	if cb.state == "OPEN" && !time.Now().Before(cb.openUntil) {
		cb.state = "HALF"
		cb.trials = nil
	}
	if cb.state == "OPEN" {
		return ErrOpen
	}
	if err := work(); err != nil {
		cb.record(false)
		return err
	}
	cb.record(true)
	return nil
}

func (cb *TimeBasedBreaker) record(ok bool) {
	if cb.state == "HALF" {
		cb.trials = append(cb.trials, ok)
		if len(cb.trials) >= cb.trialCount {
			fails := 0
			for _, t := range cb.trials {
				if !t {
					fails++
				}
			}
			if float64(fails)/float64(len(cb.trials)) >= cb.threshold {
				cb.trip()
			} else {
				cb.close()
			}
		}
		return
	}
	sec := time.Now().Unix()
	if len(cb.buckets) == 0 || cb.buckets[len(cb.buckets)-1].sec != sec {
		cb.buckets = append(cb.buckets, bucket{sec: sec})
	}
	b := &cb.buckets[len(cb.buckets)-1]
	if ok {
		b.ok++
	} else {
		b.fail++
	}
	cb.prune(sec)
	cb.maybeTrip()
}

func (cb *TimeBasedBreaker) prune(now int64) {
	lo := now - cb.windowSec + 1
	for len(cb.buckets) > 0 && cb.buckets[0].sec < lo {
		cb.buckets = cb.buckets[1:]
	}
}

func (cb *TimeBasedBreaker) maybeTrip() {
	ok, fail := 0, 0
	for _, b := range cb.buckets {
		ok += b.ok
		fail += b.fail
	}
	total := ok + fail
	if total >= cb.minCalls && float64(fail)/float64(total) >= cb.threshold {
		cb.trip()
	}
}

func (cb *TimeBasedBreaker) trip() {
	cb.state = "OPEN"
	cb.openUntil = time.Now().Add(cb.cooldown)
	cb.trials = nil
}

func (cb *TimeBasedBreaker) close() {
	cb.state = "CLOSED"
	cb.buckets = nil
	cb.trials = nil
}


Knowledge check

Did the prototype land?


question 01 / 03

What's the key difference between Time-Based and Count-Based?

question 02 / 03

Why are most implementations bucketed instead of storing per-call timestamps?

question 03 / 03

Which workload makes Time-Based clearly better than Count-Based?
