Count-Based — Circuit Breaker

Overview

What this concept solves

Count-Based is the simplest trip rule there is. Keep the results of the last N calls in a fixed-size buffer. Every time you add a new result, drop the oldest. Compute the failure rate over that buffer. If it crosses your threshold (say, 50%), trip the breaker.

That is the whole algorithm. No timestamps, no decay function, no latency tracking. Just a ring of booleans and a percentage. It is the default sliding-window type in Resilience4j, and it is what you should start with unless you have a reason not to.

The fixed-size window is also its blind spot. If traffic is bursty (five calls in a minute, then nothing), the window keeps whatever results it held the last time someone called, so the failure rate can describe a downstream that has long since changed. The Time-Based variant fixes that.

Mechanics

How it works

The trip rule, step by step

Maintain a ring buffer of the last N call results (true = ok, false = fail). On every new call, push to the back and drop from the front if full.
Compute the failure rate: count of fails ÷ count of entries. (Most implementations also require a minimum number of samples before they will trip, for example half the window, so a single failure in a near-empty window doesn't immediately open you.)
If the rate is at or above the threshold and you have enough samples, trip → OPEN.
While OPEN, every call is rejected immediately for the configured cooldown.
After the cooldown, OPEN → HALF-OPEN: a small number of trial calls is allowed. If the failure rate over the trials stays below the threshold (e.g. all succeed), close. Otherwise, OPEN again.
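
The trip decision in steps 2 and 3 can be sketched as a single function. This is a minimal illustration, not any library's API: the name shouldTrip and its parameters are assumptions made for this example, and minCalls stands in for the minimum-sample guard described above.

```go
package main

import "fmt"

// shouldTrip reports whether a CLOSED breaker should open, given the results
// currently in the window (true = ok, false = fail).
// minCalls is the assumed minimum-sample guard: below it, never trip.
func shouldTrip(window []bool, threshold float64, minCalls int) bool {
	if len(window) == 0 || len(window) < minCalls {
		return false // not enough evidence yet
	}
	fails := 0
	for _, ok := range window {
		if !ok {
			fails++
		}
	}
	rate := float64(fails) / float64(len(window))
	return rate >= threshold
}

func main() {
	// One failure out of two calls is a 50% rate, but two samples is too few.
	fmt.Println(shouldTrip([]bool{true, false}, 0.5, 5)) // false
	// Five samples, three failures: 60% >= 50%, so trip.
	fmt.Println(shouldTrip([]bool{true, false, false, true, false}, 0.5, 5)) // true
}
```

Keeping the sample-count check ahead of the rate check is what stops a near-empty window from opening the breaker.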

Why a window — not a streak?

A counting variant ('5 fails in a row') is intuitive but brittle: a single intervening success resets the streak, even if you're really failing 80% of the time. A windowed rate captures the truth more honestly. Real failures don't ask permission to alternate with successes.
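
To make the difference concrete, here is a small sketch that runs both rules over the same results. The pattern (four failures, then one success, repeated) and the helper names maxStreak, failRate, and pattern are assumptions chosen for illustration.

```go
package main

import "fmt"

// maxStreak returns the longest run of consecutive failures (false values).
func maxStreak(results []bool) int {
	longest, current := 0, 0
	for _, ok := range results {
		if ok {
			current = 0
			continue
		}
		current++
		if current > longest {
			longest = current
		}
	}
	return longest
}

// failRate returns the fraction of failures in results (0 for an empty slice).
func failRate(results []bool) float64 {
	if len(results) == 0 {
		return 0
	}
	fails := 0
	for _, ok := range results {
		if !ok {
			fails++
		}
	}
	return float64(fails) / float64(len(results))
}

// pattern builds 20 results: four failures then one success, repeated.
func pattern() []bool {
	results := make([]bool, 0, 20)
	for i := 0; i < 20; i++ {
		results = append(results, i%5 == 4) // every fifth call succeeds
	}
	return results
}

func main() {
	results := pattern()
	fmt.Println("longest streak:", maxStreak(results)) // 4, so "5 in a row" never fires
	fmt.Println("failure rate:", failRate(results))    // 0.8, well above a 50% threshold
}
```

The streak rule stays silent on this sequence even though four out of five calls fail, while the windowed rate trips comfortably.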

The bursty-traffic blind spot

Suppose your window is 20 and traffic is sparse: one call every few minutes. The breaker only learns when a call happens, so a full window spans roughly an hour. A downstream that comes back to life between calls is invisible: the window still holds failures recorded long ago (say 45% failed, because that's what it was an hour ago), and a single fresh failure is enough to trip it. If your traffic is bursty, prefer Time-Based, which ages calls out by clock time.
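
A rough way to see how stale a window can get is to estimate the wall-clock time it covers. The sketch below assumes calls arrive at a perfectly steady interval, which real traffic will not; windowSpan is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"time"
)

// windowSpan estimates how much wall-clock time a full count-based window
// covers when calls arrive at a steady interval: the gap between the oldest
// and newest of n results is (n-1) intervals.
func windowSpan(n int, interval time.Duration) time.Duration {
	if n <= 1 {
		return 0
	}
	return time.Duration(n-1) * interval
}

func main() {
	// Sparse traffic: one call every 3 minutes, window of 20.
	fmt.Println(windowSpan(20, 3*time.Minute)) // 57m0s
	// Busy traffic: 100 calls per second, same window.
	fmt.Println(windowSpan(20, 10*time.Millisecond)) // 190ms
}
```

The same window of 20 describes the last fraction of a second on a busy service and nearly an hour on a quiet one.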

Interactive prototype

Run it. Break it. Tune it.

About this simulation

A sliding window of the last N calls. Click healthy call or failing call to fill it; the bar tracks the failure rate, the dotted line marks the threshold. Cross it and the breaker trips to OPEN — every later call is rejected instantly until the cooldown elapses and Half-Open trial calls decide whether to close again. Flip on Auto traffic to watch a steady stream of requests do it on their own.

Hands-on

Try these on your own

Open the prototype above, run each experiment, predict the answer, then verify.

try 01

Watch the window fill and trip

Press healthy call four times, then failing call five times. Watch the bar climb from green to red and cross the threshold line — the breaker badge flips to open. Note the rejected count starts ticking only after it tripped.

try 02

The cooldown and recovery

After tripping, wait for the cooldown — the badge flips to half-open. Click healthy call the configured number of trial times (default 3). All trials pass → closed, window reset. Now make those trials fail and watch it slam back to open.

try 03

Auto traffic with a sick downstream

Tick Auto traffic and drag Downstream failures above the threshold. Watch it trip on its own. Now drag failures back to 0 and let the trial calls in Half-Open close it. This is what production traffic looks like through the breaker's eyes.

try 04

Push the knobs

Widen the Window size to 16 — it takes longer to trip but the rate is more accurate. Shrink it to 4 — it's twitchy and trips on small bursts. Raise the Failure threshold to 80% — the breaker becomes much more tolerant. Each knob is a tradeoff between sensitivity and stability.
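
The effect of each knob can be estimated before touching the sliders. The sketch below computes how many failures a full window must hold before the rate reaches the threshold. failsToTrip is a hypothetical helper, and it ignores any minimum-call guard, which can let a breaker trip before the window is full.

```go
package main

import "fmt"

// failsToTrip returns how many failures a full window of size n must hold
// before the failure rate reaches thresholdPercent (rounded up).
// Integer math avoids floating-point rounding surprises.
func failsToTrip(n, thresholdPercent int) int {
	return (n*thresholdPercent + 99) / 100
}

func main() {
	for _, n := range []int{4, 16} {
		for _, pct := range []int{50, 80} {
			fmt.Printf("window=%d threshold=%d%% -> %d failures to trip\n",
				n, pct, failsToTrip(n, pct))
		}
	}
}
```

With a window of 4, each result moves the rate by 25 percentage points, so two failures reach 50%. With 16, each result moves it by 6.25 points and it takes eight.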

In practice

When to use it — and what you give up

When to reach for Count-Based

Steady, predictable traffic — a service handling dozens or hundreds of calls per second. The window always reflects roughly the last second or two of behaviour.
You want the simplest possible breaker — fewer knobs to tune, easier to reason about, easier to debug. Count-Based is the 'no surprises' default.
Memory is a real constraint — N booleans is the smallest serious breaker you can build. No timestamps, no counters per second.
You're prototyping — ship Count-Based first, then upgrade to Time-Based or Error-Percentage once you see the actual traffic shape.
Avoid when traffic is bursty or very low volume — both starve the window of fresh data. Use Time-Based or Error-Percentage instead.

Library defaults

Resilience4j's default sliding-window type is COUNT_BASED with N=100 and a 50% failure-rate threshold. Polly's original basic circuit breaker counts consecutive failures (a degenerate count-based breaker with the simplest possible rule). Hystrix is the odd one out: it kept counters in time buckets (by default ten one-second buckets), which makes it a time-based rolling window rather than a count-based one.

Pros

Tiny memory — N bits (or N booleans) and a pointer.
O(1) per call: push to the ring and update a running failure count. Even a naive rescan of the window is bounded by N.
Predictable — no clocks, no decay, no surprise from time-of-day effects.
Easy to explain — 'we trip if the last 20 calls were ≥ 50% failures' is a sentence a non-engineer can understand.
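
The memory and O(1) claims above rest on using a true ring buffer with a running failure count rather than rescanning a slice. A minimal sketch, assuming a positive window size; the ring type and its method names are illustrative only.

```go
package main

import "fmt"

// ring is a fixed-size window of call results with a running failure count,
// so recording a result and reading the rate are both O(1).
type ring struct {
	results []bool // true = ok, false = fail
	next    int    // index the next result will overwrite
	count   int    // number of slots filled so far (<= len(results))
	fails   int    // failures currently in the window
}

// newRing allocates a window of the given size (size must be > 0).
func newRing(size int) *ring {
	return &ring{results: make([]bool, size)}
}

// record stores a result, evicting the oldest one once the ring is full.
func (r *ring) record(ok bool) {
	if r.count == len(r.results) {
		if !r.results[r.next] {
			r.fails-- // the evicted result was a failure
		}
	} else {
		r.count++
	}
	r.results[r.next] = ok
	if !ok {
		r.fails++
	}
	r.next = (r.next + 1) % len(r.results)
}

// failRate returns failures / entries, or 0 for an empty window.
func (r *ring) failRate() float64 {
	if r.count == 0 {
		return 0
	}
	return float64(r.fails) / float64(r.count)
}

func main() {
	r := newRing(4)
	for _, ok := range []bool{false, false, true, true, true, true} {
		r.record(ok)
	}
	// Both early failures have been evicted by the last two successes.
	fmt.Println(r.failRate()) // 0
}
```

The eviction step is the only subtle part: the failure count must be decremented before the slot is overwritten, otherwise old failures never leave the rate.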

Cons

Insensitive to traffic shape — bursty or low-volume traffic can leave stale data in the window.
No self-recovery in quiet periods — without new calls, the rate stays whatever it was last measured.
Threshold tuning is workload-specific — what counts as 'too many' depends on your normal error rate, which N alone doesn't capture.
Can trip on a tiny sample unless you enforce a minimum-call count — one fail of two calls is 50%.

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

count_based.go

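// Count-based circuit breaker: trip when the failure rate over the last N calls
// crosses the threshold. The simplest sliding window there is.
// Not safe for concurrent use: callers must serialize access (for example with
// a sync.Mutex) if several goroutines share one breaker.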
package breaker

import (
	"errors"
	"time"
)

var ErrOpen = errors.New("circuit open")

type CountBasedBreaker struct {
	state     string  // "CLOSED" | "OPEN" | "HALF"
	window    []bool  // true = ok, false = fail
	trials    []bool
	openUntil time.Time

	size       int
	threshold  float64 // failure rate that trips the breaker, e.g. 0.5 = 50%
	cooldown   time.Duration
	trialCount int
	minCalls   int // don't trip on tiny samples
}

func NewCountBasedBreaker() *CountBasedBreaker {
	return &CountBasedBreaker{
		state:      "CLOSED",
		size:       10,
		threshold:  0.5,
		cooldown:   5000 * time.Millisecond,
		trialCount: 3,
		minCalls:   5,
	}
}

func (cb *CountBasedBreaker) Call(work func() error) error {
	if cb.state == "OPEN" {
		if !time.Now().Before(cb.openUntil) {
			cb.state = "HALF"
			cb.trials = nil
		} else {
			return ErrOpen
		}
	}
	if err := work(); err != nil {
		cb.record(false)
		return err
	}
	cb.record(true)
	return nil
}

func (cb *CountBasedBreaker) record(ok bool) {
	if cb.state == "HALF" {
		cb.trials = append(cb.trials, ok)
		if len(cb.trials) >= cb.trialCount {
			if failRate(cb.trials) >= cb.threshold {
				cb.trip()
			} else {
				cb.close()
			}
		}
		return
	}
	cb.window = append(cb.window, ok)
	if len(cb.window) > cb.size {
		cb.window = cb.window[1:]
	}
	if len(cb.window) >= cb.minCalls && failRate(cb.window) >= cb.threshold {
		cb.trip()
	}
}

func failRate(window []bool) float64 {
	fails := 0
	for _, ok := range window {
		if !ok {
			fails++
		}
	}
	return float64(fails) / float64(len(window))
}

func (cb *CountBasedBreaker) trip() {
	cb.state = "OPEN"
	cb.openUntil = time.Now().Add(cb.cooldown)
	cb.trials = nil
}

func (cb *CountBasedBreaker) close() {
	cb.state = "CLOSED"
	cb.window = nil
	cb.trials = nil
}

Knowledge check

Did the prototype land?

question 01 / 03

What exactly is the 'window' in count-based?

question 02 / 03

Why do most implementations require a minimum number of calls before they will trip?

question 03 / 03

If your service handles roughly 1 request per minute, why is count-based a poor choice?
