PBFT — Practical Byzantine Fault Tolerance

Overview

What this concept solves

PBFT (Practical Byzantine Fault Tolerance) is Miguel Castro and Barbara Liskov's 1999 algorithm that made Byzantine-fault-tolerant state-machine replication practical to deploy. "Byzantine" means some nodes don't just crash: they actively lie, send conflicting messages, or try to corrupt the protocol. Earlier BFT algorithms were too expensive to see much real use; the classic Byzantine-generals solutions, for example, need message counts that grow exponentially with the number of tolerated faults. PBFT cut the cost to O(n²) messages per request, with decisions in milliseconds.

The catch is that Byzantine tolerance costs more than crash tolerance. To tolerate f Byzantine nodes you need 3f+1 nodes in total, and every consensus step needs a quorum of 2f+1. Any two such quorums overlap in at least one honest node, so malicious nodes cannot get two conflicting commits to both succeed. Tolerating one liar therefore takes 4 nodes; tolerating three takes 10. That structural cost is why PBFT and its descendants live in permissioned blockchains and consortium systems rather than in your average internal cluster.
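As a quick check of that arithmetic, here is a minimal Go sketch. The helper names bftSize and cftSize are illustrative, and the crash-fault column assumes a majority-quorum protocol such as Raft, included only for comparison.

```go
package main

import "fmt"

// bftSize returns the minimum cluster size and quorum size PBFT needs to
// tolerate f Byzantine replicas: n = 3f+1, quorum = 2f+1.
func bftSize(f int) (n, quorum int) {
	return 3*f + 1, 2*f + 1
}

// cftSize returns the same numbers for a crash-fault majority protocol
// (e.g. Raft): n = 2f+1, quorum = f+1. Shown only for comparison.
func cftSize(f int) (n, quorum int) {
	return 2*f + 1, f + 1
}

func main() {
	for f := 1; f <= 3; f++ {
		bn, bq := bftSize(f)
		cn, cq := cftSize(f)
		fmt.Printf("f=%d  Byzantine: n=%d quorum=%d   crash-only: n=%d quorum=%d\n", f, bn, bq, cn, cq)
	}
}
```

For f = 1 this gives 4 nodes with quorums of 3, against 3 nodes with quorums of 2 when nodes can only crash.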

PBFT runs in views, each with one primary (the leader) and n-1 backups. The primary proposes an order for incoming requests, and the backups cross-check it by broadcasting Prepare to each other. Once a node collects 2f+1 matching votes for a request, no honest node can collect a quorum for a different request at the same sequence number, so a primary that tells different backups different things cannot get two orders accepted. Nodes then broadcast Commit so the agreed order survives a later change of primary. When 2f+1 matching Commits arrive, the request executes and the client receives a reply. If the primary itself misbehaves, backups vote for a view change that rotates the role.

Mechanics

How it works

Normal-case operation (three rounds)

Client → Primary: REQUEST(op).
Pre-Prepare — The primary assigns sequence number seq and broadcasts PRE-PREPARE(view, seq, op) to every backup.
Prepare — Each backup that accepts the Pre-Prepare broadcasts PREPARE(view, seq, op) to every other node. A node is prepared once it holds the Pre-Prepare plus 2f matching Prepares from distinct backups, its own included. The Pre-Prepare stands in for the primary's vote, so that is 2f+1 matching votes in total.
Commit — Once prepared, each node broadcasts COMMIT(view, seq). A node commits when it has seen 2f+1 matching Commits (its own included), and executes the request once every lower sequence number has executed.
Reply — Each committed node sends its result directly to the client. The client waits for f+1 matching replies before trusting the answer — since at most f can lie, f+1 matching replies must contain at least one truthful one.
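The client-side rule in the Reply step can be sketched in Go. This is a simplified model: the Reply type and acceptResult function are hypothetical, authentication and request timestamps are omitted, and replies are counted per distinct replica so a liar cannot vote twice.

```go
package main

import "fmt"

// Reply is a hypothetical, simplified REPLY message (real PBFT replies are
// authenticated and carry the request timestamp and view).
type Reply struct {
	From   int    // replica id
	Result string // result of executing the request
}

// acceptResult returns a result once at least f+1 distinct replicas report the
// same value. At most f replicas are faulty, so one of those f+1 is honest.
func acceptResult(replies []Reply, f int) (string, bool) {
	votes := make(map[string]map[int]bool) // result -> set of replica ids
	for _, r := range replies {
		if votes[r.Result] == nil {
			votes[r.Result] = make(map[int]bool)
		}
		votes[r.Result][r.From] = true // repeated replies from one replica count once
		if len(votes[r.Result]) >= f+1 {
			return r.Result, true
		}
	}
	return "", false
}

func main() {
	// f = 1 (4 replicas); replica 3 lies and repeats its bogus answer.
	replies := []Reply{{3, "balance=0"}, {0, "balance=42"}, {3, "balance=0"}, {1, "balance=42"}}
	res, ok := acceptResult(replies, 1)
	fmt.Println(res, ok) // balance=42 true
}
```

Deduplicating by sender is the important design choice: without it, a single faulty replica could reach f+1 on its own by sending the same wrong answer twice.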

View change (when the primary misbehaves)

Backups suspect the primary on timeout or detected misbehaviour (e.g., conflicting Pre-Prepares).
Each suspecting backup broadcasts VIEW-CHANGE(new view, prepared-set) listing every request it has prepared.
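When the new primary (replica v mod n for view v, i.e. the next server in round-robin order) holds 2f+1 VIEW-CHANGE messages, it broadcasts NEW-VIEW. That message includes the VIEW-CHANGE messages as proof, plus fresh Pre-Prepares in the new view re-proposing every request that was prepared in an earlier view (no-op "null" requests fill any gaps), so honest nodes resume from a consistent point.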
Then the cluster resumes normal operation in the new view. Committed requests are preserved by the quorum-intersection invariant.
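A rough Go sketch of two pieces of the view change: choosing the primary round-robin, and how a new primary might pick what to re-propose. The PreparedCert type and reproposals function are hypothetical simplifications that skip checkpoints, watermarks and certificate validation; the selection rule (per sequence number, the request prepared in the highest view, with null requests for gaps) is a simplified version of the NEW-VIEW construction in Castro and Liskov's paper.

```go
package main

import "fmt"

// primaryOf implements round-robin leadership: in view v the primary is v mod n.
func primaryOf(v, n int) int { return v % n }

// PreparedCert is a hypothetical entry in a VIEW-CHANGE prepared-set:
// "I prepared Op at sequence Seq in view View".
type PreparedCert struct {
	Seq  int
	View int
	Op   string
}

// reproposals picks, for every sequence number up to the highest one reported,
// the op prepared in the highest view; gaps get a no-op "null" request.
func reproposals(viewChanges [][]PreparedCert) map[int]string {
	best := make(map[int]PreparedCert)
	maxSeq := 0
	for _, vc := range viewChanges {
		for _, c := range vc {
			if cur, ok := best[c.Seq]; !ok || c.View > cur.View {
				best[c.Seq] = c
			}
			if c.Seq > maxSeq {
				maxSeq = c.Seq
			}
		}
	}
	out := make(map[int]string)
	for s := 1; s <= maxSeq; s++ {
		if c, ok := best[s]; ok {
			out[s] = c.Op
		} else {
			out[s] = "null"
		}
	}
	return out
}

func main() {
	fmt.Println("primary of view 1:", primaryOf(1, 4)) // 1
	vcs := [][]PreparedCert{
		{{Seq: 1, View: 0, Op: "x=1"}},
		{{Seq: 1, View: 0, Op: "x=1"}, {Seq: 3, View: 0, Op: "y=2"}},
		{},
	}
	fmt.Println(reproposals(vcs)) // map[1:x=1 2:null 3:y=2]
}
```

Because a quorum of VIEW-CHANGE messages intersects every earlier prepare quorum in an honest node, any request that could have committed shows up in at least one prepared-set and is carried into the new view.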

Why 3f+1 nodes and 2f+1 quorums

With 3f+1 nodes and quorums of 2f+1, any two quorums share at least 2(2f+1) − (3f+1) = f+1 nodes. At most f of those can be liars, so at least one honest node is in both quorums. That honest node won't agree to two conflicting decisions — making conflicting commits structurally impossible.
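The same inclusion-exclusion bound can be checked numerically. A small Go sketch (minOverlap is an illustrative helper) confirms the overlap is exactly f+1 at n = 3f+1, and shows what breaks with one node fewer, assuming quorums of n-f = 2f there, since a node can only safely wait for n-f replies.

```go
package main

import "fmt"

// minOverlap is the smallest possible intersection of two quorums of size q
// drawn from n nodes: by inclusion-exclusion, |A ∩ B| >= |A| + |B| - n.
func minOverlap(n, q int) int {
	if o := 2*q - n; o > 0 {
		return o
	}
	return 0
}

func main() {
	for f := 1; f <= 5; f++ {
		n, q := 3*f+1, 2*f+1
		o := minOverlap(n, q)
		// Even if all f liars sit in the overlap, o-f honest nodes remain.
		fmt.Printf("f=%d n=%d q=%d overlap>=%d honest-in-overlap>=%d\n", f, n, q, o, o-f)
	}
	// One node short: n = 3f with quorums of 2f; the overlap can be just the f liars.
	f := 1
	fmt.Println("n=3f:", minOverlap(3*f, 2*f)-f, "honest nodes guaranteed in the overlap")
}
```

With 3f+1 nodes at least one honest node is always shared; with 3f the guarantee drops to zero, which is exactly why the extra node is needed.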

All-to-all broadcasts cost O(n²) messages

Each of the Prepare and Commit rounds is an all-to-all broadcast: every node sends to every other. That makes O(n²) messages per request, which is manageable at n = 4, 7 or 10 but grows quickly. Modern descendants such as HotStuff (used in Diem and Aptos) cut this to O(n) per round by having replicas send their votes only to the leader, which aggregates them into a threshold signature and broadcasts that single certificate.
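To put numbers on the quadratic growth, a minimal Go sketch can count normal-case messages for one request. It assumes the phase structure described above (the primary's Pre-Prepare, Prepares from the n-1 backups, Commits from all n replicas) and ignores the client's REQUEST and REPLY, checkpoints and retransmissions.

```go
package main

import "fmt"

// normalCaseMessages counts replica-to-replica messages for one request:
//   Pre-Prepare: primary -> n-1 backups
//   Prepare:     each of n-1 backups -> n-1 other replicas
//   Commit:      each of n replicas  -> n-1 other replicas
func normalCaseMessages(n int) int {
	prePrepare := n - 1
	prepare := (n - 1) * (n - 1)
	commit := n * (n - 1)
	return prePrepare + prepare + commit
}

func main() {
	for _, n := range []int{4, 7, 10, 31, 100} {
		fmt.Printf("n=%3d  messages per request: %d\n", n, normalCaseMessages(n))
	}
}
```

Under those assumptions the total simplifies to 2n(n-1): 24 messages at n = 4, 180 at n = 10, and 19,800 at n = 100.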

Interactive prototype

Run it. Break it. Tune it.

About this simulation

Four nodes — one primary, three backups — tolerating one Byzantine (lying) node. Pick a scenario — Happy path, A backup lies (quorum absorbs it), The primary lies (view change kicks in), or Free play where you pick which node is malicious. Step through Pre-Prepare / Prepare / Commit / Reply. Only two log lines are ever shown.

Hands-on

Try these on your own

Open the prototype above, predict the outcome of each experiment, then run it to verify.

try 01

Walk the Happy path

Open Happy path with no Byzantine node. Watch the three rounds: Pre-Prepare (primary broadcasts), Prepare (all-to-all cross-check), Commit (all-to-all confirmation), Reply. Notice that even with no liar, the cross-check is still done — that's how the protocol stays safe without trust.

try 02

See a backup liar get absorbed

Switch to A backup lies and pick any non-primary node. It sends a bogus PREPARE that doesn't match. The honest nodes still collect 3 (= 2f+1) matching votes, namely the primary's Pre-Prepare plus the two honest backups' Prepares, so they ignore the liar's message and the request commits with no disruption. This is what "fault tolerance" really means in BFT: the lie is silently absorbed.

try 03

Watch a faulty primary trigger view change

Run The primary lies. The primary sends conflicting Pre-Prepares to different backups. Backups detect the mismatch during Prepare: their Prepares disagree, no quorum forms, and when their timers expire they vote to change view. The primary role rotates to the next node in round-robin order (honest here, since only one node lies), which redoes Pre-Prepare. The protocol heals itself.
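Why can't an equivocating primary get two different requests accepted? A brute-force Go sketch over a hypothetical model (4 replicas, faulty primary 0, honest backups 1 to 3, two candidate ops A and B) enumerates every way the primary could split its Pre-Prepares and counts how many ops could become prepared. The names faults and preparedOps are illustrative.

```go
package main

import "fmt"

// Hypothetical model: n = 4 replicas, faults = 1, the primary (replica 0) is
// faulty, backups 1..3 are honest. Only backups send PREPARE, and an honest
// backup broadcasts a PREPARE for whichever op the primary sent it.
const faults = 1

// preparedOps returns the ops that can become prepared at some honest backup:
// a backup holding the Pre-Prepare for op needs 2f matching PREPAREs from
// distinct backups (its own included), i.e. 2f honest backups must hold op.
func preparedOps(sentTo map[int]string) map[string]bool {
	support := make(map[string]int) // op -> honest backups that received it
	for _, op := range sentTo {
		support[op]++
	}
	out := make(map[string]bool)
	for op, c := range support {
		if c >= 2*faults {
			out[op] = true
		}
	}
	return out
}

func main() {
	ops := []string{"A", "B"}
	worst := 0
	// Enumerate all 2^3 ways the primary can hand op A or B to each backup.
	for mask := 0; mask < 8; mask++ {
		sentTo := map[int]string{}
		for b := 1; b <= 3; b++ {
			sentTo[b] = ops[(mask>>(b-1))&1]
		}
		if k := len(preparedOps(sentTo)); k > worst {
			worst = k
		}
	}
	fmt.Println("max ops prepared for one sequence number:", worst) // 1
	fmt.Println("three-way split:", len(preparedOps(map[int]string{1: "A", 2: "B", 3: "C"}))) // 0
}
```

In every split at most one op reaches the 2f matching Prepares. With a 2-1 split the majority op may still prepare, which is safe precisely because nothing else can; with a three-way split nothing prepares and the view change takes over.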

try 04

Free play — break it yourself

Open Free play and pick different Byzantine nodes. With one liar, 4-node PBFT survives every scenario — that's f=1 tolerance. Try toggling honesty mid-request. Notice that 2f+1 = 3 always shows up as the magic quorum number: enough that a single liar can't tip the result.

In practice

When to use it — and what you give up

When it's the right tool

Permissioned blockchains and consortium systems — Hyperledger Fabric's BFT ordering service, Tendermint, IBFT (Quorum, ConsenSys), HotStuff (Diem/Aptos). Known validators, but some may be malicious.
Cross-organisation data sharing with no single trusted party — financial settlement networks, supply-chain provenance.
Critical infrastructure where Byzantine nodes are a real threat (compromised servers, insider attacks).
Small clusters where the O(n²) cost is acceptable — typically up to ~20 validators.

When to reach for something else

Inside one trust boundary (your own datacenter, your own replicas) — use Raft or Multi-Paxos. Crash-fault tolerance is cheap; BFT is expensive and unnecessary if all your machines are yours.
Public, permissionless blockchains — Nakamoto consensus (PoW) or modern PoS protocols handle open membership and Sybil resistance; PBFT requires known validator sets.
Very large validator counts (>50) — switch to HotStuff or chained variants that linearise the message cost.
Atomic commit across trusted services — that's 2PC, not BFT.

Pros

Tolerates active malice, not just crashes — nodes that lie, send conflicting messages, or attempt to corrupt the agreement.
Practical message complexity — O(n²) per request, milliseconds per decision; a watershed result compared with the exponential-cost BFT algorithms that preceded it.
Deterministic finality — once committed, a request is final (no probabilistic settlement like PoW).
Well-studied — over two decades of formal analyses, optimisations, and production deployments in permissioned blockchains.
Foundation of modern BFT — Tendermint, HotStuff, IBFT and most permissioned chain consensus algorithms inherit PBFT's structure.

Cons

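Needs 3f+1 nodes to tolerate f liars — 4 nodes for 1, 7 for 2, 10 for 3. That is f more replicas than crash-fault protocols such as Raft need (2f+1: 3, 5, 7), approaching 1.5× as f grows, with larger quorums (2f+1 instead of f+1).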
O(n²) messages per request — every node broadcasts to every other in Prepare and Commit phases.
Scales poorly past ~20 validators — modern BFT (HotStuff and chained variants) replaces the all-to-all phases with votes that the leader aggregates into threshold signatures, bringing each round down to O(n) messages.
View change is intricate and expensive — a real source of corner-case bugs in implementations.
Static membership — adding or removing a validator requires careful re-keying; doesn't handle open / churning sets gracefully.

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

pbft.go

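// Skeleton of PBFT normal-case operation on a backup node.
// All cross-node messages are signed; signature checks, request digests,
// watermarks, checkpoints and view change are omitted for clarity.
package pbft

// One message type with a Kind field models the PRE-PREPARE / PREPARE / COMMIT
// discriminated union (Op is unused for COMMIT).
type Kind string

const (
	PrePrepare Kind = "PRE-PREPARE"
	Prepare    Kind = "PREPARE"
	Commit     Kind = "COMMIT"
)

type Msg struct {
	Kind Kind
	View int
	Seq  int
	Op   string // unused for COMMIT
	From int
}

type PbftBackup struct {
	myId int
	n    int

	view       int
	prePrepare map[int]string         // seq -> op accepted from the primary
	prepares   map[int]map[int]string // seq -> backup id -> op (own vote included)
	commits    map[int]map[int]bool   // seq -> set of senders (own vote included)
	prepared   map[int]bool
	committed  map[int]bool
	lastExec   int // highest seq executed; seqs start at 1 and execute in order
}

func NewPbftBackup(myId, n int) *PbftBackup {
	return &PbftBackup{
		myId:       myId,
		n:          n,
		prePrepare: make(map[int]string),
		prepares:   make(map[int]map[int]string),
		commits:    make(map[int]map[int]bool),
		prepared:   make(map[int]bool),
		committed:  make(map[int]bool),
	}
}

// f is the number of Byzantine replicas tolerated (n >= 3f+1).
func (p *PbftBackup) f() int { return (p.n - 1) / 3 }

// quorum is 2f+1.
func (p *PbftBackup) quorum() int { return 2*p.f() + 1 }

// primary is the leader of the current view (round-robin: view mod n).
func (p *PbftBackup) primary() int { return p.view % p.n }

func (p *PbftBackup) OnPrePrepare(m Msg) {
	if m.View != p.view || m.From != p.primary() {
		return // wrong view, or not sent by this view's primary
	}
	if _, ok := p.prePrepare[m.Seq]; ok {
		return // already accepted one for this seq (a different op means the primary equivocated)
	}
	p.prePrepare[m.Seq] = m.Op
	p.addPrepare(m.Seq, p.myId, m.Op) // our own Prepare counts toward the 2f
	p.broadcast(Msg{Kind: Prepare, View: p.view, Seq: m.Seq, Op: m.Op, From: p.myId})
	p.checkPrepared(m.Seq) // Prepares may have arrived before the Pre-Prepare
}

func (p *PbftBackup) OnPrepare(m Msg) {
	if m.View != p.view || m.From == p.primary() {
		return // only backups send PREPARE; the Pre-Prepare is the primary's vote
	}
	p.addPrepare(m.Seq, m.From, m.Op) // buffered; matching is checked in checkPrepared
	p.checkPrepared(m.Seq)
}

func (p *PbftBackup) addPrepare(seq, from int, op string) {
	bag := p.prepares[seq]
	if bag == nil {
		bag = make(map[int]string)
		p.prepares[seq] = bag
	}
	bag[from] = op // at most one vote per sender
}

// Prepared = Pre-Prepare + 2f matching Prepares from distinct backups
// (own included), i.e. 2f+1 matching votes counting the primary's.
func (p *PbftBackup) checkPrepared(seq int) {
	op, ok := p.prePrepare[seq]
	if !ok || p.prepared[seq] {
		return
	}
	matching := 0
	for _, o := range p.prepares[seq] {
		if o == op {
			matching++
		}
	}
	if matching < 2*p.f() {
		return
	}
	p.prepared[seq] = true
	p.addCommit(seq, p.myId) // our own Commit counts toward the 2f+1
	p.broadcast(Msg{Kind: Commit, View: p.view, Seq: seq, From: p.myId})
	p.checkCommitted(seq) // Commits may have arrived before we prepared
}

func (p *PbftBackup) OnCommit(m Msg) {
	if m.View != p.view {
		return
	}
	p.addCommit(m.Seq, m.From)
	p.checkCommitted(m.Seq)
}

func (p *PbftBackup) addCommit(seq, from int) {
	set := p.commits[seq]
	if set == nil {
		set = make(map[int]bool)
		p.commits[seq] = set
	}
	set[from] = true
}

// Committed = prepared + 2f+1 Commits from distinct replicas.
func (p *PbftBackup) checkCommitted(seq int) {
	if p.committed[seq] || !p.prepared[seq] || len(p.commits[seq]) < p.quorum() {
		return
	}
	p.committed[seq] = true
	p.executeInOrder()
}

// executeInOrder runs committed requests strictly by sequence number, so
// every honest replica applies the same ops in the same order.
func (p *PbftBackup) executeInOrder() {
	for p.committed[p.lastExec+1] {
		p.lastExec++
		p.execute(p.prePrepare[p.lastExec])
		p.replyToClient(p.lastExec)
	}
}

func (p *PbftBackup) execute(op string)     {} // apply op to the state machine
func (p *PbftBackup) replyToClient(seq int) {} // send a signed REPLY to the client
func (p *PbftBackup) broadcast(m Msg)       {} // send to all other n-1 nodes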

Knowledge check

Did the prototype land?

question 01 / 03

PBFT needs 3f+1 nodes to tolerate f Byzantine failures. What's the underlying reason?

question 02 / 03

A 4-node PBFT cluster has one Byzantine backup (not the primary). What does the protocol do?

question 03 / 03

What triggers a PBFT view change?