Three-Phase Commit (3PC) — Consensus

Overview

What this concept solves

Three-Phase Commit (3PC) is 2PC's response to its own famous blocking weakness. The fix is precisely one extra phase, inserted between the vote and the commit, with a single purpose: make sure that before any participant is told to COMMIT, every participant already knows that a commit is coming. That extra information is what lets the survivors decide on their own when the coordinator dies.

The three phases are CanCommit, PreCommit, and DoCommit. CanCommit collects votes (just like 2PC's Prepare). PreCommit is the new bit: once the coordinator has every YES, it sends a pre-commit notice ("we are going to commit") and waits for acks. Only after every participant has acked PreCommit does the coordinator send the actual DoCommit. PreCommit acts as the signal: a participant that has seen it knows the vote was unanimous, and no participant can commit until all of them have seen it, so a survivor can finish the job without the coordinator.

The catch — and it's a big one — is that 3PC is only safe under synchronous-network assumptions: messages must be either delivered within a known timeout or definitely lost. Real networks aren't synchronous; they're partitioned, GC-paused, and unpredictable. So 3PC is studied a lot more often than it is shipped. In production, the real fix for the 2PC blocking case is to replicate the coordinator with Paxos or Raft instead.

Mechanics

How it works

Phase 1 — CanCommit (collect votes)

Coordinator sends CAN-COMMIT? to every participant.
Each participant replies YES (it could commit if asked) or NO. Crucially, this is only a feasibility check: the participant does not yet write the transaction to its log or take its heavyweight locks.
If anyone answers NO, the coordinator immediately aborts and broadcasts ABORT. Done.

Phase 2 — PreCommit (the new phase)

If every reply was YES, the coordinator writes PRE-COMMIT to its log and sends PRE-COMMIT to every participant.
Each participant prepares durably (the heavy lifting of writing the redo/undo log, locking rows) and replies ACK.
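Once the coordinator has an ack from every participant, and only then, it knows that every participant has recorded the pending commit, so the decision can survive the coordinator's own death. Move to Phase 3. If an ack fails to arrive within the timeout, the coordinator aborts instead.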

Phase 3 — DoCommit

Coordinator sends DO-COMMIT to every participant.
Each participant applies the transaction permanently, releases locks, and replies ACK.
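
The three phases above imply a small state machine on each participant. The following is a minimal sketch of it, not a reference implementation; the state and event names (Waiting, PreCommitted, PreCommitMsg, and so on) and the Next function are illustrative assumptions.

```go
package main

import "fmt"

// State is a participant's position in the protocol (names are illustrative).
type State int

const (
	Init         State = iota // no request seen yet
	Waiting                   // voted YES, awaiting PRE-COMMIT
	PreCommitted              // prepared durably, acked PRE-COMMIT
	Committed                 // applied permanently
	Aborted                   // rolled back
)

// Event is a message from the coordinator, combined with the local vote
// in the case of CAN-COMMIT?.
type Event int

const (
	CanCommitYes Event = iota // CAN-COMMIT? arrived and we vote YES
	CanCommitNo               // CAN-COMMIT? arrived and we vote NO
	PreCommitMsg
	DoCommitMsg
	AbortMsg
)

// Next returns the state after handling ev, or an error if ev is not legal in s.
func Next(s State, ev Event) (State, error) {
	switch {
	case s == Init && ev == CanCommitYes:
		return Waiting, nil
	case s == Init && ev == CanCommitNo:
		return Aborted, nil
	case s == Waiting && ev == PreCommitMsg:
		return PreCommitted, nil
	case s == PreCommitted && ev == DoCommitMsg:
		return Committed, nil
	case ev == AbortMsg && s != Committed:
		return Aborted, nil // abort is idempotent, but never undoes a commit
	}
	return s, fmt.Errorf("illegal transition: state %d, event %d", s, ev)
}
```

The point of the sketch is the missing edge: there is no transition from Waiting straight to Committed, so a DO-COMMIT can only ever be applied by a participant that has already passed through PreCommitted.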

Why the new phase prevents blocking

If the coordinator dies after it has started sending PRE-COMMIT, the survivors compare states among themselves: did anyone reach PreCommit? If yes, every participant must have voted YES, so the coordinator could only have been heading toward commit; the survivors bring each other up to PreCommit and finish the commit on their own. If no, then nobody can have committed, because DO-COMMIT is sent only after every participant has acked PRE-COMMIT, so they all abort safely. Either way, no blocking.
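
The survivors' rule can be written as a pure function over the states they report to each other. This is a sketch under the synchronous assumption (every live participant is heard from in time); PeerState, Decision, and Decide are hypothetical names introduced for illustration.

```go
package main

// PeerState is what a surviving participant reports during recovery.
type PeerState int

const (
	Waiting      PeerState = iota // voted YES, has not seen PRE-COMMIT
	PreCommitted                  // acked PRE-COMMIT
	Committed
	Aborted
)

// Decision is the outcome the survivors agree on.
type Decision string

const (
	Commit Decision = "COMMIT"
	Abort  Decision = "ABORT"
)

// Decide applies the recovery rule to the states of all live participants.
// It is only sound if no unreachable participant can still act on its own.
func Decide(survivors []PeerState) Decision {
	var sawPreCommit, sawCommit, sawAbort bool
	for _, s := range survivors {
		switch s {
		case PreCommitted:
			sawPreCommit = true
		case Committed:
			sawCommit = true
		case Aborted:
			sawAbort = true
		}
	}
	switch {
	case sawCommit:
		return Commit // the decision is already in effect somewhere
	case sawAbort:
		return Abort // someone voted NO or was told to abort
	case sawPreCommit:
		return Commit // PRE-COMMIT is only sent after a unanimous YES
	default:
		return Abort // nobody saw PRE-COMMIT, so nobody can have committed
	}
}
```

For example, Decide([]PeerState{PreCommitted, Waiting, Waiting}) yields Commit, while Decide([]PeerState{Waiting, Waiting}) yields Abort.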

Why production rarely ships 3PC

3PC's safety proof depends on a network where every message either arrives within a known bound or is detectably lost. Real wide-area networks have unbounded delays and silent packet loss — under those conditions a partitioned survivor can mistakenly believe the coordinator is dead and commit, while the coordinator was actually alive on the other side of the partition and aborted. So the textbook answer for production is: replicate the coordinator with Raft.
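
The failure described above needs nothing more than the two timeout rules acting on opposite sides of a partition. The sketch below assumes the classic rules (the coordinator aborts when a PRE-COMMIT ack is late; a participant that has reached PreCommit commits when DO-COMMIT is late); all function names are hypothetical.

```go
package main

// Outcome is the final decision taken by one side of the partition.
type Outcome string

const (
	Committed Outcome = "COMMITTED"
	Aborted   Outcome = "ABORTED"
)

// coordinatorOnAckTimeout is the coordinator's rule when a PRE-COMMIT ack
// does not arrive in time: it assumes the participant is dead and aborts.
func coordinatorOnAckTimeout() Outcome {
	return Aborted
}

// participantOnTimeout is a participant's rule when it stops hearing from the
// coordinator: commit if it already reached PreCommit, abort otherwise.
func participantOnTimeout(reachedPreCommit bool) Outcome {
	if reachedPreCommit {
		return Committed
	}
	return Aborted
}

// PartitionScenario models one participant that receives PRE-COMMIT and is
// then cut off before its ack reaches the coordinator. Both sides time out
// and each applies its own rule.
func PartitionScenario() (coordinatorSide, isolated Outcome) {
	coordinatorSide = coordinatorOnAckTimeout()
	isolated = participantOnTimeout(true)
	return coordinatorSide, isolated
}
```

Calling PartitionScenario returns Aborted for the coordinator's side and Committed for the isolated participant, which is exactly the atomicity violation that the synchronous assumption is there to rule out.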

Interactive prototype

Run it. Break it. Tune it.

Sandboxed simulation embedded right in the page. No setup, no install.

About this simulation

The same coordinator + three participants as 2PC, but with an extra phase. Pick a scenario — Happy path, Vote NO, Coordinator crashes after PreCommit (the case 2PC blocks on), or Free play. Step with Prev / Next / Auto / Restart. The log card below the prototype only ever holds the last two lines.

Hands-on

Try these on your own

Open the prototype above, run each experiment, predict the answer, then verify.

try 01

Walk the Happy path

Open Happy path and step through. Watch the three phases unfold: CanCommit → all YES → PreCommit → acks → DoCommit. Compare against 2PC: there is exactly one extra round-trip, and the prepared state lives in PreCommit, not the vote phase.

try 02

Flip a participant to NO

Switch to Vote NO and toggle one participant. The coordinator decides to abort during CanCommit — Phase 2 never starts. No locks held, no precommit log, no fuss. 3PC's early-abort path is actually cheaper than 2PC's.

try 03

Crash the coordinator after PreCommit

Run Coordinator crashes. Every participant has acked PreCommit, then the coordinator dies before DoCommit. Watch the participants notice the timeout, confer among themselves, observe that everyone reached PreCommit, and commit anyway — no blocking. This is the case 2PC famously gets stuck on.

try 04

Free play — break it yourself

Open Free play and combine toggles: a NO vote plus a coordinator crash; a coordinator crash before PreCommit (compare to crashing after); all-YES with no crash. Notice that the dangerous network case — a partition that makes survivors think the coordinator is dead when it isn't — is precisely the case the prototype's synchronous assumption hides from you.

In practice

When to use it — and what you give up

When it's the right tool

Synchronous-network deployments — single-datacenter clusters with predictable RTT and reliable hardware, where the synchronous assumption is approximately true.
You need atomic commit and cannot tolerate the 2PC blocking case, but a full Raft-replicated coordinator is overkill or unavailable.
Pedagogical / classical study — 3PC is the canonical illustration of how an extra phase removes blocking, and why network synchrony assumptions matter.

When to reach for something else

Asynchronous / WAN networks — partitions break 3PC's safety. Use a real consensus protocol (Raft / Paxos) for the coordinator.
Production systems in 2026 — the de-facto fix is a Paxos- or Raft-replicated coordinator. Modern DBs (Spanner, CockroachDB) do exactly this.
Anything where one of the participants is Byzantine — 3PC is crash-tolerant only.
Long-running workflows — sagas with compensating transactions handle long-tail coordination better than any commit protocol.

Pros

Non-blocking on coordinator failure (under synchronous assumptions) — survivors can recover the decision themselves.
Same atomicity guarantee as 2PC in the happy path — every participant commits or every one aborts.
Clearer recovery semantics — the PreCommit phase makes the survivors' decision deterministic.
Drops Phase 1's heavy work — the CanCommit poll is light; the durable prepare lives in PreCommit.

Cons

One extra round-trip — three RTTs in the happy path vs two for 2PC.
Safety requires a synchronous network — partitions can cause inconsistency in real WAN deployments.
Still all-or-nothing — one slow participant blocks Phase 2 acks.
Rarely used in production — the industry settled on Raft-replicated coordinators instead.
More state to track on every node — explicit prepared/precommit/committed states, more recovery cases to test.
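
The extra round-trip in the list above can be made concrete by counting happy-path messages. This is back-of-the-envelope arithmetic, assuming one request and one reply per participant in every phase; Cost, TwoPC, and ThreePC are illustrative names.

```go
package main

// Cost is the happy-path cost of a commit protocol.
type Cost struct {
	RoundTrips int
	Messages   int
}

// happyPathCost assumes each phase is one request plus one reply
// for each of the n participants.
func happyPathCost(phases, n int) Cost {
	return Cost{RoundTrips: phases, Messages: 2 * phases * n}
}

// TwoPC has two phases: Prepare and Commit.
func TwoPC(n int) Cost { return happyPathCost(2, n) }

// ThreePC has three phases: CanCommit, PreCommit, and DoCommit.
func ThreePC(n int) Cost { return happyPathCost(3, n) }
```

With three participants, TwoPC(3) is 2 round-trips and 12 messages, and ThreePC(3) is 3 round-trips and 18 messages.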

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

three_phase_commit.go

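// 3PC coordinator. Adds a PreCommit phase between CanCommit and DoCommit.
package threepc

import (
	"errors"
	"fmt"
	"time"
)

// Vote is a participant's answer to CanCommit.
type Vote string

const (
	VoteYes Vote = "YES"
	VoteNo  Vote = "NO"
)

// Participant is one resource manager taking part in the transaction.
// DoCommit and Abort should be idempotent, because the coordinator may repeat them.
type Participant interface {
	CanCommit(txID string) (Vote, error)
	PreCommit(txID string) (string, error) // returns "ACK"
	DoCommit(txID string) error
	Abort(txID string) error
}

// Log is the coordinator's durable decision log.
type Log interface {
	Write(line string) error
}

// Illustrative values. 3PC is only safe if phaseTimeout is a true upper
// bound on message delay.
const (
	phaseTimeout = 2 * time.Second
	retryDelay   = 100 * time.Millisecond
)

var errPhaseTimeout = errors.New("3pc: phase timed out")

// ThreePhaseCommit runs one transaction and returns "COMMITTED" or "ABORTED".
// A non-nil error means the coordinator could not write its own log.
func ThreePhaseCommit(txID string, participants []Participant, log Log) (string, error) {
	if err := log.Write(fmt.Sprintf("BEGIN %s", txID)); err != nil {
		return "", err
	}

	// Phase 1: CanCommit (lightweight feasibility check)
	votes, err := withTimeout(func() ([]Vote, error) {
		out := make([]Vote, len(participants))
		for i, p := range participants {
			v, err := p.CanCommit(txID)
			if err != nil {
				return nil, err
			}
			out[i] = v
		}
		return out, nil
	})
	if err != nil {
		// A failed or late vote counts as NO.
		abortAll(txID, participants, log)
		return "ABORTED", nil
	}
	for _, v := range votes {
		if v != VoteYes { // anything other than an explicit YES aborts
			abortAll(txID, participants, log)
			return "ABORTED", nil
		}
	}

	// Phase 2: PreCommit (the new phase that prevents blocking)
	if err := log.Write(fmt.Sprintf("PRECOMMIT %s", txID)); err != nil {
		return "", err
	}
	_, err = withTimeout(func() ([]Vote, error) {
		for _, p := range participants {
			if _, err := p.PreCommit(txID); err != nil {
				return nil, err
			}
		}
		return nil, nil
	})
	if err != nil {
		// Not every participant acked PreCommit in time, so committing is unsafe.
		abortAll(txID, participants, log)
		return "ABORTED", nil
	}

	// Phase 3: DoCommit. Every participant has acked PreCommit, so the
	// decision is final and is retried until everyone has applied it.
	if err := log.Write(fmt.Sprintf("COMMIT %s", txID)); err != nil {
		return "", err
	}
	retryAll(func() error {
		for _, p := range participants {
			if err := p.DoCommit(txID); err != nil {
				return err
			}
		}
		return nil
	})
	return "COMMITTED", nil
}

// withTimeout runs fn and gives up after phaseTimeout. This is a minimal
// sketch: fn keeps running in the background after a timeout, so a real
// implementation would also cancel it (for example with a context).
func withTimeout(fn func() ([]Vote, error)) ([]Vote, error) {
	type result struct {
		votes []Vote
		err   error
	}
	done := make(chan result, 1) // buffered so the goroutine can always finish
	go func() {
		v, err := fn()
		done <- result{v, err}
	}()
	select {
	case r := <-done:
		return r.votes, r.err
	case <-time.After(phaseTimeout):
		return nil, errPhaseTimeout
	}
}

// abortAll records the abort and tells every participant, best effort.
// A real implementation would retry until each participant acknowledges.
func abortAll(txID string, participants []Participant, log Log) {
	_ = log.Write(fmt.Sprintf("ABORT %s", txID))
	for _, p := range participants {
		_ = p.Abort(txID)
	}
}

// retryAll calls fn until it succeeds, pausing between attempts.
func retryAll(fn func() error) {
	for fn() != nil {
		time.Sleep(retryDelay)
	}
}

// On the participant side: if DoCommit doesn't arrive within a timeout
// AND we already acked PreCommit, query peers. If any peer also reached
// PreCommit, finish the commit. Otherwise abort. That cooperative
// recovery is what 2PC lacks.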

Knowledge check

Did the prototype land?

question 01 / 03

What's the single most important thing PreCommit gives 3PC that 2PC lacks?

question 02 / 03

Why is 3PC rarely deployed in production despite being non-blocking?

question 03 / 03

A 3PC coordinator crashes immediately after sending PreCommit. The three participants are alive and all received it. What do they do?
