Overview
What this concept solves
Three-Phase Commit (3PC) is 2PC's response to its own famous blocking weakness. The fix is precisely one extra phase, inserted between the vote and the commit, with a single purpose: make sure that if any participant has been told the decision is COMMIT, then every other live participant also knows it. That extra information is what lets the survivors decide on their own when the coordinator dies.
The three phases are CanCommit, PreCommit, and DoCommit. CanCommit collects votes (just like 2PC's Prepare). PreCommit is the new bit: once the coordinator has every YES, it sends a pre-commit notice ("we are going to commit") and waits for acks. Only after every participant has acked PreCommit does the coordinator send the actual DoCommit. PreCommit acts as the signal: if every live participant reached it, the decision to commit is already shared knowledge, and a survivor can finish the job without the coordinator.
The catch, and it's a big one, is that 3PC is only safe under synchronous-network assumptions: messages must be either delivered within a known timeout or definitely lost. Real deployments aren't synchronous: networks partition, nodes stall in GC pauses, and delays are unpredictable. So 3PC is studied a lot more often than it is shipped. In production, the real fix for the 2PC blocking case is to replicate the coordinator with Paxos or Raft instead.
Mechanics
How it works
Phase 1 — CanCommit (collect votes)
- Coordinator sends CAN-COMMIT? to every participant.
- Each participant replies YES (it could commit if asked) or NO. Crucially, the participant does not yet write to its log or hold locks heavily; this is a feasibility check.
- If anyone answers NO, the coordinator immediately aborts and broadcasts ABORT. Done.
Phase 2 — PreCommit (the new phase)
- If every reply was YES, the coordinator writes PRE-COMMIT to its log and sends PRE-COMMIT to every participant.
- Each participant prepares durably (the heavy lifting of writing the redo/undo log, locking rows) and replies ACK.
- Once the coordinator has acks from every participant, and only then, it knows the decision can survive its own death. Move to Phase 3.
Phase 3 — DoCommit
- Coordinator sends DO-COMMIT to every participant.
- Each participant applies the transaction permanently, releases locks, and replies ACK.
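The three phases can be read as a small state machine on the participant side. The sketch below is one way to write it down, not a reference implementation: the state names (`VOTED_YES`, `PRECOMMITTED`), the `step` and `run` helpers, and the `feasible` flag are hypothetical names introduced here for illustration.

```typescript
// Hypothetical names throughout; a sketch of the participant's transitions.
type State = "INIT" | "VOTED_YES" | "PRECOMMITTED" | "COMMITTED" | "ABORTED";
type Message = "CAN-COMMIT?" | "PRE-COMMIT" | "DO-COMMIT" | "ABORT";
type Reply = "YES" | "NO" | "ACK" | null;

interface Step {
  state: State;
  reply: Reply;
}

// Pure transition function: given the current state and an incoming
// coordinator message, return the next state and the reply to send.
// `feasible` stands in for the participant's local feasibility check.
function step(state: State, msg: Message, feasible = true): Step {
  if (msg === "ABORT") {
    // A committed transaction can no longer be aborted; anything else can.
    return state === "COMMITTED"
      ? { state, reply: null }
      : { state: "ABORTED", reply: "ACK" };
  }
  if (state === "INIT" && msg === "CAN-COMMIT?") {
    return feasible
      ? { state: "VOTED_YES", reply: "YES" }
      : { state: "ABORTED", reply: "NO" };
  }
  if (state === "VOTED_YES" && msg === "PRE-COMMIT") {
    // This is where the durable prepare (log write, row locks) would happen.
    return { state: "PRECOMMITTED", reply: "ACK" };
  }
  if (state === "PRECOMMITTED" && msg === "DO-COMMIT") {
    return { state: "COMMITTED", reply: "ACK" };
  }
  // Out-of-order or duplicate message: stay put and send nothing.
  return { state, reply: null };
}

// Replay a message sequence from INIT and return the final state.
function run(msgs: Message[], feasible = true): State {
  let state: State = "INIT";
  for (const m of msgs) state = step(state, m, feasible).state;
  return state;
}
```

For example, `run(["CAN-COMMIT?", "PRE-COMMIT", "DO-COMMIT"])` ends in `COMMITTED`, while a `DO-COMMIT` that arrives before `PRE-COMMIT` is ignored and the participant stays in `VOTED_YES`.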
Why the new phase prevents blocking
If the coordinator dies after Phase 1, the survivors check among themselves: did anyone reach PreCommit? If yes, the coordinator must have collected a YES from every participant and was on the path to commit, so the survivors finish the commit on their own. If no, then DoCommit cannot have been sent (the coordinator sends it only after every PreCommit ack), so nobody has committed and they all abort safely. Either way, no blocking.
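The survivors' rule can be sketched as a single function over the states they report to each other. This is a minimal illustration under the synchronous assumption, with hypothetical names (`State`, `recoverDecision`); it also handles the cases where a survivor has already seen a final COMMIT or ABORT, which the survivors simply adopt.

```typescript
// Hypothetical state names; each survivor reports the furthest state it reached.
type State = "INIT" | "VOTED_YES" | "PRECOMMITTED" | "COMMITTED" | "ABORTED";

// Decide the outcome once the coordinator is presumed dead.
// Assumes every live participant is reachable (the synchronous model).
function recoverDecision(survivors: State[]): "COMMIT" | "ABORT" {
  // A final decision already seen by anyone is simply adopted.
  if (survivors.some(s => s === "COMMITTED")) return "COMMIT";
  if (survivors.some(s => s === "ABORTED")) return "ABORT";
  // Someone reached PreCommit, so every vote was YES: finish the commit.
  // Nobody reached it, so nobody can have committed: abort safely.
  return survivors.some(s => s === "PRECOMMITTED") ? "COMMIT" : "ABORT";
}
```

With three survivors that all acked PreCommit, `recoverDecision(["PRECOMMITTED", "PRECOMMITTED", "PRECOMMITTED"])` yields `"COMMIT"`; if the coordinator died before sending any PreCommit, `recoverDecision(["VOTED_YES", "VOTED_YES", "VOTED_YES"])` yields `"ABORT"`.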
Why production rarely ships 3PC
3PC's safety proof depends on a network where every message either arrives within a known bound or is detectably lost. Real wide-area networks have unbounded delays and silent packet loss — under those conditions a partitioned survivor can mistakenly believe the coordinator is dead and commit, while the coordinator was actually alive on the other side of the partition and aborted. So the textbook answer for production is: replicate the coordinator with Raft.
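The partition failure can be reproduced in a few lines. The sketch below is a toy model, not a network simulation: `Peer`, `coordinatorSide`, `isolatedSide`, and `simulatePartition` are hypothetical names, and the coordinator is assumed to abort when a PreCommit ack goes missing, as the reference coordinator later on this page does.

```typescript
// Toy model of a partition during Phase 2. All names are illustrative.
type State = "PRECOMMITTED" | "COMMITTED" | "ABORTED";

interface Peer {
  name: string;
  state: State;
  reachable: boolean; // can the coordinator still exchange messages with it?
}

// Coordinator side: it needs an ACK from every participant. A missing ACK
// looks like a crash, so it aborts the participants it can still reach.
function coordinatorSide(peers: Peer[]): void {
  const allAcked = peers.every(p => p.reachable);
  for (const p of peers) {
    if (p.reachable) p.state = allAcked ? "COMMITTED" : "ABORTED";
  }
}

// Isolated side: DO-COMMIT never arrives, the timeout fires, and the cut-off
// participants apply the survivor rule among the peers they can see.
function isolatedSide(peers: Peer[]): void {
  const cutOff = peers.filter(p => !p.reachable);
  const sawPreCommit = cutOff.some(p => p.state === "PRECOMMITTED");
  for (const p of cutOff) p.state = sawPreCommit ? "COMMITTED" : "ABORTED";
}

function simulatePartition(): State[] {
  // All three participants received PRE-COMMIT, then p3 was cut off
  // before its ACK reached the coordinator.
  const peers: Peer[] = [
    { name: "p1", state: "PRECOMMITTED", reachable: true },
    { name: "p2", state: "PRECOMMITTED", reachable: true },
    { name: "p3", state: "PRECOMMITTED", reachable: false },
  ];
  coordinatorSide(peers);
  isolatedSide(peers);
  return peers.map(p => p.state);
}
```

In this scenario p1 and p2 end up `ABORTED` while p3 ends up `COMMITTED`, which is exactly the atomicity violation the synchronous assumption is there to rule out.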
Interactive prototype
Run it. Break it. Tune it.
Sandboxed simulation embedded right in the page. No setup, no install.
About this simulation
The same coordinator + three participants as 2PC, but with an extra phase. Pick a scenario — Happy path, Vote NO, Coordinator crashes after PreCommit (the case 2PC blocks on), or Free play. Step with Prev / Next / Auto / Restart. The log card below the prototype only ever holds the last two lines.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Walk the Happy path
Open Happy path and step through. Watch the three phases unfold: CanCommit → all YES → PreCommit → acks → DoCommit. Compare against 2PC: there is exactly one extra round-trip, and the prepared state lives in PreCommit, not the vote phase.
Flip a participant to NO
Switch to Vote NO and toggle one participant. The coordinator decides to abort during CanCommit — Phase 2 never starts. No locks held, no precommit log, no fuss. 3PC's early-abort path is actually cheaper than 2PC's.
Crash the coordinator after PreCommit
Run Coordinator crashes. Every participant has acked PreCommit, then the coordinator dies before DoCommit. Watch the participants notice the timeout, confer among themselves, observe that everyone reached PreCommit, and commit anyway — no blocking. This is the case 2PC famously gets stuck on.
Free play — break it yourself
Open Free play and combine toggles: a NO vote plus a coordinator crash; a coordinator crash before PreCommit (compare to crashing after); all-YES with no crash. Notice that the dangerous network case — a partition that makes survivors think the coordinator is dead when it isn't — is precisely the case the prototype's synchronous assumption hides from you.
In practice
When to use it — and what you give up
When it's the right tool
- Synchronous-network deployments — single-datacenter clusters with predictable RTT and reliable hardware, where the synchronous assumption is approximately true.
- You need atomic commit and cannot tolerate the 2PC blocking case, but a full Raft-replicated coordinator is overkill or unavailable.
- Pedagogical / classical study — 3PC is the canonical illustration of how an extra phase removes blocking, and why network synchrony assumptions matter.
When to reach for something else
- Asynchronous / WAN networks — partitions break 3PC's safety. Use a real consensus protocol (Raft / Paxos) for the coordinator.
- Production systems in 2026 — the de facto fix is a Paxos- or Raft-replicated coordinator. Modern DBs (Spanner, CockroachDB) do exactly this.
- Anything where one of the participants is Byzantine — 3PC is crash-tolerant only.
- Long-running workflows — sagas with compensating transactions handle long-tail coordination better than any commit protocol.
Pros
- Non-blocking on coordinator failure (under synchronous assumptions) — survivors can recover the decision themselves.
- Same atomicity guarantee as 2PC in the happy path — every participant commits or every one aborts.
- Clearer recovery semantics — the PreCommit phase makes the survivors' decision deterministic.
- Drops Phase 1's heavy work — the CanCommit poll is light; the durable prepare lives in PreCommit.
Cons
- One extra round-trip — three RTTs in the happy path vs two for 2PC.
- Safety requires a synchronous network — partitions can cause inconsistency in real WAN deployments.
- Still all-or-nothing — one slow participant blocks Phase 2 acks.
- Rarely used in production — the industry settled on Raft-replicated coordinators instead.
- More state to track on every node — explicit prepared/precommit/committed states, more recovery cases to test.
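The round-trip cost in the list above is easy to put into numbers. The sketch below uses a hypothetical `commitCost` helper and assumes every phase is one request to, and one reply from, each participant, with the final acks counted.

```typescript
interface Cost {
  roundTrips: number;
  messages: number;
}

// Happy-path cost only: no retries, no aborts, no recovery traffic.
function commitCost(protocol: "2PC" | "3PC", participants: number): Cost {
  const roundTrips = protocol === "2PC" ? 2 : 3;
  // Each round trip is one request plus one reply per participant.
  return { roundTrips, messages: roundTrips * 2 * participants };
}
```

With three participants, this counts 12 messages for 2PC and 18 for 3PC, so the extra phase adds half again as much traffic on the happy path.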
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
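// 3PC coordinator. Adds a PreCommit phase between CanCommit and DoCommit.
type Vote = "YES" | "NO";
interface Participant {
  canCommit(txId: string): Promise<Vote>;
  preCommit(txId: string): Promise<"ACK">;
  doCommit(txId: string): Promise<void>;
  abort(txId: string): Promise<void>;
}
type Log = { write(line: string): Promise<void> };
// Per-phase deadline. The value is illustrative; 3PC assumes a known bound exists.
const PHASE_TIMEOUT_MS = 1000;
// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number = PHASE_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
// Logs the abort decision, then notifies everyone; unreachable participants are skipped.
async function abortAll(txId: string, participants: Participant[], log: Log): Promise<void> {
  await log.write(`ABORT ${txId}`);
  await Promise.allSettled(participants.map(p => withTimeout(p.abort(txId))));
}
// Re-issues the whole batch until every call succeeds, so the calls must be idempotent.
async function retryAll(calls: () => Promise<void>[], delayMs = 100): Promise<void> {
  for (;;) {
    const results = await Promise.allSettled(calls());
    if (results.every(r => r.status === "fulfilled")) return;
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
}
async function threePhaseCommit(
  txId: string,
  participants: Participant[],
  log: Log,
): Promise<"COMMITTED" | "ABORTED"> {
  await log.write(`BEGIN ${txId}`);
  // Phase 1: CanCommit (lightweight feasibility check)
  let votes: Vote[];
  try {
    votes = await withTimeout(
      Promise.all(participants.map(p => p.canCommit(txId))),
    );
  } catch {
    // A timeout or a failed participant counts as a NO vote.
    await abortAll(txId, participants, log);
    return "ABORTED";
  }
  if (votes.some(v => v === "NO")) {
    await abortAll(txId, participants, log);
    return "ABORTED";
  }
  // Phase 2: PreCommit (the new phase that prevents blocking)
  await log.write(`PRECOMMIT ${txId}`);
  try {
    await withTimeout(
      Promise.all(participants.map(p => p.preCommit(txId))),
    );
  } catch {
    // A missing ack is treated as a crash; safe only if the network is synchronous.
    await abortAll(txId, participants, log);
    return "ABORTED";
  }
  // Phase 3: DoCommit. The decision is durable, so keep retrying until everyone applies it.
  await log.write(`COMMIT ${txId}`);
  await retryAll(() => participants.map(p => p.doCommit(txId)));
  return "COMMITTED";
}
// On the participant side: if doCommit doesn't arrive within a timeout,
// query the reachable peers. If this node or any peer reached PreCommit,
// finish the commit. Otherwise abort. That cooperative recovery is what
// 2PC lacks.
References & further reading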
5 sources
- Paper (dl.acm.org)
Dale Skeen, *Nonblocking Commit Protocols* (1981)
The original paper that proves a non-blocking atomic-commit protocol exists in the synchronous model, and gives 3PC as the construction.
- Paper (www2.eecs.berkeley.edu)
Dale Skeen & Michael Stonebraker, *A Formal Model of Crash Recovery in a Distributed System* (1983)
The follow-up (Berkeley TR M80/48, later IEEE TSE 1983) that formalises the assumptions 3PC relies on (bounded message delay and detectable failures) and shows where the protocol breaks under partition.
- Article (en.wikipedia.org)
Wikipedia, Three-Phase Commit Protocol
Concise reference for the message format, the recovery rules, and the well-known counterexamples that make 3PC unsafe on asynchronous networks.
- Book (oreilly.com)
Martin Kleppmann, *Designing Data-Intensive Applications* (Chapter 9, "Atomic Commit and Two-Phase Commit")
Discusses why 3PC isn't widely used (network model is too strong) and what production systems do instead.
- Docs (zookeeper.apache.org)
ZooKeeper docs, Why not Two-Phase Commit?
Production motivation for moving past 2PC/3PC entirely: a real coordination service replicates the decision itself with consensus rather than relying on a single coordinator.
Knowledge check
Did the prototype land?
Quick questions, answers revealed on submit. No scoring saved.
question 01 / 03
What's the single most important thing PreCommit gives 3PC that 2PC lacks?
question 02 / 03
Why is 3PC rarely deployed in production despite being non-blocking?
question 03 / 03
A 3PC coordinator crashes immediately after sending PreCommit. The three participants are alive and all received it. What do they do?