Overview
What this concept solves
Two-Phase Commit (2PC) is the textbook atomic-commit protocol, the one nearly every distributed-systems course teaches. One process plays the role of coordinator; the rest are participants that each hold a piece of the transaction. The coordinator's job is to get every participant to commit, or get every participant to abort. Half-committed is forbidden.
The shape is exactly what the name says: two phases. In Phase 1 (Prepare), the coordinator asks every participant "can you commit?" Each participant prepares (writes the transaction to a durable log and holds its locks), then replies YES or NO. In Phase 2 (Decide), the coordinator looks at the votes. If every reply was YES, it broadcasts COMMIT. If even one was NO (or never arrived), it broadcasts ABORT. Either way, every participant receives the same decision.
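The Phase 2 rule fits in a small pure function. This is a sketch that assumes participants are identified by hypothetical string IDs and that replies are collected into a `Map`. A reply that never arrived is simply missing from the map, and it is treated exactly like a NO.

```typescript
type Vote = "YES" | "NO";
type Decision = "COMMIT" | "ABORT";

// Phase 2 decision rule: COMMIT only if *every* participant replied YES.
// A reply lost to a crash, timeout, or dropped message is absent from
// `replies` and therefore forces ABORT, just like an explicit NO.
function decide(participants: readonly string[], replies: ReadonlyMap<string, Vote>): Decision {
  for (const p of participants) {
    if (replies.get(p) !== "YES") return "ABORT";
  }
  return "COMMIT";
}

const participants = ["P1", "P2", "P3"];
console.log(decide(participants, new Map<string, Vote>([["P1", "YES"], ["P2", "YES"], ["P3", "YES"]]))); // COMMIT
console.log(decide(participants, new Map<string, Vote>([["P1", "YES"], ["P2", "NO"], ["P3", "YES"]])));  // ABORT
console.log(decide(participants, new Map<string, Vote>([["P1", "YES"], ["P3", "YES"]])));                // ABORT (P2 never replied)
```

The rule is unanimity, not a majority. Two YES votes out of three still abort.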
2PC delivers atomicity, but it pays a famous price. If the coordinator dies after collecting YES votes but before broadcasting the decision, the participants are stuck. They can't unilaterally abort, because the coordinator might have decided COMMIT and already told someone else. They can't unilaterally commit either, because by the same logic in reverse it might have decided ABORT. They block, holding locks, until the coordinator comes back. That blocking window is what later atomic-commit designs, such as 3PC and Paxos Commit, set out to remove.
Mechanics
How it works
Phase 1 — Prepare (the vote)
- Coordinator writes BEGIN to its log, then sends PREPARE to every participant.
- Each participant tentatively performs the work: it locks rows, writes redo/undo records, and fsyncs a PREPARED log entry. From this point on, the participant has promised it can commit if asked, and it may no longer abort on its own.
- The participant replies YES if it managed to prepare, or NO if it ran out of disk, hit a constraint, or otherwise can't commit.
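The participant's side of Phase 1 can be sketched in memory. The class `InMemoryParticipant` is hypothetical. Its `log` array stands in for an fsync'd write-ahead log, its `locks` map stands in for a lock table, and "no negative values" serves as an example constraint. The ordering is the point: take the locks, record PREPARED durably, and only then vote YES.

```typescript
type Vote = "YES" | "NO";

// Hypothetical in-memory participant. `log` stands in for an fsync'd
// write-ahead log and `locks` for the lock table; both are simplifications.
class InMemoryParticipant {
  readonly log: string[] = [];
  readonly staged = new Map<string, Record<string, number>>(); // txId -> redo data
  private readonly locks = new Map<string, string>();           // row key -> txId holding it

  prepare(txId: string, writes: Record<string, number>): Vote {
    const keys = Object.keys(writes);
    // Cannot prepare if another transaction already holds a lock we need.
    if (keys.some(k => (this.locks.get(k) ?? txId) !== txId)) return "NO";
    // Example constraint check: a value may not go negative.
    if (Object.values(writes).some(v => v < 0)) return "NO";
    for (const k of keys) this.locks.set(k, txId); // hold the locks until the decision
    this.staged.set(txId, { ...writes });          // redo data for a later COMMIT
    this.log.push(`PREPARED ${txId}`);             // must be durable BEFORE voting YES
    return "YES";                                  // a promise: we can commit if asked
  }
}

const p1 = new InMemoryParticipant();
console.log(p1.prepare("tx1", { alice: 70 })); // YES: lock taken, PREPARED logged
console.log(p1.prepare("tx2", { alice: 50 })); // NO: tx1 holds the lock on "alice"
console.log(p1.prepare("tx3", { bob: -5 }));   // NO: constraint violated
console.log(p1.log.length);                    // 1: only "PREPARED tx1" was logged
```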
Phase 2 — Decide (the broadcast)
- If every vote is YES, the coordinator writes COMMIT to its log (this is the moment of decision) and sends COMMIT to every participant.
- If any vote is NO, or a vote never arrives, the coordinator writes ABORT and sends ABORT to every participant.
- Each participant durably logs the outcome, applies it, releases its locks, and replies ACK.
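Phase 2 from the participant's side can be sketched as follows. The class `PreparedTx` and its maps are hypothetical in-memory stand-ins for a real log, data store, and lock table. The sketch shows two properties a participant needs. First, it makes the outcome durable before applying it and releasing locks. Second, it is idempotent: the coordinator keeps retrying until it gets an ACK, so a duplicate COMMIT must be acknowledged again without being re-applied.

```typescript
type Decision = "COMMIT" | "ABORT";
type TxState = "PREPARED" | "COMMITTED" | "ABORTED";

// Hypothetical in-memory participant holding one prepared transaction.
// `log` stands in for an fsync'd log, `store` for the data, `locks` for the lock table.
class PreparedTx {
  state: TxState = "PREPARED";
  readonly log: string[] = [];
  constructor(
    readonly txId: string,
    private readonly redo: Record<string, number>, // writes staged during Phase 1
    private readonly store: Map<string, number>,
    private readonly locks: Map<string, string>,   // row key -> txId holding the lock
  ) {}

  // Handle COMMIT/ABORT from the coordinator. A decision re-sent after a
  // lost ACK is acknowledged again but not re-applied.
  onDecision(decision: Decision): "ACK" {
    if (this.state !== "PREPARED") {
      const applied: Decision = this.state === "COMMITTED" ? "COMMIT" : "ABORT";
      if (applied !== decision) throw new Error(`conflicting decision for ${this.txId}`);
      return "ACK";
    }
    this.log.push(`${decision} ${this.txId}`); // 1. make the outcome durable first
    if (decision === "COMMIT") {
      for (const [key, value] of Object.entries(this.redo)) this.store.set(key, value); // 2. apply
    }
    for (const key of Object.keys(this.redo)) { // 3. release this transaction's locks
      if (this.locks.get(key) === this.txId) this.locks.delete(key);
    }
    this.state = decision === "COMMIT" ? "COMMITTED" : "ABORTED";
    return "ACK";                               // 4. acknowledge
  }
}

const store = new Map<string, number>([["alice", 100]]);
const locks = new Map<string, string>([["alice", "tx1"]]);
const tx = new PreparedTx("tx1", { alice: 70 }, store, locks);
console.log(tx.onDecision("COMMIT"));                // ACK
console.log(store.get("alice"), locks.has("alice")); // 70 false
console.log(tx.onDecision("COMMIT"));                // ACK again, nothing re-applied
```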
The blocking failure
If the coordinator crashes after the participants have voted YES but before any surviving participant has learned the decision, the survivors are stuck. They can ask their peers, but no reachable peer knows the decision either: the coordinator's log is the single source of truth. They wait, holding locks, until the coordinator recovers (it reads its log and resumes). This is the famous blocking case, and the main reason 3PC, Paxos-replicated coordinators, and saga patterns exist.
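"Asking their peers" has a standard shape, usually called the cooperative termination protocol. The sketch below assumes a stranded participant can query each peer's state for the transaction. The state names and `terminate` are illustrative, not taken from a specific system. The participant can finish only if some peer already knows the outcome, or if some peer never voted and can therefore still abort. Otherwise it blocks.

```typescript
// State a peer reports when a stranded participant asks it about the transaction.
type PeerState = "WORKING" | "PREPARED" | "COMMITTED" | "ABORTED" | "UNREACHABLE";
type Outcome = "COMMIT" | "ABORT" | "BLOCKED";

// Cooperative termination, run by a participant that voted YES (so it is
// PREPARED) and then lost contact with the coordinator.
function terminate(peers: readonly PeerState[]): Outcome {
  // A peer that already received the decision can simply share it.
  if (peers.includes("COMMITTED")) return "COMMIT";
  if (peers.includes("ABORTED")) return "ABORT";
  // A peer that has not voted yet can abort on its own (assumed to do so when
  // asked), which guarantees the coordinator can never have decided COMMIT.
  if (peers.includes("WORKING")) return "ABORT";
  // Every reachable peer is PREPARED, as uncertain as we are: the decision
  // exists only in the coordinator's log. Wait, holding locks.
  return "BLOCKED";
}

console.log(terminate(["PREPARED", "COMMITTED"])); // COMMIT
console.log(terminate(["PREPARED", "WORKING"]));   // ABORT
console.log(terminate(["PREPARED", "PREPARED"]));  // BLOCKED
```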
The atomicity guarantee in one sentence
Once the coordinator writes COMMIT, every participant will eventually commit (even if they have to be reminded after recovery); once the coordinator writes ABORT, every participant will eventually abort. The protocol never leaves participants in disagreement once the coordinator has written its decision.
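The "reminded after recovery" step is a scan of the coordinator's log. This sketch assumes one record per line in the same `BEGIN <txId>` / `COMMIT <txId>` / `ABORT <txId>` format that the reference implementation on this page writes. The function name `recover` is hypothetical. A logged decision is rebroadcast as-is. A BEGIN with no decision is safe to abort, because the coordinator writes COMMIT before it tells anyone to commit.

```typescript
type Decision = "COMMIT" | "ABORT";

// Rebuild, after a coordinator restart, the decision to (re)broadcast for each transaction.
function recover(logLines: readonly string[]): Map<string, Decision> {
  const outcome = new Map<string, Decision>();
  for (const line of logLines) {
    const [kind, txId] = line.split(" ");
    // BEGIN without a decision so far: nobody can have been told COMMIT, so abort.
    if (kind === "BEGIN" && !outcome.has(txId)) outcome.set(txId, "ABORT");
    // A logged decision is final and wins over the presumed abort.
    if (kind === "COMMIT" || kind === "ABORT") outcome.set(txId, kind === "COMMIT" ? "COMMIT" : "ABORT");
  }
  return outcome;
}

const decisions = recover(["BEGIN t1", "COMMIT t1", "BEGIN t2", "BEGIN t3", "ABORT t3"]);
console.log(decisions.get("t1"), decisions.get("t2"), decisions.get("t3")); // COMMIT ABORT ABORT
```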
Interactive prototype
Run it. Break it. Tune it.
About this simulation
Three participants and one coordinator running a 2PC transaction. Pick a scenario (Happy path, A participant votes NO, or Coordinator crashes), or jump into Free play and toggle votes yourself. Step with Prev / Next / Auto / Restart; the message log below the prototype keeps only the last two lines so it never grows.
Hands-on
Try these on your own
For each experiment, predict the outcome first, then run it in the prototype above and verify.
Walk the Happy path
Open the Happy path scenario and step through with Next. Watch the coordinator send PREPARE, all three participants reply YES, the coordinator write COMMIT, and every participant move to COMMITTED. Notice that the decision (COMMIT) is written before it's broadcast — that is the durable atomicity anchor.
Make a participant vote NO
Switch to the A participant votes NO scenario. One participant — say P2 — replies NO during the vote. The coordinator immediately writes ABORT and tells everyone to roll back. Notice that even the participants who voted YES still abort: the rule is unanimous YES required, not majority.
Crash the coordinator mid-decision
Run the Coordinator crashes scenario. Every participant has voted YES, then the coordinator dies before sending COMMIT. The participants are stuck: they have promised to commit if asked but cannot proceed without the coordinator's word. This is the blocking case, and the main motivation for 3PC.
Free play — break it yourself
Open Free play and toggle the per-participant vote checkboxes before stepping through. Try every combination: one NO, two NOs, three YESes. Try crashing the coordinator at different points (after Prepare, after votes, after Commit). Atomicity holds every time: no participant ever commits while another aborts. Blocking shows up only when the crash leaves YES voters waiting for a decision that no surviving participant knows.
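The same experiment can be run exhaustively in code. This is a simplified model, not the prototype's implementation. The coordinator collects all three votes and then crashes after delivering its decision to the first `delivered` participants. Recovery is not modeled. "Blocked" assumes the participants can still ask each other, as in cooperative termination. All names are hypothetical.

```typescript
type Vote = "YES" | "NO";
type PState = "PREPARED" | "COMMITTED" | "ABORTED";

// One 2PC round in which the coordinator dies after delivering its decision to
// the first `delivered` participants (delivered = votes.length means no crash).
function simulate(votes: readonly Vote[], delivered: number): PState[] {
  const decision: PState = votes.every(v => v === "YES") ? "COMMITTED" : "ABORTED";
  return votes.map((v, i): PState => {
    if (i < delivered) return decision;          // received COMMIT/ABORT
    return v === "YES" ? "PREPARED" : "ABORTED"; // NO voters abort alone; YES voters must wait
  });
}

// Atomicity: nobody commits while someone else aborts.
const isAtomic = (s: readonly PState[]) => !(s.includes("COMMITTED") && s.includes("ABORTED"));
// Stuck only if every participant is PREPARED, so no peer knows or can force the outcome.
const isBlocked = (s: readonly PState[]) => s.length > 0 && s.every(x => x === "PREPARED");

for (let mask = 0; mask < 8; mask++) {
  const votes = [0, 1, 2].map((i): Vote => ((mask >> i) & 1 ? "YES" : "NO"));
  for (let delivered = 0; delivered <= votes.length; delivered++) {
    const states = simulate(votes, delivered);
    if (!isAtomic(states)) console.log("VIOLATION:", votes.join(","), delivered);
    if (isBlocked(states)) console.log("blocked:", votes.join(","), "decision delivered to", delivered);
  }
}
// Prints a single line: blocked: YES,YES,YES decision delivered to 0
```

Out of 32 runs, none violates atomicity. Only one blocks: every participant voted YES, and nobody received the decision.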
In practice
When to use it — and what you give up
When it's the right tool
- Cross-shard / cross-service transactions where you need true atomicity and the participants are known ahead of time — XA distributed transactions, database-internal cross-partition writes, SQL across multiple PostgreSQL shards.
- Short transactions where the blocking case is rare and acceptable — milliseconds-long writes, intra-datacenter calls with reliable nodes.
- You can replicate the coordinator with a real consensus protocol (Raft / Paxos) so its decision survives its crash — this is exactly what Spanner does.
- The simplest possible mental model is your priority — 2PC is what every operator already understands, and that has real value.
When to reach for something else
- Long-running workflows across services (order → payment → shipping) — use a [saga](https://microservices.io/patterns/data/saga.html) with compensating actions instead. 2PC's lock-holding hurts.
- You cannot tolerate the blocking case: replicate the coordinator (Raft), use 3PC (which is non-blocking only if the network is synchronous and never partitions), or restructure as eventual consistency with reconciliation.
- Replicated state machines and log replication — that's Paxos and Raft territory, not 2PC.
- Byzantine fault model — 2PC assumes participants only crash, not lie. Use PBFT or similar.
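For the long-running workflow case, the saga alternative can be sketched as an orchestrator that runs local steps in order and, on failure, runs the compensations of the completed steps in reverse. The sketch is synchronous and in-memory. `SagaStep`, `runSaga`, and `makeStep` are hypothetical names. A real orchestrator persists its progress and retries compensations until they succeed. Note the trade-off: no locks are held across steps, but other transactions can observe the intermediate state.

```typescript
// One saga step: a local transaction plus the compensating action that semantically undoes it.
interface SagaStep {
  name: string;
  run(): void;        // throws to signal failure
  compensate(): void; // assumed to succeed eventually (real systems retry it)
}

// Orchestration-style saga: run steps in order; if one fails, compensate
// the steps that already completed, most recent first.
function runSaga(steps: readonly SagaStep[]): { ok: boolean; trace: string[] } {
  const trace: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
    } catch {
      trace.push(`failed ${step.name}`);
      for (const done of completed.slice().reverse()) {
        done.compensate();
        trace.push(`compensate ${done.name}`);
      }
      return { ok: false, trace };
    }
    trace.push(`run ${step.name}`);
    completed.push(step);
  }
  return { ok: true, trace };
}

// Example: order -> payment -> shipping, where shipping fails.
const makeStep = (name: string, fails = false): SagaStep => ({
  name,
  run: () => { if (fails) throw new Error(`${name} failed`); },
  compensate: () => {},
});
const result = runSaga([makeStep("order"), makeStep("payment"), makeStep("shipping", true)]);
console.log(result.ok, result.trace.join(" | "));
// false run order | run payment | failed shipping | compensate payment | compensate order
```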
Pros
- Simplest possible atomic-commit protocol: two phases, a handful of message types (PREPARE, YES/NO, COMMIT/ABORT, ACK), fits on a whiteboard.
- Strong atomicity guarantee — every participant commits, or every participant aborts. No half-states once the coordinator decides.
- Well-understood — XA, JTA, MS DTC, all the textbooks; battle-tested across decades of databases.
- Cheap in the happy path — just two round-trips and no consensus quorum machinery.
- Compositional — you can wrap any resource manager (DB, queue, file system) that exposes prepare/commit/abort.
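The happy-path cost can be tallied directly. This sketch counts the log writes described under "How it works": the coordinator's BEGIN and decision records, plus each participant's PREPARED and outcome records. The function `twoPcCost` is hypothetical. Optimized variants such as presumed abort skip some of these writes and ACKs.

```typescript
// Happy-path cost of the 2PC variant described on this page, for n participants.
function twoPcCost(n: number) {
  return {
    roundTrips: 2,        // PREPARE -> vote, then decision -> ACK
    messages: 4 * n,      // n PREPAREs + n votes + n decisions + n ACKs
    logWrites: 2 + 2 * n, // coordinator: BEGIN + decision; each participant: PREPARED + outcome
  };
}

console.log(twoPcCost(3)); // { roundTrips: 2, messages: 12, logWrites: 8 }
```

The round-trip count stays at two no matter how many participants there are. Message and log-write counts grow linearly with the number of participants.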
Cons
- Blocks on coordinator failure — the famous case: coordinator dies after collecting votes, participants hang holding locks.
- Holds locks across both phases — long latency multiplies contention.
- All-or-nothing fragility — one slow or dead participant stalls every other one.
- Synchronous and chatty — every participant must be reachable for every transaction.
- No fault tolerance on the decision itself — without external replication, the coordinator is a single point of failure.
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
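// Coordinator-side 2PC. Each participant exposes prepare/commit/abort
// and persists its state durably (PREPARED, then the outcome) before replying.
// commit/abort must be idempotent; abort of an unknown txId is a no-op.
type Vote = "YES" | "NO";
interface Participant {
  prepare(txId: string): Promise<Vote>;
  commit(txId: string): Promise<void>;
  abort(txId: string): Promise<void>;
}
// Reject if `promise` has not settled within `ms`: a vote that never arrives counts as NO.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`no reply within ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
// Deliver a decision to every participant, retrying each one until it succeeds.
// The decision is already durable, so giving up is not an option.
async function retryUntilDone(
  participants: Participant[],
  send: (p: Participant) => Promise<void>,
  backoffMs = 100,
): Promise<void> {
  await Promise.all(participants.map(async p => {
    for (;;) {
      try {
        await send(p);
        return;
      } catch {
        await new Promise(resolve => setTimeout(resolve, backoffMs));
      }
    }
  }));
}
async function twoPhaseCommit(
  txId: string,
  participants: Participant[],
  log: { write(line: string): Promise<void> }, // must resolve only once the line is durable
  prepareTimeoutMs = 5_000,
): Promise<"COMMITTED" | "ABORTED"> {
  await log.write(`BEGIN ${txId}`);
  // Phase 1 (Prepare): any NO, error, or timeout means the transaction cannot commit.
  let allYes: boolean;
  try {
    const votes = await Promise.all(
      participants.map(p => withTimeout(p.prepare(txId), prepareTimeoutMs)),
    );
    allYes = votes.every(v => v === "YES");
  } catch {
    allYes = false; // a participant failed to prepare or never answered: treat as NO
  }
  // Phase 2 (Decide): write the decision durably *before* telling anyone.
  if (allYes) {
    await log.write(`COMMIT ${txId}`); // <- the durable decision
    await retryUntilDone(participants, p => p.commit(txId));
    return "COMMITTED";
  }
  await log.write(`ABORT ${txId}`);
  await retryUntilDone(participants, p => p.abort(txId));
  return "ABORTED";
}
// On recovery, the coordinator reads its log: if it sees COMMIT/ABORT
// for txId, it replays the broadcast. If it sees BEGIN but no decision,
// it aborts. That single-source-of-truth log is what makes 2PC atomic.
References & further reading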
- Paper (jimgray.azurewebsites.net)
Jim Gray, *Notes on Data Base Operating Systems* (1978)
One of the original write-ups of 2PC, by a pioneer of transaction processing. Section 5 is the canonical description of the protocol and the recovery rules.
- Book (microsoft.com)
Bernstein, Hadzilacos & Goodman, *Concurrency Control and Recovery in Database Systems* (Chapter 7)
Textbook treatment of atomic commit protocols including the formal proof that 2PC is correct in the crash-failure model. Free PDF from the authors.
- Spec (pubs.opengroup.org)
X/Open XA Specification
The industry-standard interface that databases and transaction managers implement to participate in 2PC across vendors. JTA, MS DTC, and Oracle all conform.
- Docs (postgresql.org)
PostgreSQL docs: Two-Phase Commit (PREPARE TRANSACTION)
Concrete production implementation. Shows the exact SQL surface, the on-disk pg_twophase directory, and how recovery replays prepared transactions.
- Paper (ics.uci.edu)
Pat Helland, *Life beyond Distributed Transactions: an Apostate's Opinion* (2007)
The famous polemic against 2PC at scale. Argues you should design around eventual consistency rather than chase atomicity across services; required reading for anyone reaching for XA.
- Book (oreilly.com)
Martin Kleppmann, *Designing Data-Intensive Applications* (Chapter 9)
The clearest modern explanation of 2PC, where it's used (XA), why it blocks, and what production systems do instead (Spanner replicates the coordinator with Paxos).
Knowledge check
Did the prototype land?
Question 1: In 2PC, what happens if the coordinator crashes immediately after writing `COMMIT` to its own log but before sending any COMMIT message?
Question 2: How many YES votes does the coordinator need before it can decide to commit?
Question 3: Which production system avoids the 2PC blocking problem by *replicating the coordinator itself*?