Overview
What this concept solves
Round robin and random both assume requests are interchangeable. They aren't. A database query might run for two seconds; the health check behind it finishes in five milliseconds. When durations vary, counting requests tells you nothing about how busy a server actually is. Least connections fixes that by routing to whichever backend currently has the fewest open connections — a direct, live proxy for 'who is least busy.'
This is the first genuinely state-aware algorithm in the set. Instead of a blind counter, the balancer keeps a live tally of in-flight requests per server, increments it when it dispatches, and decrements it when the response completes. New work flows to the shortest queue — exactly how you'd pick a checkout line at the supermarket.
Mechanics
How it works
Track active connections, route to the minimum
- Keep an active[] counter per server, all starting at 0.
- On a new request, pick the server with the smallest active count (break ties however you like: lowest index, or random).
- Increment that server's active immediately, then forward the request.
- When the response finishes, decrement that server's active.
Because slow requests keep their connection open longer, a server stuck with heavy work shows a higher active count and is automatically passed over for new requests until it catches up. The algorithm adapts to real load without ever measuring CPU or response time — connection count is the signal.
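A small trace makes the "passed over" behaviour concrete. This is a minimal sketch, assuming lowest-index tie-breaking and a hypothetical helper named pickLeastActive; one slow request stays open on server 0 while four fast requests arrive and complete one at a time.

```typescript
// Hypothetical helper: index of the smallest counter; lowest index wins ties.
function pickLeastActive(active: number[]): number {
  let best = 0;
  for (let i = 1; i < active.length; i++) {
    if (active[i] < active[best]) best = i;
  }
  return best;
}

const counts = [0, 0, 0];   // active connections per server
const picks: number[] = []; // which server each request was routed to

// One slow request arrives first and stays open for the whole trace.
const slowServer = pickLeastActive(counts);
counts[slowServer]++;
picks.push(slowServer);

// Four fast requests follow; each one completes before the next arrives.
for (let n = 0; n < 4; n++) {
  const s = pickLeastActive(counts);
  counts[s]++;
  picks.push(s);
  counts[s]--; // fast response: the connection closes right away
}

// picks is [0, 1, 1, 1, 1]: server 0 is skipped while its slow request is open.
```

With lowest-index tie-breaking every fast request lands on server 1. Random tie-breaking would split them between servers 1 and 2, but server 0 would still be skipped.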
Increment eagerly, not on arrival
There's a subtle correctness trap the prototype is built to show. The counter must increment the instant the balancer commits to a server — not when the request finally lands at the backend. If you wait, a fast burst of arrivals all see the same stale 'everyone has 0 connections' snapshot and stampede onto the same server. The prototype increments at the moment of decision (watch the load balancer box pulse and a dashed 'reserved slot' appear before the request even arrives), which is why rapid 'Send 10' bursts still fan out cleanly.
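The difference can be reproduced in a few lines. The sketch below is an illustration, not the prototype's code: routeBurst is a hypothetical function that routes a burst in which every decision is made before any request reaches a backend, once with eager counting and once without.

```typescript
// Hypothetical helper: index of the smallest counter; lowest index wins ties.
function leastActiveIndex(active: number[]): number {
  let best = 0;
  for (let i = 1; i < active.length; i++) {
    if (active[i] < active[best]) best = i;
  }
  return best;
}

// Routes `burst` requests that are all decided before any of them lands.
// eager = true:  the counter goes up at decision time.
// eager = false: the counter would only go up on arrival, so it stays stale
//                for the whole burst.
function routeBurst(serverCount: number, burst: number, eager: boolean): number[] {
  const active = new Array<number>(serverCount).fill(0);
  const assigned = new Array<number>(serverCount).fill(0);
  for (let n = 0; n < burst; n++) {
    const s = leastActiveIndex(active);
    assigned[s]++;
    if (eager) active[s]++; // reserve the slot immediately
  }
  return assigned;
}

const lazyBurst = routeBurst(4, 10, false); // [10, 0, 0, 0]: the stampede
const eagerBurst = routeBurst(4, 10, true); // [3, 3, 2, 2]: the fan-out
```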
Distributed least-connections is hard
A single balancer sees its own dispatches perfectly. But several independent balancers each only know about their own connections — none has the global active count. They can collectively overload a backend that each thought was idle. This staleness is the central challenge of distributed load balancing, and a big reason power-of-two-choices exists.
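A toy model shows how this goes wrong. It is a sketch under simplifying assumptions: LocalViewBalancer is a hypothetical class that counts only its own dispatches, ties go to the lowest index, and trueLoad stands in for ground truth that no balancer can observe.

```typescript
// Hypothetical model: each balancer counts only the requests it dispatched itself.
class LocalViewBalancer {
  private local: number[];
  constructor(serverCount: number) {
    this.local = new Array<number>(serverCount).fill(0);
  }
  acquire(): number {
    let best = 0;
    for (let i = 1; i < this.local.length; i++) {
      if (this.local[i] < this.local[best]) best = i;
    }
    this.local[best]++;
    return best;
  }
}

// Real open connections per backend, invisible to every balancer.
const trueLoad = [0, 0, 0];
const fleet = [
  new LocalViewBalancer(3),
  new LocalViewBalancer(3),
  new LocalViewBalancer(3),
];

// Each balancer receives one long-lived request at about the same time.
for (const lb of fleet) {
  trueLoad[lb.acquire()]++;
}

// trueLoad is [3, 0, 0]: every balancer believed server 0 was idle.
```

Each balancer made a locally correct choice. Random tie-breaking would soften this particular collision, but no balancer would learn about the others' traffic, so the blind spot remains.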
Interactive prototype
Run it. Break it. Tune it.
About this simulation
Now requests have duration: each carries a random service time, drawn as a bar that drains while the server works it. The balancer routes every new request to whichever server has the fewest active connections right now — watch the 'Active' counts stay tight even as durations vary wildly. Try 'Send 10' to stress it.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Watch durations drive the balance
Hit 'Auto' and watch the bars: each request's bar height is its remaining service time, and it drains as the server works. The balancer keeps sending new requests to the server with the lowest 'Active' count, so a server that caught a tall (slow) bar gets skipped until it drains. The 'Spread (max − min)' stat stays small — that's the whole point.
Stress it with a burst
Click 'Send 10'. Ten requests fire in quick succession — and notice they fan out across all four servers instead of stacking on one. That's the eager counter: each pick increments 'Active' immediately (watch the LB box pulse and the dashed reserved slot appear), so the next pick already sees the updated counts.
Compare against round robin in your head
Crank 'Arrival every' down so requests pour in. Some bars are short, some tall, yet active counts stay balanced. Round robin would have ignored those bar heights entirely and dealt every fourth request to the same server regardless of how loaded it was — that gap is exactly what least connections closes.
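The same comparison can be run as a small deterministic simulation. This is a sketch under assumed conditions, not the prototype's logic: one request arrives per tick, durations cycle through a fixed pattern where every fourth request is slow, and peakSpread is a hypothetical function that reports the largest max − min spread seen.

```typescript
type Policy = "roundRobin" | "leastConn";

// Returns the largest (max - min) active-connection spread seen during the run.
function peakSpread(
  policy: Policy,
  serverCount: number,
  durations: number[],
  ticks: number
): number {
  const active = new Array<number>(serverCount).fill(0);
  const inFlight: { server: number; end: number }[] = [];
  let rrNext = 0;
  let peak = 0;
  for (let t = 0; t < ticks; t++) {
    // 1. Complete every request whose service time has elapsed.
    for (let k = inFlight.length - 1; k >= 0; k--) {
      if (inFlight[k].end <= t) {
        active[inFlight[k].server]--;
        inFlight.splice(k, 1);
      }
    }
    // 2. Route the request that arrives this tick.
    let server = 0;
    if (policy === "roundRobin") {
      server = rrNext;
      rrNext = (rrNext + 1) % serverCount;
    } else {
      for (let i = 1; i < serverCount; i++) {
        if (active[i] < active[server]) server = i;
      }
    }
    active[server]++; // counted at decision time
    inFlight.push({ server, end: t + durations[t % durations.length] });
    // 3. Record the spread after routing.
    peak = Math.max(peak, Math.max(...active) - Math.min(...active));
  }
  return peak;
}

// Every fourth request is slow (12 ticks); the rest take 1 tick.
const durationPattern = [12, 1, 1, 1];
const rrPeak = peakSpread("roundRobin", 4, durationPattern, 16); // 3
const lcPeak = peakSpread("leastConn", 4, durationPattern, 16);  // 1
```

The pattern is chosen to be unkind to round robin: with four servers, every slow request lands on server 0. Least connections steers each slow request to a different server, so the spread never exceeds 1 in this run.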
In practice
When to use it — and what you give up
When to reach for it
- Variable request durations — anything where some requests are far heavier than others: search, reporting, file processing.
- Long-lived connections — websockets, streaming, database connection pools, SSE. Connection count is the load here.
- A single balancer (or a small coordinated set) that can see most of the traffic and keep an accurate active count.
- Mixed-latency backends — if one server is transiently slow (GC pause, cold cache), its connections drain slower, so it naturally receives less new work.
Real-world default for L4
Least connections (leastconn in HAProxy, least_conn in NGINX) is the go-to for TCP/L4 balancing and any workload with long or uneven sessions. It's often the recommended upgrade from round robin the moment request durations stop being uniform.
Pros
- Adapts to real, live load instead of assuming uniform requests.
- Excellent for long-lived or highly variable connections.
- Naturally routes around a transiently slow server — its connections drain slower, so it gets fewer new ones.
- Still cheap: O(N) to scan for the minimum (or O(log N) with a heap), one counter per server.
Cons
- Requires per-server connection state — more bookkeeping than stateless schemes.
- Connection count isn't a perfect signal: a server holding many cheap connections looks busier than one holding a few expensive ones, even when the reverse is true.
- Distributed balancers see only their own connections; stale global state can still overload a backend.
- Blind to server capacity — treats a big and small server as equal (that's what Weighted Least Connections fixes).
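For the last point, one common way to fold capacity in is to compare active connections divided by a per-server weight. The sketch below assumes that formulation; the class name WeightedLeastConn, the snapshot method, and the example weights are illustrative and not taken from any particular balancer.

```typescript
// Hypothetical sketch of weighted least connections:
// pick the server with the lowest active / weight ratio.
class WeightedLeastConn {
  private active: number[];

  // Weights are assumed to be positive; a larger weight means more capacity.
  constructor(private weights: number[]) {
    this.active = weights.map(() => 0);
  }

  acquire(): number {
    let best = 0;
    for (let i = 1; i < this.active.length; i++) {
      // active[i] / weights[i] < active[best] / weights[best],
      // cross-multiplied to avoid division.
      if (this.active[i] * this.weights[best] < this.active[best] * this.weights[i]) {
        best = i;
      }
    }
    this.active[best]++; // still counted at decision time
    return best;
  }

  release(serverIndex: number): void {
    if (this.active[serverIndex] > 0) this.active[serverIndex]--;
  }

  // Copy of the current counters, for inspection.
  snapshot(): number[] {
    return [...this.active];
  }
}

// Assumed weights: server 0 has three times the capacity of server 1.
const wlc = new WeightedLeastConn([3, 1]);
for (let n = 0; n < 8; n++) wlc.acquire();
const weightedCounts = wlc.snapshot(); // [6, 2]
```

With eight long-lived connections and no releases, the counts settle at 6 and 2, matching the 3:1 weights.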
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
// Least connections: route to the shortest queue, count eagerly.
class LeastConnBalancer {
  private active: number[];

  constructor(private servers: string[]) {
    this.active = servers.map(() => 0);
  }

  // Pick the least-busy server and reserve a slot immediately.
  acquire(): number {
    let best = 0;
    for (let i = 1; i < this.active.length; i++) {
      if (this.active[i] < this.active[best]) best = i;
    }
    this.active[best]++; // increment NOW, at decision time,
    return best;         // not when the request reaches the backend
  }

  // Call when the response completes.
  release(serverIndex: number): void {
    this.active[serverIndex]--;
  }
}

// Usage
const lb = new LeastConnBalancer(["s1", "s2", "s3", "s4"]);
const i = lb.acquire(); // forward request to servers[i]
// ... await response ...
lb.release(i);

References & further reading
- NGINX: least_conn (Docs, docs.nginx.com). NGINX's least-connections method and how it combines with weights.
- HAProxy: Load balancing algorithms (leastconn) (Docs, haproxy.com). When HAProxy recommends leastconn over roundrobin: long sessions (LDAP, SQL, TCP) and variable request durations.
- Envoy: Load balancers (least request) (Docs, envoyproxy.io). Envoy's 'least request' LB and, crucially, how it uses power-of-two-choices (sampling two random hosts by default) when all hosts have equal weights.
- AWS: ALB now supports Least Outstanding Requests (Docs, aws.amazon.com). AWS's name for least-connections, and exactly the case they recommend it for: requests that vary in complexity across uneven targets.
- Tyler McMullen: Load Balancing is Impossible (Talk, infoq.com). The 'horrific edge cases' of least-conns in distributed setups: the stale-state stampede this concept warns about, demonstrated.
- Google SRE Book, Ch. 20: Load Balancing in the Datacenter (Book, sre.google). Why connection count alone is an imperfect load signal at scale, and what Google layers on top of it.
Knowledge check
Did the prototype land?
question 01 / 03
What signal does least connections use to choose a server?
question 02 / 03
Why must the active-connection counter be incremented at the moment of the routing decision, not when the request reaches the backend?
question 03 / 03
Why is least connections harder to get right across multiple independent load balancers?