Least Connections — Load Balancing

Overview

What this concept solves

Round robin and random both assume requests are interchangeable. They aren't. A database query might run for two seconds; the health check behind it finishes in five milliseconds. When durations vary, counting requests tells you nothing about how busy a server actually is. Least connections fixes that by routing to whichever backend currently has the fewest open connections — a direct, live proxy for 'who is least busy.'

This is the first genuinely state-aware algorithm in the set. Instead of a blind counter, the balancer keeps a live tally of in-flight requests per server, increments it when it dispatches, and decrements it when the response completes. New work flows to the shortest queue — exactly how you'd pick a checkout line at the supermarket.

Mechanics

How it works

Track active connections, route to the minimum

Keep an active[] counter per server, all starting at 0.
On a new request, pick the server with the smallest active count (break ties however you like — lowest index, or random).
Increment that server's active immediately, then forward the request.
When the response finishes, decrement that server's active.

Because slow requests keep their connection open longer, a server stuck with heavy work shows a higher active count and is automatically passed over for new requests until it catches up. The algorithm adapts to real load without ever measuring CPU or response time — connection count is the signal.
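
To make the "passed over until it catches up" behavior concrete, here is a small trace, written as a sketch rather than a real balancer. It assumes two servers, a first request that stays open for the whole trace, and later requests that each finish before the next one arrives. The names pickLeast and routeWithSlowRequest are made up for this illustration.

```go
package main

import "fmt"

// pickLeast returns the index of the smallest counter; ties go to the lowest index.
func pickLeast(active []int) int {
	best := 0
	for i := 1; i < len(active); i++ {
		if active[i] < active[best] {
			best = i
		}
	}
	return best
}

// routeWithSlowRequest dispatches n requests across two servers.
// Assumption for this sketch: request 0 is slow and never completes during
// the trace, while every later request completes before the next arrival.
// It returns how many requests each server received.
func routeWithSlowRequest(n int) []int {
	active := make([]int, 2)
	received := make([]int, 2)
	for r := 0; r < n; r++ {
		s := pickLeast(active)
		active[s]++ // reserve the slot at decision time
		received[s]++
		if r > 0 {
			active[s]-- // fast request: done before the next one arrives
		}
	}
	return received
}

func main() {
	// Server 0 is stuck with the slow request, so the other four go to server 1.
	fmt.Println(routeWithSlowRequest(5))
}
```

Server 0 keeps an active count of 1 for the whole trace, so every later request sees server 1 at 0 and goes there. Nothing measured latency or CPU; the open connection alone steered the traffic.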

Increment eagerly, not on arrival

There's a subtle correctness trap the prototype is built to show. The counter must increment the instant the balancer commits to a server — not when the request finally lands at the backend. If you wait, a fast burst of arrivals all see the same stale 'everyone has 0 connections' snapshot and stampede onto the same server. The prototype increments at the moment of decision (watch the load balancer box pulse and a dashed 'reserved slot' appear before the request even arrives), which is why rapid 'Send 10' bursts still fan out cleanly.
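
The stampede is easy to reproduce outside the prototype. The sketch below is a simplified model, not the prototype's code: it assumes the whole burst is routed before any request reaches a backend, and it breaks ties by lowest index. The burst function and its eager flag are hypothetical names for this example.

```go
package main

import "fmt"

// pickLeast returns the index of the smallest counter; ties go to the lowest index.
func pickLeast(active []int) int {
	best := 0
	for i := 1; i < len(active); i++ {
		if active[i] < active[best] {
			best = i
		}
	}
	return best
}

// burst makes n routing decisions back to back and returns how many requests
// each server was assigned. With eager=true the counter is incremented at
// decision time. With eager=false the increment is deferred until "arrival",
// which in this sketch happens only after the whole burst has been routed.
func burst(n, servers int, eager bool) []int {
	active := make([]int, servers)
	assigned := make([]int, servers)
	for r := 0; r < n; r++ {
		s := pickLeast(active)
		assigned[s]++
		if eager {
			active[s]++
		}
	}
	return assigned
}

func main() {
	fmt.Println("lazy:", burst(10, 4, false)) // every pick sees all zeros
	fmt.Println("eager:", burst(10, 4, true)) // each pick sees the previous one
}
```

In the lazy run all ten requests land on server 0, because each decision reads the same all-zero snapshot. In the eager run the ten requests split 3, 3, 2, 2 across the four servers.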

Distributed least-connections is hard

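A single balancer sees its own dispatches perfectly. Several independent balancers, however, each know only about their own connections; none has the global active count. Together they can overload a backend that each one believed was idle. This staleness is the central challenge of distributed load balancing, and a big reason power-of-two-choices exists: sampling two backends at random and sending to the less loaded of the pair keeps balancers that share the same stale view from all herding onto a single 'least loaded' server.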
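
A few lines of simulation show the gap between what each balancer believes and what the backends experience. This is a sketch under simplifying assumptions: balancers take turns routing one request each, no request completes, and ties go to the lowest index (random tie-breaking would soften the effect but not remove it). The simulate function is a hypothetical helper.

```go
package main

import "fmt"

// pickLeast returns the index of the smallest counter; ties go to the lowest index.
func pickLeast(active []int) int {
	best := 0
	for i := 1; i < len(active); i++ {
		if active[i] < active[best] {
			best = i
		}
	}
	return best
}

// simulate runs several independent balancers against the same servers.
// Each balancer keeps a private view and never sees the others' dispatches.
// It returns every balancer's private view and the true per-server load.
func simulate(balancers, servers, perBalancer int) (views [][]int, actual []int) {
	views = make([][]int, balancers)
	for b := range views {
		views[b] = make([]int, servers)
	}
	actual = make([]int, servers)
	// Interleave arrivals: in each round, every balancer routes one request.
	for r := 0; r < perBalancer; r++ {
		for b := range views {
			s := pickLeast(views[b])
			views[b][s]++ // eager, but only in this balancer's own view
			actual[s]++   // what the backend really carries
		}
	}
	return views, actual
}

func main() {
	views, actual := simulate(2, 4, 2)
	fmt.Println("balancer 0 believes:", views[0])
	fmt.Println("balancer 1 believes:", views[1])
	fmt.Println("servers really hold:", actual)
}
```

Each balancer believes it has placed one connection on server 0 and one on server 1. In reality those two servers hold two connections each while servers 2 and 3 sit idle; a single balancer with the global count would have placed one request on each server.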

Interactive prototype

Run it. Break it. Tune it.

Sandboxed simulation embedded right in the page. No setup, no install.

About this simulation

Now requests have duration: each carries a random service time, drawn as a bar that drains while the server works it. The balancer routes every new request to whichever server has the fewest active connections right now — watch the 'Active' counts stay tight even as durations vary wildly. Try 'Send 10' to stress it.

Hands-on

Try these on your own

Open the prototype above and, for each experiment, predict the outcome first, then run it and check your prediction.

try 01

Watch durations drive the balance

Hit 'Auto' and watch the bars: each request's bar height is its remaining service time, and it drains as the server works. The balancer keeps sending new requests to the server with the lowest 'Active' count, so a server that caught a tall (slow) bar gets skipped until it drains. The 'Spread (max − min)' stat stays small — that's the whole point.

try 02

Stress it with a burst

Click 'Send 10'. Ten requests fire in quick succession — and notice they fan out across all four servers instead of stacking on one. That's the eager counter: each pick increments 'Active' immediately (watch the LB box pulse and the dashed reserved slot appear), so the next pick already sees the updated counts.

try 03

Compare against round robin in your head

Crank 'Arrival every' down so requests pour in. Some bars are short, some tall, yet active counts stay balanced. Round robin would have ignored those bar heights entirely and dealt every fourth request to the same server regardless of how loaded it was — that gap is exactly what least connections closes.

In practice

When to use it — and what you give up

When to reach for it

Variable request durations — anything where some requests are far heavier than others: search, reporting, file processing.
Long-lived connections — websockets, streaming, database connection pools, SSE. Connection count is the load here.
A single balancer (or a small coordinated set) that can see most of the traffic and keep an accurate active count.
Mixed-latency backends — if one server is transiently slow (GC pause, cold cache), its connections drain slower, so it naturally receives less new work.

Real-world default for L4

Least connections (balance leastconn in HAProxy, least_conn in an NGINX upstream block) is a common choice for TCP/L4 balancing and for any workload with long or uneven sessions. It's often the recommended upgrade from round robin once request durations stop being uniform.

Pros

Adapts to real, live load instead of assuming uniform requests.
Excellent for long-lived or highly variable connections.
Naturally routes around a transiently slow server — its connections drain slower, so it gets fewer new ones.
Still cheap: one counter per server, and an O(N) scan to find the minimum (or O(log N) per counter update with a min-heap).
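
The heap variant mentioned in the last point can be sketched with Go's standard container/heap package. This is one possible layout, not a reference design: the entry, minHeap, and HeapBalancer types are invented for the example, the balancer is not safe for concurrent use, and it assumes at least one server.

```go
package main

import (
	"container/heap"
	"fmt"
)

// entry tracks one server inside the heap.
type entry struct {
	server int // index into the server list
	active int // in-flight requests
	pos    int // current position in the heap slice, maintained by Swap
}

// minHeap orders entries by active count, then by server index.
type minHeap []*entry

func (h minHeap) Len() int { return len(h) }
func (h minHeap) Less(i, j int) bool {
	if h[i].active != h[j].active {
		return h[i].active < h[j].active
	}
	return h[i].server < h[j].server // deterministic tie-break
}
func (h minHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].pos, h[j].pos = i, j
}

// Push and Pop are required by heap.Interface; this sketch never resizes the heap.
func (h *minHeap) Push(x any) {
	e := x.(*entry)
	e.pos = len(*h)
	*h = append(*h, e)
}
func (h *minHeap) Pop() any {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

type HeapBalancer struct {
	h       minHeap
	entries []*entry // indexed by server, for O(1) lookup in Release
}

func NewHeapBalancer(n int) *HeapBalancer {
	b := &HeapBalancer{entries: make([]*entry, n)}
	for i := range b.entries {
		e := &entry{server: i, pos: i}
		b.entries[i] = e
		b.h = append(b.h, e)
	}
	heap.Init(&b.h)
	return b
}

// Acquire reads the minimum at the root, increments it eagerly, and
// restores heap order in O(log N).
func (b *HeapBalancer) Acquire() int {
	e := b.h[0]
	e.active++
	heap.Fix(&b.h, e.pos)
	return e.server
}

// Release decrements the server's count and restores heap order in O(log N).
func (b *HeapBalancer) Release(server int) {
	e := b.entries[server]
	e.active--
	heap.Fix(&b.h, e.pos)
}

func main() {
	b := NewHeapBalancer(4)
	for i := 0; i < 6; i++ {
		fmt.Println("picked server", b.Acquire())
	}
	b.Release(3)
	fmt.Println("after releasing 3, picked server", b.Acquire())
}
```

For a handful of backends the linear scan is usually simpler and just as fast; the heap only starts to pay off when the server list is large.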

Cons

Requires per-server connection state — more bookkeeping than stateless schemes.
Connection count isn't perfect: it weighs every connection equally, so a server holding many cheap connections looks busier than one holding a few expensive ones.
Distributed balancers see only their own connections; stale global state can still overload a backend.
Blind to server capacity — treats a big and small server as equal (that's what Weighted Least Connections fixes).
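
For the last point, a weighted variant can be sketched by comparing active count divided by weight instead of the raw count. This is one common formulation, shown only as an illustration: the weights are made-up capacity values, pickWeighted is a hypothetical helper, and real balancers may define the ratio slightly differently.

```go
package main

import "fmt"

// pickWeighted returns the server with the smallest active/weight ratio.
// Ratios are compared by cross-multiplication to stay in integers:
// active[i]/weight[i] < active[best]/weight[best]
// is the same as active[i]*weight[best] < active[best]*weight[i]
// for positive weights. Ties go to the lowest index.
func pickWeighted(active, weight []int) int {
	best := 0
	for i := 1; i < len(active); i++ {
		if active[i]*weight[best] < active[best]*weight[i] {
			best = i
		}
	}
	return best
}

func main() {
	weight := []int{3, 1} // assumption: server 0 has three times the capacity
	active := make([]int, len(weight))
	for r := 0; r < 8; r++ {
		active[pickWeighted(active, weight)]++ // no request completes in this trace
	}
	fmt.Println(active) // the bigger server ends up holding three times as many
}
```

With weights 3 and 1 and no completions, eight requests end up split 6 to 2, matching the capacity ratio.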

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

least_connections.go

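package lb

import "sync"

// LeastConnBalancer routes to the server with the fewest in-flight requests
// and counts eagerly. The mutex makes it safe for concurrent use.
type LeastConnBalancer struct {
	mu      sync.Mutex
	servers []string
	active  []int // in-flight requests per server, indexed like servers
}

func NewLeastConnBalancer(servers []string) *LeastConnBalancer {
	return &LeastConnBalancer{
		servers: servers,
		active:  make([]int, len(servers)),
	}
}

// Acquire picks the least-busy server (ties go to the lowest index) and
// reserves a slot immediately. It returns -1 if there are no servers.
func (b *LeastConnBalancer) Acquire() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.active) == 0 {
		return -1
	}
	best := 0
	for i := 1; i < len(b.active); i++ {
		if b.active[i] < b.active[best] {
			best = i
		}
	}
	b.active[best]++ // increment NOW, at decision time,
	return best      // not when the request reaches the backend
}

// Release must be called exactly once per Acquire, when the response
// completes or the request fails.
func (b *LeastConnBalancer) Release(serverIndex int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.active[serverIndex] > 0 { // guard against a double Release
		b.active[serverIndex]--
	}
}

// Usage
// balancer := NewLeastConnBalancer([]string{"s1", "s2", "s3", "s4"})
// i := balancer.Acquire() // forward request to server i
// // ... await response ...
// balancer.Release(i)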

Knowledge check

Did the prototype land?

question 01 / 03

What signal does least connections use to choose a server?

question 02 / 03

Why must the active-connection counter be incremented at the moment of the routing decision, not when the request reaches the backend?

question 03 / 03

Why is least connections harder to get right across multiple independent load balancers?
