Weighted Least Connections — Load Balancing

Overview

What this concept solves

Least connections has one blind spot: it treats every server as equal. Route to 'fewest connections' and a tiny 2-core box looks just as available as an 8-core monster when both show 4 active connections — but 4 connections means very different things to those two machines. Weighted least connections closes that gap by dividing each server's active count by its capacity weight and routing to the lowest ratio.

It is the most adaptive algorithm in this set because it combines both signals: capacity (the static weight, like weighted round robin) and live load (the active count, like least connections). A big server can hold proportionally more connections before it's considered 'as loaded' as a small one — so the fleet reaches saturation together.

Mechanics

How it works

Minimize active ÷ weight

Instead of comparing raw connection counts, compare load ratios:


loadRatio(i) = active[i] / weight[i]
target = argmin_i  loadRatio(i)

A weight-3 server at 3 active connections has ratio 1.0; a weight-1 server at 1 active connection also has ratio 1.0 — they're equally loaded relative to capacity, even though one holds three times the connections. New work goes to whoever has the lowest ratio, so the big servers fill up proportionally faster in absolute terms while every server's relative load stays even.
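The selection rule above can be sketched in a few lines of Go. This is only an illustration: the `server` type and `pickLowestRatio` name are hypothetical, weights are assumed to be positive integers, and ties are assumed to go to the earliest server. The comparison a/w < b/v is rewritten as a*v < b*w so no floating point is needed.

```go
package main

import "fmt"

// server is a hypothetical record for this sketch: an integer capacity
// weight and the current number of active connections.
type server struct {
	name   string
	weight int
	active int
}

// pickLowestRatio returns the index of the server with the smallest
// active/weight ratio, or -1 if no server has a positive weight.
// Ties go to the earliest server in the slice.
func pickLowestRatio(pool []server) int {
	best := -1
	for i, s := range pool {
		if s.weight <= 0 {
			continue // a zero weight would make the ratio undefined
		}
		// s.active/s.weight < best.active/best.weight, cross-multiplied
		if best == -1 || s.active*pool[best].weight < pool[best].active*s.weight {
			best = i
		}
	}
	return best
}

func main() {
	pool := []server{
		{"s1", 3, 3}, // ratio 1.0
		{"s2", 1, 1}, // ratio 1.0
		{"s3", 2, 1}, // ratio 0.5
		{"s4", 1, 2}, // ratio 2.0
	}
	fmt.Println(pool[pickLowestRatio(pool)].name) // prints s3
}
```

Here s1 and s2 tie at ratio 1.0 even though s1 holds three times the connections, and s3 wins because it is only half full relative to its weight.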

When all weights are equal, the division is by a constant and the algorithm collapses back to plain least connections. Weighted least connections is the strict generalization — least connections is just the all-weights-equal case.

Two signals, one number

Think of weight as the size of the bucket and active connections as how much is in it. The ratio is the fill level. Routing to the bucket with the lowest fill level, not the one holding the fewest litres, is what lets a mixed fleet fill evenly.

Interactive prototype

Run it. Break it. Tune it.


About this simulation

Least connections that finally accounts for capacity. Each server has a weight, and the balancer routes by the lowest load ratio = active ÷ weight, not raw active count. The ratios row shows the live comparison. Adjust weights and watch a weight-3 server comfortably carry three times the connections of a weight-1 server.

Hands-on

Try these on your own

Open the prototype above, run each experiment, predict the answer, then verify.

try 01

Read the load ratios

Watch the 'Load ratios (active ÷ weight)' row while you hit 'Auto'. The balancer always targets the server showing the lowest ratio — not the lowest raw active count. With default weights 3, 1, 2, 1, the weight-3 server will sit at several active connections while a weight-1 server sits at one, yet their ratios stay close.

try 02

Prove the proportional fill

Click 'Send 10' and let it settle. Compare the 'Active' counts to the weights: the weight-3 server should hover around three times the active connections of a weight-1 server. That's capacity-proportional balancing — the big box carries proportionally more, and the 'Load spread' stat stays tight.
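The proportional fill can also be checked outside the prototype. The sketch below is not the simulation's code: `fill` is a hypothetical helper that hands out connections one at a time to the lowest ratio, never releases any, assumes a non-empty pool with positive weights, and breaks ties in favor of the earliest server.

```go
package main

import "fmt"

// fill hands out n connections one at a time, always to the server with
// the lowest active/weight ratio, and never releases any.
// Assumes len(weights) > 0 and every weight is positive.
func fill(weights []int, n int) []int {
	active := make([]int, len(weights))
	for c := 0; c < n; c++ {
		best := 0
		for i := 1; i < len(weights); i++ {
			// active[i]/weights[i] < active[best]/weights[best], cross-multiplied
			if active[i]*weights[best] < active[best]*weights[i] {
				best = i
			}
		}
		active[best]++
	}
	return active
}

func main() {
	weights := []int{3, 1, 2, 1}
	fmt.Println(fill(weights, 14)) // prints [6 2 4 2]
	fmt.Println(fill(weights, 10)) // prints [4 2 3 1]
}
```

With 14 connections (twice the total weight of 7) the split is exactly 3:1:2:1. With 10 it is only approximately proportional, because 10 is not a multiple of the total weight.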

try 03

Collapse it to least connections

Set every weight to the same value (say all 2) with the steppers. The load ratio becomes active÷constant for everyone, so the algorithm now behaves identically to plain Least Connections — routing purely by raw active count. Weighted least connections is just least connections with a capacity divisor.
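A quick way to convince yourself of the collapse is to compare the two selection rules on the same connection counts. The helpers `pickWeighted` and `pickLeastConn` below are hypothetical names for this sketch. Both assume a non-empty pool, and `pickWeighted` assumes positive weights.

```go
package main

import "fmt"

// pickWeighted returns the index with the lowest active/weight ratio.
// Ties go to the earliest index.
func pickWeighted(active, weights []int) int {
	best := 0
	for i := 1; i < len(active); i++ {
		if active[i]*weights[best] < active[best]*weights[i] {
			best = i
		}
	}
	return best
}

// pickLeastConn returns the index with the fewest active connections.
// Ties go to the earliest index.
func pickLeastConn(active []int) int {
	best := 0
	for i := 1; i < len(active); i++ {
		if active[i] < active[best] {
			best = i
		}
	}
	return best
}

func main() {
	active := []int{5, 2, 7, 2}

	// Equal weights: both rules agree.
	fmt.Println(pickWeighted(active, []int{2, 2, 2, 2}), pickLeastConn(active)) // prints 1 1

	// Unequal weights: the weight-3 server wins despite holding more connections.
	fmt.Println(pickWeighted(active, []int{3, 1, 2, 1})) // prints 0
}
```

Dividing every count by the same constant never changes which one is smallest, which is why the first line prints the same index twice.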

In practice

When to use it — and what you give up

When to reach for it

Mixed fleet and variable request cost: the combination that simpler algorithms handle poorly. Big and small servers, short and long requests, all at once.
Gradual migrations — running new, larger instances alongside old ones; weights keep both at proportional load while connection-awareness handles request variance.
Autoscaling groups with instance diversity — spot/on-demand mixes or multi-generation pools where capacities genuinely differ.
Anywhere you'd want least connections but your servers aren't identical — which, in practice, is most real fleets.

Real-world example

HAProxy's leastconn honors per-server weight, and NGINX's least_conn takes each server's weight= parameter into account, so both route on connection count relative to weight rather than on raw count alone. It is a common choice for L4 balancing across heterogeneous backends.
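As a rough illustration, the fragments below show how the 3, 1, 2, 1 weights used elsewhere on this page might be expressed in each tool. The pool name, server names, and addresses are placeholders invented for this sketch, so check each project's documentation before relying on the details.

```nginx
# NGINX: least_conn takes server weights into account
upstream app_pool {
    least_conn;
    server 10.0.0.11:8080 weight=3;
    server 10.0.0.12:8080 weight=1;
    server 10.0.0.13:8080 weight=2;
    server 10.0.0.14:8080 weight=1;
}
```

```haproxy
# HAProxy: leastconn with per-server weights
backend app_pool
    balance leastconn
    server s1 10.0.0.11:8080 weight 3
    server s2 10.0.0.12:8080 weight 1
    server s3 10.0.0.13:8080 weight 2
    server s4 10.0.0.14:8080 weight 1
```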

Pros

The most adaptive of the set — accounts for both capacity and live load simultaneously.
Generalizes least connections cleanly: equal weights reduce to plain least-conn.
Keeps a mixed fleet at even relative utilization, so all servers saturate together.
Handles uneven request durations and uneven server sizes in one rule.

Cons

Inherits every distributed-state hazard of least connections — independent balancers see only their own connections.
Two things to get right now: accurate weights and accurate live counts.
Connection count is still an imperfect load proxy: a server holding many cheap or idle connections can show a higher ratio than one holding a few expensive ones.
Weights are static and need re-tuning when the fleet's composition changes.
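The distributed-state hazard in the first point is easy to reproduce. The sketch below is a deliberately simplified model with hypothetical helper names: each independent balancer counts only the connections it opened itself, ties go to the earliest server, and nothing is released. Real balancers often randomize tie-breaking, which softens the effect but does not remove it.

```go
package main

import "fmt"

// pick returns the index with the lowest active/weight ratio.
// Assumes a non-empty pool with positive weights; ties go to the earliest index.
func pick(weights, active []int) int {
	best := 0
	for i := 1; i < len(weights); i++ {
		if active[i]*weights[best] < active[best]*weights[i] {
			best = i
		}
	}
	return best
}

// independent models k balancers that each open n connections while seeing
// only their own counts. It returns the true per-server totals.
func independent(weights []int, k, n int) []int {
	total := make([]int, len(weights))
	for b := 0; b < k; b++ {
		local := make([]int, len(weights)) // this balancer's private view
		for c := 0; c < n; c++ {
			i := pick(weights, local)
			local[i]++
			total[i]++
		}
	}
	return total
}

// shared models a single balancer (or shared state) opening n connections.
func shared(weights []int, n int) []int {
	active := make([]int, len(weights))
	for c := 0; c < n; c++ {
		active[pick(weights, active)]++
	}
	return active
}

func main() {
	weights := []int{3, 1, 2, 1}
	fmt.Println(independent(weights, 4, 1)) // prints [4 0 0 0]
	fmt.Println(shared(weights, 4))         // prints [1 1 1 1]
}
```

Four balancers that each see an empty fleet all pick the same first server, so the same four connections land on one backend instead of being spread out.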

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

weighted_least_connections.go

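package lb

import "sync"

// Backend is one server in the pool.
type Backend struct {
	id     string
	weight float64 // static capacity weight; must be > 0 to receive traffic
	active float64 // in-flight connections, guarded by the balancer's mutex
}

// WeightedLeastConnBalancer picks the backend that minimizes active / weight.
type WeightedLeastConnBalancer struct {
	mu       sync.Mutex // protects every Backend.active in the pool
	backends []*Backend
}

func NewWeightedLeastConnBalancer(pool []*Backend) *WeightedLeastConnBalancer {
	return &WeightedLeastConnBalancer{backends: pool}
}

// Acquire returns the backend with the lowest load ratio and reserves a
// connection on it. It returns nil if no backend has a positive weight.
func (b *WeightedLeastConnBalancer) Acquire() *Backend {
	b.mu.Lock()
	defer b.mu.Unlock()

	var best *Backend
	var bestRatio float64
	for _, be := range b.backends {
		if be.weight <= 0 {
			continue // skip backends whose ratio would be undefined
		}
		ratio := be.active / be.weight // load relative to capacity
		if best == nil || ratio < bestRatio {
			best, bestRatio = be, ratio
		}
	}
	if best != nil {
		best.active++ // reserve eagerly, at decision time
	}
	return best
}

// Release gives back a connection previously handed out by Acquire.
func (b *WeightedLeastConnBalancer) Release(be *Backend) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if be != nil && be.active > 0 {
		be.active--
	}
}

// Example pool with mixed weights. If every weight were equal, the choice
// would reduce exactly to plain least connections.
// lb := NewWeightedLeastConnBalancer([]*Backend{
// 	{id: "s1", weight: 3},
// 	{id: "s2", weight: 1},
// 	{id: "s3", weight: 2},
// 	{id: "s4", weight: 1},
// })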


Knowledge check

Did the prototype land?


question 01 / 03

Server A has weight 3 and 3 active connections. Server B has weight 1 and 1 active connection. Which does weighted least connections consider less loaded?

question 02 / 03

What two signals does weighted least connections combine?

question 03 / 03

What does weighted least connections reduce to when all server weights are equal?
