Overview
What this concept solves
Least connections has one blind spot: it treats every server as equal. Route to 'fewest connections' and a tiny 2-core box looks just as available as an 8-core monster when both show 4 active connections — but 4 connections means very different things to those two machines. Weighted least connections closes that gap by dividing each server's active count by its capacity weight and routing to the lowest ratio.
It is the most adaptive algorithm in this set because it combines both signals: capacity (the static weight, like weighted round robin) and live load (the active count, like least connections). A big server can hold proportionally more connections before it's considered 'as loaded' as a small one — so the fleet reaches saturation together.
Mechanics
How it works
Minimize active ÷ weight
Instead of comparing raw connection counts, compare load ratios:
loadRatio(i) = active[i] / weight[i]
target = argmin_i loadRatio(i)
A weight-3 server at 3 active connections has ratio 1.0; a weight-1 server at 1 active connection also has ratio 1.0. They're equally loaded relative to capacity, even though one holds three times the connections. New work goes to whoever has the lowest ratio, so the big servers fill up proportionally faster in absolute terms while every server's relative load stays even.
When all weights are equal, the division is by a constant and the algorithm collapses back to plain least connections. Weighted least connections is the strict generalization — least connections is just the all-weights-equal case.
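To make the ratio comparison concrete, here is a minimal TypeScript sketch. The names `Server`, `loadRatio`, and `pickTarget` are illustrative rather than taken from any real balancer, and breaking ties in favor of the earliest server is an assumption.

```typescript
type Server = { id: string; weight: number; active: number };

// Load relative to capacity: active connections divided by weight.
function loadRatio(s: Server): number {
  return s.active / s.weight;
}

// Return the server with the lowest ratio; ties go to the earliest entry (assumed policy).
function pickTarget(servers: Server[]): Server {
  if (servers.length === 0) throw new Error("empty pool");
  return servers.reduce((best, s) => (loadRatio(s) < loadRatio(best) ? s : best));
}

const big: Server = { id: "big", weight: 3, active: 3 };
const small: Server = { id: "small", weight: 1, active: 1 };

// Both servers sit at ratio 1.0, so they count as equally loaded.
const ratios = [loadRatio(big), loadRatio(small)];
const tied = pickTarget([big, small]).id; // "big", only because of the tie-break

// big at 5 active (ratio ~1.67) still beats small at 2 active (ratio 2.0),
// even though plain least connections would choose small here.
const busyBig = pickTarget([{ ...big, active: 5 }, { ...small, active: 2 }]).id; // "big"
```

The last line is where the two algorithms diverge: raw counts favor the small server, while the ratio shows the big one has more headroom.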
Two signals, one number
Think of weight as the size of the bucket and active connections as how full it is. The ratio is the fill level. Routing to the emptiest bucket — not the one with the fewest litres — is what lets a mixed fleet drain evenly.
Interactive prototype
Run it. Break it. Tune it.
Sandboxed simulation embedded right in the page. No setup, no install.
About this simulation
Least connections that finally accounts for capacity. Each server has a weight, and the balancer routes by the lowest load ratio = active ÷ weight, not raw active count. The ratios row shows the live comparison. Adjust weights and watch a weight-3 server comfortably carry three times the connections of a weight-1 server.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Read the load ratios
Watch the 'Load ratios (active ÷ weight)' row while you hit 'Auto'. The balancer always targets the server showing the lowest ratio — not the lowest raw active count. With default weights 3, 1, 2, 1, the weight-3 server will sit at several active connections while a weight-1 server sits at one, yet their ratios stay close.
Prove the proportional fill
Click 'Send 10' and let it settle. Compare the 'Active' counts to the weights: the weight-3 server should hover around three times the active connections of a weight-1 server. That's capacity-proportional balancing — the big box carries proportionally more, and the 'Load spread' stat stays tight.
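The proportional fill can also be checked outside the prototype. The sketch below assumes connections never close during the run and that ties go to the first server in the list; the prototype's actual tie-breaking and timing may differ, so treat the numbers as an illustration of the rule rather than a replay of the simulation. `route` and `fill` are hypothetical helper names.

```typescript
type Server = { id: string; weight: number; active: number };

// Route one connection to the lowest active/weight ratio (ties: first in list)
// and count it immediately.
function route(servers: Server[]): Server {
  let best = servers[0];
  for (const s of servers) {
    if (s.active / s.weight < best.active / best.weight) best = s;
  }
  best.active++;
  return best;
}

// Send n long-lived connections into a fresh pool and return the active counts.
function fill(weights: number[], n: number): number[] {
  const servers = weights.map((weight, i) => ({ id: `s${i + 1}`, weight, active: 0 }));
  for (let i = 0; i < n; i++) route(servers);
  return servers.map((s) => s.active);
}

const after7 = fill([3, 1, 2, 1], 7);   // [3, 1, 2, 1]: every ratio is exactly 1
const after14 = fill([3, 1, 2, 1], 14); // [6, 2, 4, 2]: every ratio is exactly 2
```

The weights sum to 7, so after every 7 connections the active counts land on an exact multiple of the weights. In between, the counts are only approximately proportional.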
Collapse it to least connections
Set every weight to the same value (say all 2) with the steppers. The load ratio becomes active÷constant for everyone, so the algorithm now behaves identically to plain Least Connections — routing purely by raw active count. Weighted least connections is just least connections with a capacity divisor.
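The same collapse can be seen in a few lines of TypeScript. This is a sketch with hypothetical helpers (`pickWeighted`, `pickLeastConn`) that share one assumed tie-break rule, lowest index wins, so their choices can be compared directly.

```typescript
type Server = { weight: number; active: number };

// Index of the lowest active/weight ratio; ties go to the lowest index.
function pickWeighted(servers: Server[]): number {
  let best = 0;
  servers.forEach((s, i) => {
    if (s.active / s.weight < servers[best].active / servers[best].weight) best = i;
  });
  return best;
}

// Plain least connections: lowest raw active count, same tie-break.
function pickLeastConn(servers: Server[]): number {
  let best = 0;
  servers.forEach((s, i) => {
    if (s.active < servers[best].active) best = i;
  });
  return best;
}

const actives = [4, 2, 5, 2];
const equal = actives.map((active) => ({ weight: 2, active }));
const mixed = actives.map((active, i) => ({ weight: [3, 1, 2, 1][i], active }));

// Equal weights: both rules pick index 1 (2 active, ratio 1.0).
const sameChoice = pickWeighted(equal) === pickLeastConn(equal);

// Mixed weights: the ratio rule picks index 0 (4/3 is the lowest ratio),
// while raw counts would still pick index 1.
const mixedChoice = pickWeighted(mixed);
```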
In practice
When to use it — and what you give up
When to reach for it
- Mixed fleet and variable request cost: the combination that defeats every simpler algorithm. Big and small servers, short and long requests, all at once.
- Gradual migrations — running new, larger instances alongside old ones; weights keep both at proportional load while connection-awareness handles request variance.
- Autoscaling groups with instance diversity — spot/on-demand mixes or multi-generation pools where capacities genuinely differ.
- Anywhere you'd want least connections but your servers aren't identical — which, in practice, is most real fleets.
Real-world example
HAProxy's leastconn honors per-server weight, and NGINX's least_conn method takes each server's weight= parameter into account, so both route by connection count relative to configured capacity rather than by raw count alone. It's a common choice for balancing across heterogeneous backends.
Pros
- The most adaptive of the set — accounts for both capacity and live load simultaneously.
- Generalizes least connections cleanly: equal weights reduce to plain least-conn.
- Keeps a mixed fleet at even relative utilization, so all servers saturate together.
- Handles uneven request durations and uneven server sizes in one rule.
Cons
- Inherits every distributed-state hazard of least connections — independent balancers see only their own connections.
- Two things to get right now: accurate weights and accurate live counts.
- Connection count is still an imperfect load proxy; many cheap connections can mislead the ratio.
- Weights are static and need re-tuning when the fleet's composition changes.
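The first hazard in the list, independent balancers that see only their own connections, is easy to reproduce. The sketch below is a deliberately extreme case under stated assumptions: four balancers with no shared state, one connection each, and ties broken toward the first server. Real deployments often randomize tie-breaks or share counts, which softens the effect.

```typescript
type Counts = number[];

const weights = [3, 1, 2, 1];

// Each balancer decides using only its own local view of active connections.
function pick(localActive: Counts): number {
  let best = 0;
  for (let i = 1; i < weights.length; i++) {
    if (localActive[i] / weights[i] < localActive[best] / weights[best]) best = i;
  }
  return best;
}

// Four balancers, each starting with an all-zero local view.
const balancers: Counts[] = Array.from({ length: 4 }, () => weights.map(() => 0));
// What the servers actually experience, summed across all balancers.
const trueActive: Counts = weights.map(() => 0);

// Each balancer routes a single connection.
for (const local of balancers) {
  const target = pick(local);
  local[target]++;
  trueActive[target]++;
}
// trueActive is [4, 0, 0, 0]: every balancer saw an idle fleet and chose the first server.
```

Each balancer made a locally correct decision, yet the fleet as a whole ended up with all four connections on one server and three servers idle.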
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
// Weighted least connections: minimize active / weight.
type Backend = { id: string; weight: number; active: number };

class WeightedLeastConnBalancer {
  private backends: Backend[];

  constructor(pool: { id: string; weight: number }[]) {
    // Weights must be positive, otherwise active / weight is meaningless.
    if (pool.length === 0 || pool.some((b) => b.weight <= 0)) {
      throw new Error("pool must be non-empty with positive weights");
    }
    this.backends = pool.map((b) => ({ ...b, active: 0 }));
  }

  // Pick the backend with the lowest load ratio; ties go to the first one.
  acquire(): Backend {
    let best = this.backends[0];
    let bestRatio = best.active / best.weight;
    for (const b of this.backends) {
      const ratio = b.active / b.weight; // load relative to capacity
      if (ratio < bestRatio) { best = b; bestRatio = ratio; }
    }
    best.active++; // reserve eagerly, at decision time
    return best;
  }

  // Call when the connection closes; the count never drops below zero.
  release(b: Backend): void {
    if (b.active > 0) b.active--;
  }
}

// With equal weights this reduces exactly to plain least connections.
const lb = new WeightedLeastConnBalancer([
  { id: "s1", weight: 3 },
  { id: "s2", weight: 1 },
  { id: "s3", weight: 2 },
  { id: "s4", weight: 1 },
]);

References & further reading
6 sources
- Docs (envoyproxy.io)
Envoy: Load balancers (weighted least request)
Envoy's 'least request' LB and exactly how it folds weights into active-request accounting; the closest production analog to this prototype.
- Docs (haproxy.com)
HAProxy: Load balancing algorithms (leastconn + weight)
How HAProxy folds per-server weight into its least-connections decision for mixed-capacity backends.
- Docs (docs.nginx.com)
NGINX: least_conn with weighted servers
Combining the least_conn method with weight= so big and small servers fill proportionally.
- Article (cloudflare.com)
Cloudflare: Types of load balancing algorithms
Defines weighted least connection plainly: weights assume some servers handle more connections than others.
- Book (sre.google)
Google SRE Book, Ch. 20: Load Balancing in the Datacenter
Production view of combining capacity weights with live utilization to keep a heterogeneous fleet evenly loaded.
- Article (en.wikipedia.org)
Wikipedia: Load balancing (computing)
Where weighted least connections sits among the dynamic, state-aware strategies.
Knowledge check
Did the prototype land?
Quick questions, answers revealed on submit. No scoring saved.
question 01 / 03
Server A has weight 3 and 3 active connections. Server B has weight 1 and 1 active connection. Which does weighted least connections consider less loaded?
question 02 / 03
What two signals does weighted least connections combine?
question 03 / 03
What does weighted least connections reduce to when all server weights are equal?