Overview
What this concept solves
Real fleets aren't uniform. You have an 8-core box next to a 2-core box, or a new-generation instance sharing the pool with last year's. Plain round robin sends each server the same number of requests, so the small server saturates while the big one coasts. Weighted round robin fixes that by giving each server a weight proportional to its capacity, then handing out requests in that proportion.
A server with weight 3 receives three requests for every one a weight-1 server gets. Over a full cycle the split matches the weight ratios exactly — it's still the deterministic, stateless deal-out of round robin, just with some servers dealt more cards per round.
Mechanics
How it works
Expand the weights into a cycle
The easiest mental model — and what the prototype shows in its 'Cycle pattern' row — is to expand the weights into a sequence. With weights S1=3, S2=1, S3=2, S4=1 the round is:
S1 → S1 → S1 → S2 → S3 → S3 → S4 ↺
Seven slots, then it repeats. Over each cycle S1 gets 3/7 of traffic, S3 gets 2/7, and S2 and S4 get 1/7 each, exactly their weight shares. The naïve implementation literally walks this expanded list.
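The expansion above can be sketched in a few lines. This is a minimal illustration, not a production balancer: the names WeightedServer, expandCycle and makeNaivePicker are hypothetical, and weights are assumed to be non-negative integers.

```typescript
// Hypothetical pool shape; ids and weights mirror the example above.
type WeightedServer = { id: string; weight: number };

// Expand each server into `weight` consecutive slots.
// Assumes weights are non-negative integers.
function expandCycle(pool: WeightedServer[]): string[] {
  const cycle: string[] = [];
  for (const server of pool) {
    for (let slot = 0; slot < server.weight; slot++) cycle.push(server.id);
  }
  return cycle;
}

// Return a picker that walks the expanded cycle with a wrapping cursor.
function makeNaivePicker(pool: WeightedServer[]): () => string {
  const cycle = expandCycle(pool);
  if (cycle.length === 0) throw new Error("pool needs at least one positive weight");
  let cursor = 0;
  return () => {
    const id = cycle[cursor];
    cursor = (cursor + 1) % cycle.length;
    return id;
  };
}

const examplePool: WeightedServer[] = [
  { id: "S1", weight: 3 },
  { id: "S2", weight: 1 },
  { id: "S3", weight: 2 },
  { id: "S4", weight: 1 },
];
const pickNaive = makeNaivePicker(examplePool);
const firstEight: string[] = [];
for (let i = 0; i < 8; i++) firstEight.push(pickNaive());
console.log(firstEight.join(" "));
// prints "S1 S1 S1 S2 S3 S3 S4 S1"
```

The eighth pick wraps back to S1, which is the ↺ in the pattern. Each pick is a single array read, at the cost of a list whose length is the sum of the weights.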
Smooth weighted round robin
The naïve expansion is bursty: weight-3 S1 gets all three of its requests back-to-back at the top of every cycle. Production balancers (NGINX, Envoy) use smooth weighted round robin instead, which interleaves the picks (for these weights, with ties going to the earlier server: S1, S3, S2, S1, S4, S3, S1) so a heavy server's requests are spread through the round rather than clumped. Same totals per cycle, gentler instantaneous load.
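To see the interleaving concretely, here is a compact sketch of the smooth selection rule: on every pick each server's running counter grows by its weight, the highest counter wins, and the winner's counter is reduced by the total weight. The helper names smoothSequence and longestRun are hypothetical, and ties are assumed to go to the earlier server.

```typescript
// Hypothetical helper: ids chosen by smooth WRR over `count` picks.
function smoothSequence(pool: { id: string; weight: number }[], count: number): string[] {
  if (pool.length === 0) throw new Error("pool must not be empty");
  const total = pool.reduce((sum, s) => sum + s.weight, 0);
  const current = pool.map(() => 0); // one running counter per server
  const picks: string[] = [];
  for (let n = 0; n < count; n++) {
    let best = 0;
    for (let i = 0; i < pool.length; i++) {
      current[i] += pool[i].weight; // every server gains its weight
      if (current[i] > current[best]) best = i; // strict ">" keeps the earlier server on ties
    }
    current[best] -= total; // the winner pays the full total back
    picks.push(pool[best].id);
  }
  return picks;
}

// Length of the longest run of identical consecutive ids.
function longestRun(ids: string[]): number {
  let longest = 0;
  let run = 0;
  for (let i = 0; i < ids.length; i++) {
    run = i > 0 && ids[i] === ids[i - 1] ? run + 1 : 1;
    if (run > longest) longest = run;
  }
  return longest;
}

const smoothPool = [
  { id: "S1", weight: 3 },
  { id: "S2", weight: 1 },
  { id: "S3", weight: 2 },
  { id: "S4", weight: 1 },
];
console.log(smoothSequence(smoothPool, 7).join(" "));
// prints "S1 S3 S2 S1 S4 S3 S1"
console.log(longestRun(smoothSequence(smoothPool, 14)));
// prints 2
```

S1 closes one cycle and opens the next, so its longest run across two cycles is 2, compared with 3 for the naïve expansion.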
Weight is capacity, not load
Weights are a static statement about how much each server can handle, set by you up front. They say nothing about how busy a server is right now. If a heavy-weighted server happens to catch a run of expensive requests, weighted round robin keeps feeding it its full share anyway — it can't see the imbalance.
Interactive prototype
Run it. Break it. Tune it.
About this simulation
Round robin, but each server carries a weight. The cycle pattern shows the resulting sequence — a weight-3 server appears three times per round, a weight-1 server once. Adjust the weight steppers and watch the pattern (and the actual-vs-target share) re-balance.
Hands-on
Try these on your own
Open the prototype above, run each experiment, predict the answer, then verify.
Read the cycle pattern
With the default weights (3, 1, 2, 1) the 'Cycle pattern' row reads S1 → S1 → S1 → S2 → S3 → S3 → S4. Send seven requests and confirm the handled counts land at 3 / 1 / 2 / 1 — one full cycle, split exactly by weight.
Re-weight a server
Bump S2's weight from 1 up to 5 with its stepper. The cycle pattern instantly grows S2's slots and the connection line to S2 thickens. Hit 'Start auto' and watch the 'Actual vs target share' stat converge — S2 now pulls the largest slice.
Build a canary split
Set three servers to weight 1 and one server to weight 9. That lone heavy server now takes 75% of traffic (9 of 12 slots). Or flip it: weight 1 on a 'new version' against 9s elsewhere gives it about 3.6% (1 of 28 slots) for a cautious canary. Weights are your traffic-splitting dial.
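The arithmetic behind those percentages is just a weight divided by the sum of all weights. A small sketch, where expectedShare is a hypothetical helper and the index is assumed to be in range:

```typescript
// Fraction of traffic the server at `index` receives over a full cycle.
function expectedShare(weights: number[], index: number): number {
  const total = weights.reduce((sum, w) => sum + w, 0);
  if (total <= 0) throw new Error("weights must sum to a positive number");
  return weights[index] / total;
}

const heavyShare = expectedShare([1, 1, 1, 9], 3); // 9 / 12 = 0.75
const canaryShare = expectedShare([1, 9, 9, 9], 0); // 1 / 28, roughly 0.036
console.log(heavyShare, canaryShare.toFixed(3));
```

Running the numbers before touching the weights is a cheap way to check that a canary really gets the slice you intend.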
In practice
When to use it — and what you give up
When to reach for it
- Heterogeneous hardware: a mix of instance sizes or generations; weight each by relative capacity (cores, RAM, benchmarked throughput).
- Canary / gradual rollout: give a new version weight 1 against the stable pool's combined weight of 10 to send it about 9% of traffic (1 request in 11), then ramp the weight up.
- Predictable request cost: like plain round robin, it shines when requests are uniform; the weights handle capacity, not request variance.
- Draining a node: drop a server's weight toward zero to bleed traffic off it before maintenance.
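For the heterogeneous-hardware case, one simple way to turn capacity figures into weights is to divide them by their greatest common divisor, which preserves the ratios while keeping the cycle short. This is a sketch under the assumption that capacities are non-negative integers such as core counts; weightsFromCapacity and gcd are hypothetical helpers, and benchmarked throughput would need rounding first.

```typescript
// Greatest common divisor by Euclid's algorithm.
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

// Reduce raw capacities to the smallest integer weights with the same ratios.
function weightsFromCapacity(capacities: number[]): number[] {
  const divisor = capacities.reduce((g, c) => gcd(g, c), 0);
  if (divisor === 0) throw new Error("at least one capacity must be positive");
  return capacities.map((c) => c / divisor);
}

console.log(weightsFromCapacity([8, 2]));    // prints [ 4, 1 ]
console.log(weightsFromCapacity([8, 2, 4])); // prints [ 4, 1, 2 ]
```

A capacity of 0 maps to weight 0, which matches the draining case: the server stays in the pool but receives no slots.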
Real-world example
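NGINX exposes this directly as a parameter on each server line of an upstream block, for example server backend1 weight=3; (servers without it default to weight 1). Envoy and HAProxy both implement smooth weighted round robin so a high-weight backend doesn't receive its whole allocation in one burst each cycle.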
Pros
- Handles uneven server capacity that plain round robin can't.
- Needs no live monitoring of backends, and each pick is cheap: O(1) against a precomputed cycle, O(n) in the number of servers for the simple smooth-WRR loop.
- Doubles as a traffic-splitting knob for canaries and gradual rollouts.
- Deterministic and easy to reason about: shares match weight ratios exactly per cycle.
Cons
- Weights are static — they encode capacity, not the live load, so a heavy server catching expensive requests still gets overfed.
- Someone has to set the weights correctly, and re-tune them when the fleet changes.
- Naïve expansion is bursty for high-weight servers; you want a smooth-WRR implementation.
- Still blind to request duration and server health, just like round robin.
Reference
Code & further reading
A minimal reference implementation and pointers worth bookmarking.
// Smooth Weighted Round Robin (the algorithm NGINX/Envoy use).
// Avoids the bursty "all of S1, then S2..." of naive expansion.
type Server = { id: string; weight: number; current: number };

class SmoothWRR {
  private servers: Server[];

  constructor(pool: { id: string; weight: number }[]) {
    if (pool.length === 0) throw new Error("pool must not be empty");
    this.servers = pool.map((s) => ({ ...s, current: 0 }));
  }

  pick(): string {
    const total = this.servers.reduce((sum, s) => sum + s.weight, 0);
    let best: Server | null = null;
    for (const s of this.servers) {
      s.current += s.weight; // each gains its weight
      if (!best || s.current > best.current) best = s; // ties go to the earlier server
    }
    best!.current -= total; // the winner pays the full total back
    return best!.id;
  }
}

// weights 3,1,2,1 -> a smooth, interleaved sequence:
//   s1, s3, s2, s1, s4, s3, s1   (still 3:1:2:1 per cycle)
const lb = new SmoothWRR([
  { id: "s1", weight: 3 },
  { id: "s2", weight: 1 },
  { id: "s3", weight: 2 },
  { id: "s4", weight: 1 },
]);

References & further reading
- NGINX: Weighted load balancing (docs, docs.nginx.com). The weight= directive and how NGINX splits traffic by it.
- NGINX: smooth weighted round-robin, source commit (article, github.com). The original explanation of the smooth WRR algorithm and why it interleaves picks instead of expanding weights naively.
- Envoy: Load balancers, weighted round robin (docs, envoyproxy.io). Envoy's weighted RR and how it interacts with locality weighting and health.
- HAProxy: Load balancing algorithms (docs, haproxy.com). Per-server weight under roundrobin, plus HAProxy's dynamic-weight (agent-check) feature for live re-weighting.
- Cloudflare: Types of load balancing algorithms (article, cloudflare.com). Weighted round robin explained for mixed-capacity fleets, in plain terms.
- AWS: How Elastic Load Balancing works, weighted target groups (docs, docs.aws.amazon.com). How ALB/NLB use target-group weights to split traffic, the same idea applied to canaries and blue/green rollouts.
Knowledge check
Did the prototype land?
question 01 / 03
Servers have weights S1=3, S2=1, S3=2, S4=1. Over one full cycle, what fraction of requests does S3 receive?
question 02 / 03
What does a server's weight represent?
question 03 / 03
Why do NGINX and Envoy use *smooth* weighted round robin instead of naively expanding the weights?