Beginner · 9 min read · live prototype

Weighted Round Robin

Round robin with capacity: give the big servers a higher weight and they take proportionally more turns.

Overview

What this concept solves

Real fleets aren't uniform. You have an 8-core box next to a 2-core box, or a new generation instance sharing the pool with last year's. Plain round robin sends them the same number of requests, so the small server saturates while the big one coasts. Weighted round robin fixes that by giving each server a weight proportional to its capacity, then handing out requests in that proportion.

A server with weight 3 receives three requests for every one a weight-1 server gets. Over a full cycle the split matches the weight ratios exactly — it's still the deterministic, stateless deal-out of round robin, just with some servers dealt more cards per round.

Mechanics

How it works

Expand the weights into a cycle

The easiest mental model — and what the prototype shows in its 'Cycle pattern' row — is to expand the weights into a sequence. With weights S1=3, S2=1, S3=2, S4=1 the round is:

text
S1 → S1 → S1 → S2 → S3 → S3 → S4  ↺

Seven slots, then it repeats. Over each cycle S1 gets 3/7 of traffic, S3 gets 2/7, and S2 and S4 get 1/7 each — exactly their weight shares. The naïve implementation literally walks this expanded list.
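The expansion above can be sketched in a few lines. This is a minimal illustration rather than how any particular balancer is written: the names `expandCycle` and `NaiveWRR` are invented for the example, and weights are assumed to be non-negative integers.

```typescript
// Hypothetical names for illustration; weights are assumed to be non-negative integers.
type Weighted = { id: string; weight: number };

// Repeat each server id `weight` times, in pool order.
function expandCycle(pool: Weighted[]): string[] {
  const cycle: string[] = [];
  for (const s of pool) {
    for (let i = 0; i < s.weight; i++) cycle.push(s.id);
  }
  return cycle;
}

class NaiveWRR {
  private readonly cycle: string[];
  private next = 0;

  constructor(pool: Weighted[]) {
    this.cycle = expandCycle(pool);
    if (this.cycle.length === 0) throw new Error("pool has no positive weights");
  }

  // Walk the expanded list and wrap around at the end.
  pick(): string {
    const id = this.cycle[this.next];
    this.next = (this.next + 1) % this.cycle.length;
    return id;
  }
}

const pool: Weighted[] = [
  { id: "S1", weight: 3 },
  { id: "S2", weight: 1 },
  { id: "S3", weight: 2 },
  { id: "S4", weight: 1 },
];

const naive = new NaiveWRR(pool);
const firstEight = Array.from({ length: 8 }, () => naive.pick());
console.log(firstEight.join(" → "));
// S1 → S1 → S1 → S2 → S3 → S3 → S4 → S1
```

The eighth pick wraps back to S1, which is the start of the next cycle.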

Smooth weighted round robin

The naïve expansion is bursty: weight-3 S1 gets all three of its requests back-to-back at the top of every cycle. Production balancers (NGINX, Envoy) use smooth weighted round robin instead, which interleaves the picks (for example S1, S3, S2, S1, S4, S3, S1, the order the reference implementation at the end of this page produces for these weights) so a heavy server's requests are spread through the round rather than clumped. Same totals per cycle, gentler instantaneous load.
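To make "bursty" concrete, the sketch below measures the longest back-to-back run a single server receives under each scheme. It uses one common formulation of smooth WRR (each counter gains its weight, the highest counter wins and pays back the total); the helper names are invented for the example, and the exact interleaving depends on the formulation and its tie-breaking, so other implementations may order the picks differently while keeping the same per-cycle totals.

```typescript
// Hypothetical helpers for illustration only. Servers are referred to by index.

// Naive scheme: expand the weights into a list and walk it in order.
function naiveSequence(weights: number[], picks: number): number[] {
  const cycle: number[] = [];
  weights.forEach((w, i) => {
    for (let k = 0; k < w; k++) cycle.push(i);
  });
  return Array.from({ length: picks }, (_, n) => cycle[n % cycle.length]);
}

// One common smooth-WRR formulation: every counter gains its weight, the
// highest counter wins the pick and pays back the sum of all weights.
// Ties go to the earliest server in the list.
function smoothSequence(weights: number[], picks: number): number[] {
  const total = weights.reduce((sum, w) => sum + w, 0);
  const current = weights.map(() => 0);
  const out: number[] = [];
  for (let n = 0; n < picks; n++) {
    let best = 0;
    for (let i = 0; i < weights.length; i++) {
      current[i] += weights[i];
      if (current[i] > current[best]) best = i;
    }
    current[best] -= total;
    out.push(best);
  }
  return out;
}

// Length of the longest stretch of consecutive picks of the same server.
function longestRun(seq: number[]): number {
  let longest = 0;
  let run = 0;
  for (let i = 0; i < seq.length; i++) {
    run = i > 0 && seq[i] === seq[i - 1] ? run + 1 : 1;
    longest = Math.max(longest, run);
  }
  return longest;
}

const demoWeights = [3, 1, 2, 1]; // S1..S4
const naiveRun = longestRun(naiveSequence(demoWeights, 14));   // two full cycles
const smoothRun = longestRun(smoothSequence(demoWeights, 14));
console.log({ naiveRun, smoothRun });
// { naiveRun: 3, smoothRun: 2 }
```

With this tie-breaking, the smooth schedule never repeats a server inside a cycle; the run of two appears only where one cycle ends on S1 and the next begins on it.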

Weight is capacity, not load

Weights are a static statement about how much each server can handle, set by you up front. They say nothing about how busy a server is right now. If a heavy-weighted server happens to catch a run of expensive requests, weighted round robin keeps feeding it its full share anyway — it can't see the imbalance.
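A small sketch of that blind spot, using invented request costs: the schedule is fixed by the weights, so if the expensive requests happen to land on S1's slots, S1's share of the work drifts far from its share of the weight. The helper `workByServer` and the cost figures are assumptions made up for this example.

```typescript
// Illustrative numbers only: the costs below are invented for the example.
const cycleOrder = ["S1", "S1", "S1", "S2", "S3", "S3", "S4"];

// Cost (say, milliseconds of CPU) of seven consecutive requests.
// The first three happen to be expensive, and they all land on S1.
const costs = [40, 40, 40, 5, 5, 5, 5];

// Sum the work each server receives when request i goes to order[i % order.length].
function workByServer(order: string[], requestCosts: number[]): Record<string, number> {
  const work: Record<string, number> = {};
  requestCosts.forEach((cost, i) => {
    const id = order[i % order.length];
    work[id] = (work[id] ?? 0) + cost;
  });
  return work;
}

const work = workByServer(cycleOrder, costs);
const totalWork = costs.reduce((sum, c) => sum + c, 0);
const s1WorkShare = work["S1"] / totalWork;

console.log(s1WorkShare.toFixed(2));
// "0.86": S1 holds 3/7 of the weight but 120/140 of the work
```

Nothing in the picker reacts to this; the next cycle hands S1 its three slots again regardless.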

Interactive prototype

Run it. Break it. Tune it.

Sandboxed simulation embedded right in the page. No setup, no install.

About this simulation

Round robin, but each server carries a weight. The cycle pattern shows the resulting sequence — a weight-3 server appears three times per round, a weight-1 server once. Adjust the weight steppers and watch the pattern (and the actual-vs-target share) re-balance.

Hands-on

Try these on your own

Open the prototype above, run each experiment, predict the answer, then verify.

try 01

Read the cycle pattern

With the default weights (3, 1, 2, 1) the 'Cycle pattern' row reads S1 → S1 → S1 → S2 → S3 → S3 → S4. Send seven requests and confirm the handled counts land at 3 / 1 / 2 / 1 — one full cycle, split exactly by weight.

try 02

Re-weight a server

Bump S2's weight from 1 up to 5 with its stepper. The cycle pattern instantly grows S2's slots and the connection line to S2 thickens. Hit 'Start auto' and watch the 'Actual vs target share' stat converge — S2 now pulls the largest slice.

try 03

Build a canary split

Set three servers to weight 1 and one server to weight 9. That lone heavy server now takes 75% of traffic (9 of 12 slots). Or flip it: weight 1 on a 'new version' against three servers at weight 9 gives it about 3.6% (1 of 28 slots) for a cautious canary. Weights are your traffic-splitting dial.
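The arithmetic behind both splits is just each weight divided by the sum of all weights. A minimal sketch, with `trafficShares` as an invented helper name:

```typescript
// Hypothetical helper: fraction of traffic each server gets over a full cycle.
function trafficShares(weights: number[]): number[] {
  const total = weights.reduce((sum, w) => sum + w, 0);
  if (total <= 0) throw new Error("at least one weight must be positive");
  return weights.map((w) => w / total);
}

const heavy = trafficShares([9, 1, 1, 1]);   // one heavy server, three light ones
const canary = trafficShares([1, 9, 9, 9]);  // one light canary, three heavy servers

console.log((heavy[0] * 100).toFixed(1));   // "75.0"  (9 / 12)
console.log((canary[0] * 100).toFixed(1));  // "3.6"   (1 / 28)
```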

In practice

When to use it — and what you give up

When to reach for it

  • Heterogeneous hardware: mix of instance sizes or generations; weight each by relative capacity (cores, RAM, benchmarked throughput).
  • Canary / gradual rollout: give a new version weight 1 against the stable pool's weight 10 to send it ~9% of traffic (1 request in 11), then ramp the weight up.
  • Predictable request cost: like plain round robin, it shines when requests are uniform; the weights handle capacity, not request variance.
  • Draining a node: drop a server's weight toward zero to bleed traffic off it before maintenance.
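For the heterogeneous-hardware case in the list above, one simple way to pick weights is to reduce the capacity figures to their smallest integer ratio, which keeps the cycle short. This is a sketch under the assumption that capacity is a positive integer such as a core count; `weightsFromCapacity` is an invented helper, not part of any balancer's API.

```typescript
// Hypothetical helper: turn integer capacity figures (cores here) into the
// smallest integer weights with the same ratios.
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

function weightsFromCapacity(capacities: number[]): number[] {
  if (capacities.length === 0 || capacities.some((c) => !Number.isInteger(c) || c <= 0)) {
    throw new Error("capacities must be a non-empty list of positive integers");
  }
  const divisor = capacities.reduce((g, c) => gcd(g, c));
  return capacities.map((c) => c / divisor);
}

console.log(weightsFromCapacity([8, 2]));      // [ 4, 1 ]
console.log(weightsFromCapacity([16, 8, 4]));  // [ 4, 2, 1 ]
```

Core count is only a starting point; benchmarked throughput is usually a better basis when it is available.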

Real-world example

NGINX exposes this directly: server backend1 weight=3;. Envoy and HAProxy both implement smooth weighted round robin so a high-weight backend doesn't receive its whole allocation in one burst each cycle.

Pros

  • Handles uneven server capacity that plain round robin can't.
  • Still needs no live monitoring of backends. A pick is O(1) when walking a precomputed cycle, and O(n) in the number of servers for the simple smooth-WRR loop.
  • Doubles as a traffic-splitting knob for canaries and gradual rollouts.
  • Deterministic and easy to reason about: shares match weight ratios exactly per cycle.

Cons

  • Weights are static — they encode capacity, not the live load, so a heavy server catching expensive requests still gets overfed.
  • Someone has to set the weights correctly, and re-tune them when the fleet changes.
  • Naïve expansion is bursty for high-weight servers; you want a smooth-WRR implementation.
  • Still blind to request duration and server health, just like round robin.

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

weighted-round-robin.ts
// Smooth Weighted Round Robin (the algorithm NGINX/Envoy use).
// Avoids the bursty "all of S1, then S2..." of naive expansion.
type Server = { id: string; weight: number; current: number };

class SmoothWRR {
  private servers: Server[];
  constructor(pool: { id: string; weight: number }[]) {
    this.servers = pool.map((s) => ({ ...s, current: 0 }));
  }

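  pick(): string {
    // Guard: with no servers there is nothing to pick.
    if (this.servers.length === 0) {
      throw new Error("SmoothWRR: pool is empty");
    }
    const total = this.servers.reduce((sum, s) => sum + s.weight, 0);
    let best: Server | null = null;
    for (const s of this.servers) {
      s.current += s.weight;            // each gains its weight
      // strict ">" means ties go to the earliest server in the list
      if (best === null || s.current > best.current) best = s;
    }
    best!.current -= total;             // the winner pays the full total back
    return best!.id;
  }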
}

// weights 3,1,2,1 -> seven calls to lb.pick() yield the interleaved cycle
// s1, s3, s2, s1, s4, s3, s1  (still 3:1:2:1 per cycle)
const lb = new SmoothWRR([
  { id: "s1", weight: 3 },
  { id: "s2", weight: 1 },
  { id: "s3", weight: 2 },
  { id: "s4", weight: 1 },
]);

Knowledge check

Did the prototype land?

Quick questions, answers revealed on submit. No scoring saved.

question 01 / 03

Servers have weights S1=3, S2=1, S3=2, S4=1. Over one full cycle, what fraction of requests does S3 receive?

question 02 / 03

What does a server's weight represent?

question 03 / 03

Why do NGINX and Envoy use *smooth* weighted round robin instead of naively expanding the weights?
