Peak EWMA load balancer simulator with a live chart of each server's exponentially weighted moving average (EWMA) of response time. The load balancer picks the server with the lowest score, computed as EWMA × (active connections + 1). An alpha (α) slider controls how quickly the average reacts to new measurements.

Diagram: Client → incoming requests → Load balancer (Peak EWMA, next pick: S1) → Servers 1 to 4. Each server box also shows its current EWMA.
Server 1 · ×1 · Active 0
Server 2 · ×1 · Active 0
Server 3 · ×3 · Active 0
Server 4 · ×1 · Active 0
Chart: EWMA per server over time (what the algorithm sees when deciding). Y-axis: EWMA, 0s to 10s in 2s steps. X-axis: time, last ~15s. Series: Server 1, Server 2, Server 3, Server 4.
Pick score = EWMA × (active + 1) — lowest wins
S1 1.50s* × (0+1) = 1.50
S2 1.50s* × (0+1) = 1.50
S3 1.50s* × (0+1) = 1.50
S4 1.50s* × (0+1) = 1.50
* default EWMA used until the server's first request completes
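
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's actual code: the `Server` class, the `pick` function, and the tie-breaking rule (first server in list order wins) are assumptions made for the example. The 1.50 s default comes from the panel above.

```python
from dataclasses import dataclass
from typing import Optional

# Default EWMA shown in the panel above, used before any request completes.
DEFAULT_EWMA_S = 1.50


@dataclass
class Server:  # hypothetical type, for illustration only
    name: str
    active: int = 0                 # requests currently in flight
    ewma_s: Optional[float] = None  # None until the first request completes

    def score(self) -> float:
        """Pick score = EWMA x (active + 1); lower is better."""
        ewma = DEFAULT_EWMA_S if self.ewma_s is None else self.ewma_s
        return ewma * (self.active + 1)


def pick(servers: list[Server]) -> Server:
    # min() returns the first minimal element, so ties go to the
    # earliest server in the list (an assumed tie-breaking rule).
    return min(servers, key=Server.score)


servers = [Server("S1"), Server("S2"), Server("S3"), Server("S4")]
print(pick(servers).name)  # every score is 1.50, so the tie goes to S1

servers[0].active = 1      # S1 now scores 1.50 * (1 + 1) = 3.00
print(pick(servers).name)  # S2 is the first server still at 1.50
```

Multiplying by (active + 1) rather than by the active count alone keeps an idle server's score equal to its EWMA instead of collapsing it to zero, so a slow idle server does not automatically beat a fast busy one.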
S1 latency ×1 · S2 latency ×1 · S3 latency ×3 · S4 latency ×1
α (EWMA smoothing factor): 0.30
low α → smooth EWMA, slow to react · high α → reactive but jumpy
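
To make the effect of α concrete, here is a small sketch of the standard EWMA update, new = α × sample + (1 − α) × previous. It assumes the simulator uses this textbook form and that the first completed request replaces the default value outright; neither detail is confirmed by the page. The function names and sample latencies are invented for illustration.

```python
def update_ewma(previous_s, sample_s, alpha):
    """Blend a new response-time sample (seconds) into the running average."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if previous_s is None:
        # Assumption: the first real measurement seeds the average.
        return sample_s
    return alpha * sample_s + (1.0 - alpha) * previous_s


def replay(samples_s, alpha):
    """Return the EWMA after each sample, rounded for display."""
    ewma = None
    history = []
    for sample in samples_s:
        ewma = update_ewma(ewma, sample, alpha)
        history.append(round(ewma, 3))
    return history


# Hypothetical latencies: steady 0.5 s with a single 3.0 s spike.
samples = [0.5, 0.5, 0.5, 3.0, 0.5, 0.5]
print(replay(samples, alpha=0.30))
print(replay(samples, alpha=0.90))
```

With these samples the spike lifts the α = 0.30 average to about 1.25 s and it decays gradually, while the α = 0.90 average jumps to about 2.75 s and falls back almost immediately.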
500 ms
Completed: 0 · In progress: 0 · Done per server (S1·S2·S3·S4): 0·0·0·0
EWMA half-life: ~2
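
The half-life readout can be reproduced under one assumption: that it counts EWMA updates (completed requests) rather than seconds. For the standard update rule, an old sample's weight shrinks by a factor of (1 − α) per update, so it halves after ln(0.5) / ln(1 − α) updates. The function name below is hypothetical.

```python
import math


def ewma_half_life(alpha):
    """Number of updates after which an old sample's weight has halved."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    # Weight after n updates is (1 - alpha) ** n; solve for weight = 0.5.
    return math.log(0.5) / math.log(1.0 - alpha)


for alpha in (0.10, 0.30, 0.90):
    print(f"alpha={alpha:.2f} -> half-life ~{ewma_half_life(alpha):.2f} updates")
```

At α = 0.30 this gives roughly 1.94 updates, which is consistent with the "~2" shown for the default slider position.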