The Elo rating system, developed by physics professor Arpad Elo and adopted by the US Chess Federation in 1960, solved a fundamental problem in competitive ranking: how to measure relative skill among players who do not all play one another. Elo modeled each player's expected score as a function of the rating difference. He then updated ratings after every game according to the gap between actual and expected results. The resulting system adjusts continuously to new information and does not need a round-robin in which everyone plays everyone. The simplicity and accuracy of Elo's design have made it the foundation for rating systems in chess, Go, online gaming, team sports, and even academic peer review.
The Mathematics of Expected Score
The expected score formula E = 1/(1 + 10^((R_opp − R_me)/400)) uses a logistic curve to map a rating difference to an expected score. The expected score is the probability of winning plus half the probability of drawing. The scale factor of 400 is a convention, chosen so that a 400-point rating difference corresponds to 10:1 odds in expected score. In other words, the higher-rated player should score about 10 points for every 1 point the lower-rated player scores. At a rating difference of 0, E = 0.5. At +200 points, E ≈ 0.76; at +400, E ≈ 0.91; at +600, E ≈ 0.97. The logistic shape reflects the empirical observation that rating gaps in chess produce approximately this relationship with game results. Elo's original formulation used a normal distribution rather than a logistic one. The logistic version, now standard in most implementations, is easier to compute and gives nearly identical predictions over typical rating gaps.
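Below is a minimal Python sketch of the logistic expected-score formula. The function name `expected_score` and the adjustable `scale` parameter are illustrative choices, not part of any standard library. The script recomputes the values quoted above and checks the 10:1 property at a 400-point gap.

```python
def expected_score(r_me: float, r_opp: float, scale: float = 400.0) -> float:
    """Logistic Elo expected score for the player rated r_me against r_opp.

    The result lies in (0, 1): win probability plus half the draw probability.
    """
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_me) / scale))


if __name__ == "__main__":
    # Rating advantages quoted in the text; the opponent is fixed at 1500.
    for diff in (0, 200, 400, 600):
        e = expected_score(1500 + diff, 1500)
        print(f"+{diff}: E = {e:.3f}")

    # A 400-point gap gives 10:1 odds, i.e. E / (1 - E) == 10.
    e400 = expected_score(1900, 1500)
    print(f"odds at +400: {e400 / (1 - e400):.1f}:1")
```

Running it prints E = 0.500, 0.760, 0.909 and 0.969 for the four gaps, which agrees with the rounded figures in the text. The terms 10^(x/400) and 10^(−x/400) are reciprocals, so the two players' expected scores always sum to 1.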
K-Factor: Stability vs. Responsiveness
The K-factor controls the volatility of the rating system. After each game, a rating changes by K × (S − E), where S is the actual score (1 for a win, 0.5 for a draw, 0 for a loss) and E is the expected score. A high K lets ratings change rapidly. This suits players whose true strength is uncertain, such as new players and rapidly improving juniors, because the system needs to find the right level quickly. A low K produces stable ratings that change slowly. This suits established elite players whose strength is known with high confidence.

FIDE uses three K-factor tiers:
- K=40 for a player's first 30 rated games, and for players under 18 rated below 2300.
- K=20 for other players rated below 2400.
- K=10 for players who have reached 2400. This tier remains in force even if their rating later drops.

Online chess tends to favor faster adjustment because games are quicker and more numerous. Chess.com and Lichess, for example, use Glicko-family systems, in which the effective K shrinks as the system's uncertainty about a player's rating falls. Because S − E can never exceed 1, K also caps the change from a single game. With K=40, a player who beats an opponent rated 400 points higher gains about 36 points. With K=10, the same upset is worth only about 9 points. Each setting is appropriate for its intended use case.
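The update rule and the FIDE tiers can be sketched in Python as follows. `update` applies R' = R + K × (S − E). `fide_style_k` is a hypothetical, simplified reading of the tiers described above. It ignores other provisions of the FIDE regulations, so treat it as illustrative only.

```python
def expected_score(r_me: float, r_opp: float, scale: float = 400.0) -> float:
    """Logistic Elo expected score for r_me against r_opp."""
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_me) / scale))


def update(rating: float, opp_rating: float, score: float, k: float) -> float:
    """New rating after one game: R' = R + K * (S - E).

    score is the actual result S: 1.0 win, 0.5 draw, 0.0 loss.
    """
    return rating + k * (score - expected_score(rating, opp_rating))


def fide_style_k(games_played: int, rating: float, age: int,
                 ever_reached_2400: bool) -> int:
    """Hypothetical, simplified reading of FIDE's K-factor tiers.

    Ignores other regulation details (e.g. when a tier change takes effect).
    """
    if games_played < 30:
        return 40  # new player: first 30 rated games
    if ever_reached_2400:
        return 10  # permanent once 2400 has been reached
    if age < 18 and rating < 2300:
        return 40  # juniors rated below 2300
    return 20      # everyone else below 2400


if __name__ == "__main__":
    # Upset: a 1600 player beats a 2000 player, so E = 1/11 for the winner.
    for k in (40, 10):
        gain = update(1600, 2000, 1.0, k) - 1600
        print(f"K={k}: winner gains {gain:+.1f}")
```

In this example the winner's expected score is 1/11. The script prints gains of +36.4 at K=40 and +9.1 at K=10, the two ends of the range described above.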
Elo Beyond Chess: Team Sports and Online Games
Elo needs only win/draw/loss outcomes and current ratings, and that simplicity made it easy to adopt beyond chess. FIFA's world rankings use an Elo variant that weights each match by its importance, and related football Elo systems also adjust for goal difference. Elo-based power ratings have long been used to forecast NFL games, and FiveThirtyEight's NFL Elo model became a widely followed public example. Competitive video games build matchmaking on the same foundation. League of Legends originally ranked players with Elo, and Dota 2 uses its own matchmaking rating (MMR). Other titles use TrueSkill, a Bayesian extension developed by Microsoft Research that handles team games and explicitly tracks the uncertainty in each rating. The key insight that made Elo extensible is that any zero-sum, two-sided contest can be modeled with the same expected-score framework. This holds whether outcomes are binary (win/loss) or ordered (win/draw/loss), with the K-factor and scale constant tuned to the game's competitive environment. The main limitation of classic Elo for team sports is that it treats each team as a single player. Individual performance variance is averaged out, and roster changes require manual adjustment.
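Here is a sketch of a team-sport variant along the lines described above. It treats each team as a single player and scales K by match importance. The `IMPORTANCE` weights and the function `team_match_update` are hypothetical. Real systems such as FIFA's publish their own weights, and this sketch ignores goal difference and home advantage.

```python
# Hypothetical importance weights, used directly as K; real systems publish their own.
IMPORTANCE = {"friendly": 10.0, "qualifier": 25.0, "tournament": 50.0}


def expected_score(r_me: float, r_opp: float, scale: float = 400.0) -> float:
    """Logistic Elo expected score for r_me against r_opp."""
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_me) / scale))


def team_match_update(r_a: float, r_b: float, goals_a: int, goals_b: int,
                      match_type: str, scale: float = 400.0) -> tuple[float, float]:
    """Update two team ratings, treating each team as a single player.

    Goal counts only determine the ordered outcome; the margin is ignored.
    """
    k = IMPORTANCE[match_type]
    if goals_a > goals_b:
        s_a = 1.0   # win
    elif goals_a == goals_b:
        s_a = 0.5   # draw
    else:
        s_a = 0.0   # loss
    delta = k * (s_a - expected_score(r_a, r_b, scale))
    # Zero-sum: team B's change is the negative of team A's.
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    # Upset in a tournament match: the 1400-rated team beats the 1600-rated team 2-0.
    new_a, new_b = team_match_update(1600.0, 1400.0, 0, 2, "tournament")
    print(f"A: {new_a:.1f}, B: {new_b:.1f}")
```

Goals are used only to decide the ordered outcome: win, draw, or loss. The winner gains exactly what the loser gives up, so the sum of the two ratings never changes. This is the zero-sum property the expected-score framework relies on.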