| 1 | +# ELO Rating System |
| 2 | +In adversarial games, the cumulative environment reward may **not be a meaningful metric** by which to track |
| 3 | +learning progress. |
| 4 | + |
| 5 | +This is because the cumulative reward is **entirely dependent on the skill of the opponent**. |
| 6 | + |
| 7 | +An agent at a particular skill level will get more or less reward against a worse or better agent, |
| 8 | +respectively. |
| 9 | + |
| 10 | +Instead, it's better to use the ELO rating system, a method for calculating **the relative skill level between two players in a zero-sum game**. |
| 11 | + |
| 12 | +If training is progressing correctly, **the agent's ELO rating should steadily increase**. |
| 13 | + |
| 14 | +## What is a zero-sum game? |
| 15 | +A zero-sum game is a game where **each player's gain or loss of utility is exactly balanced by the gain or loss of the utility of the opponent**. |
| 16 | + |
| 17 | +Simply put, we face a zero-sum game **when, every time one agent gets a reward of +1.0, its opponent gets a reward of -1.0**. |
| 18 | + |
| 19 | +For instance, Tennis is a zero-sum game: if you win the point, you get a reward of +1.0 and your opponent gets a reward of -1.0. |
| 20 | + |
| 21 | +## How the ELO Rating System works |
| 22 | +- Each player **has an initial ELO score**. It's defined in the `initial_elo` trainer config hyperparameter. |
| 23 | + |
| 24 | +- The **difference in rating between the two players** serves as a predictor of the outcome of a match. |
| 25 | + |
| 26 | + |
| 27 | +*For instance, if player A has an ELO score of 2100 and player B has an ELO score of 1800, the chance that player A wins is 85%, against 15% for player B.* |
| 28 | + |
| 29 | +- We calculate the **expected score of each player** using this formula: |
| 30 | + |
| 31 | + |
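A sketch of the standard Elo expected-score formula, assuming the conventional scale factor of 400 and writing $R_A$ and $R_B$ for the two players' current ratings (notation chosen here for illustration):

$$
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}
$$

The two expected scores always sum to 1. With $R_A = 2100$ and $R_B = 1800$ this gives $E_A \approx 0.85$, which matches the 85% figure in the example above.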
| 32 | + |
| 33 | +- At the end of the game, **we update each player’s ELO score** based on the outcome, using a linear adjustment proportional to the amount by which the player over-performed or under-performed relative to their expected score. |
| 34 | +The winning player takes points from the losing one: |
| 35 | + - If the *higher-rated player wins* → **a few points** will be taken from the lower-rated player. |
| 36 | + - If the *lower-rated player wins* → **a lot of points** will be taken from the higher-rated player. |
| 37 | + - If it’s *a draw* → the lower-rated player gains **a few points** from the higher-rated player. |
| 38 | + |
| 39 | +- We update each player's rating using this formula: |
| 40 | + |
| 41 | + |
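A sketch of the standard Elo update rule, assuming $S_A$ is the actual score of player A (1 for a win, 0.5 for a draw, 0 for a loss), $E_A$ is the expected score, and $K$ is a constant step size (the Tennis example below uses $K = 16$):

$$
R'_A = R_A + K\,(S_A - E_A), \qquad R'_B = R_B + K\,(S_B - E_B)
$$

Because $S_A + S_B = 1$ and $E_A + E_B = 1$, whatever one player gains the other loses.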
| 42 | + |
| 43 | +### The Tennis example |
| 44 | + |
| 45 | +- We start to train our agents. |
| 46 | +- Both of them start with the same skill level, so each has the same ELO score, which we defined with the parameter `initial_elo = 1200.0`. |
| 47 | + |
| 48 | +We calculate the expected score E: |
| 49 | +Ea = 0.5 |
| 50 | +Eb = 0.5 |
| 51 | + |
| 52 | +This means that each player has a 50% chance of winning the point. |
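The 0.5 values follow directly from the ratings being equal. As a worked check, assuming the standard Elo expected-score formula with a scale factor of 400:

$$
E_a = \frac{1}{1 + 10^{(1200 - 1200)/400}} = \frac{1}{1 + 10^{0}} = \frac{1}{2} = 0.5
$$

The same computation with the roles swapped gives $E_b = 0.5$.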
| 53 | + |
| 54 | +If A wins, the new rating R would be: |
| 55 | + |
| 56 | +Ra = 1200 + 16 * (1 - 0.5) → 1208 |
| 57 | + |
| 58 | +Rb = 1200 + 16 * (0 - 0.5) → 1192 |
| 59 | + |
| 60 | +Player A now has an ELO score of 1208 and Player B an ELO score of 1192. Therefore, Player A is now considered a little bit **better than Player B**. |
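The arithmetic above can be reproduced with a minimal Python sketch. It assumes the standard Elo formulas (scale factor of 400, step size K = 16 as in the example); the function names `expected_score` and `update_ratings` are hypothetical and chosen for illustration, so this is not necessarily how the trainer computes the value internally.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score of a player under the standard Elo model (scale factor 400)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 16.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 if A wins, 0.5 for a draw, and 0.0 if A loses.
    k = 16.0 is an assumption taken from the Tennis example.
    """
    expected_a = expected_score(rating_a, rating_b)
    expected_b = expected_score(rating_b, rating_a)
    # Each player moves in proportion to how far the result was from expectation.
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b


# Tennis example: equal ratings, A wins the point.
tennis = update_ratings(1200.0, 1200.0, score_a=1.0)
print(tennis)  # (1208.0, 1192.0)

# Unequal ratings (2100 vs 1800): compare the three possible outcomes.
for label, score_a in [("A wins", 1.0), ("B wins", 0.0), ("draw", 0.5)]:
    new_a, new_b = update_ratings(2100.0, 1800.0, score_a)
    print(f"{label}: A {new_a - 2100.0:+.1f}, B {new_b - 1800.0:+.1f}")
```

With ratings of 2100 and 1800, the higher-rated player gains only about 2.4 points for a win, while the lower-rated player gains about 13.6 points for an upset and about 5.6 points for a draw, which illustrates the "a few points" versus "a lot of points" behavior described earlier.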