Commit 24f6f34

Added ELO Rating System Doc (Unity-Technologies#4685)
* added ELO rating system doc
* added some illustrations for ELO rating system doc (elo_example, elo_expected_score_formula, elo_score_update_formula)
* Update ELO-Rating-System.md
* Remove blank space
* Run pre-commit
* Remove tennis illustration link
* Update ELO-Rating-System.md
* Add ELO link
* Pre-commit
1 parent 0696876 commit 24f6f34

5 files changed: +62 -1 lines changed

docs/ELO-Rating-System.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# ELO Rating System

In adversarial games, the cumulative environment reward may **not be a meaningful metric** by which to track learning progress.

This is because the cumulative reward is **entirely dependent on the skill of the opponent**.

An agent at a particular skill level will get more or less reward against a worse or better agent, respectively.

Instead, it's better to use the ELO rating system, a method to calculate **the relative skill level between two players in a zero-sum game**.

If training is progressing correctly, **this value should steadily increase**.

## What is a zero-sum game?

A zero-sum game is a game where **each player's gain or loss of utility is exactly balanced by the gain or loss of the utility of the opponent**.

Simply put, we face a zero-sum game **when one agent gets +1.0 reward and its opponent gets -1.0 reward**.

For instance, Tennis is a zero-sum game: if you win the point you get +1.0 reward and your opponent gets -1.0 reward.

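
The zero-sum property described above can be checked directly: the rewards handed out for one outcome should cancel. The following is a minimal sketch; `is_zero_sum` and the `tennis_point` dictionary are hypothetical names used only for illustration and are not part of ML-Agents.

```python
def is_zero_sum(rewards, tol=1e-9):
    """Return True when the rewards given for one outcome cancel out."""
    return abs(sum(rewards)) <= tol


# Rewards for a single Tennis point, as described above (illustrative values).
tennis_point = {"winner": 1.0, "loser": -1.0}
print(is_zero_sum(tennis_point.values()))  # True
```
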
## How the ELO Rating System works

- Each player **has an initial ELO score**. It's defined in the `initial_elo` trainer config hyperparameter.

- The **difference in rating between the two players** serves as the predictor of the outcome of a match.

![Example Elo](images/elo_example.png)
*For instance, if player A has an ELO score of 2100 and player B has an ELO score of 1800, the chance that player A wins is 85% against 15% for player B.*

- We calculate the **expected score of each player** using this formula:

![Elo Expected Score Formula](images/elo_expected_score_formula.png)

- At the end of the game, based on the outcome, **we update the player’s actual ELO score**. We use a linear adjustment proportional to the amount by which the player over-performed or under-performed.
  The winning player takes points from the losing one:
  - If the *higher-rated player wins* → **a few points** will be taken from the lower-rated player.
  - If the *lower-rated player wins* → **a lot of points** will be taken from the higher-rated player.
  - If it’s *a draw* → the lower-rated player gains **a few points** from the higher-rated player.

- We update the players' ratings using this formula:

![Elo Score Update Formula](images/elo_score_update_formula.png)

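
The two formulas are given only as images, so the steps above can be sketched in code. This is a minimal sketch, assuming the standard Elo definitions: expected score `E_a = 1 / (1 + 10^((R_b - R_a) / 400))` and update `R_a' = R_a + K * (S_a - E_a)`, where `S_a` is 1.0 for a win, 0.5 for a draw and 0.0 for a loss. The function names and the default K-factor of 16 are assumptions made for illustration, not the ML-Agents implementation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (standard Elo formula)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, expected: float, actual: float, k: float = 16.0) -> float:
    """Linear adjustment: actual is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    return rating + k * (actual - expected)


# The 2100 vs 1800 example: the 300-point gap predicts the outcome.
e_a = expected_score(2100.0, 1800.0)
e_b = expected_score(1800.0, 2100.0)
print(f"E_a = {e_a:.2f}, E_b = {e_b:.2f}")  # E_a = 0.85, E_b = 0.15
```

Because the two expected scores always sum to 1, whatever one player gains from an update the other loses, which matches the "winner takes points from the loser" description.
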
### The Tennis example

- We start to train our agents.
- Both of them have the same skill level, so each starts with the ELO score defined by the parameter `initial_elo = 1200.0`.

We calculate the expected score E:
Ea = 0.5
Eb = 0.5

This means that each player has a 50% chance of winning the point.

If A wins, the new rating R (using a K-factor of 16) would be:

Ra = 1200 + 16 * (1 - 0.5) → 1208

Rb = 1200 + 16 * (0 - 0.5) → 1192

Player A now has an ELO score of 1208 and Player B an ELO score of 1192. Therefore, Player A is now rated slightly **better than Player B**.
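
The arithmetic above can be replayed in code. This is a minimal sketch, assuming the standard Elo expected-score formula and the K-factor of 16 used in the example; `expected_score` and `play_point` are hypothetical helper names, not ML-Agents functions.

```python
K = 16.0  # K-factor taken from the arithmetic in the example


def expected_score(rating, opponent_rating):
    """Expected score of a player against an opponent (standard Elo formula)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def play_point(rating_a, rating_b, a_wins):
    """Return both ratings after one point; a win scores 1.0 and a loss 0.0."""
    score_a = 1.0 if a_wins else 0.0
    exp_a = expected_score(rating_a, rating_b)
    exp_b = expected_score(rating_b, rating_a)
    new_a = rating_a + K * (score_a - exp_a)
    new_b = rating_b + K * ((1.0 - score_a) - exp_b)
    return new_a, new_b


# First point: both players start at initial_elo = 1200.0 and A wins.
ra, rb = play_point(1200.0, 1200.0, a_wins=True)
print(ra, rb)  # 1208.0 1192.0

# Second point: B, now the lower-rated player, wins.
ra2, rb2 = play_point(ra, rb, a_wins=False)
print(round(ra2, 2), round(rb2, 2))
```

On the second point B gains slightly more than 8 points, because it won as the lower-rated player.
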

docs/ML-Agents-Overview.md

Lines changed: 2 additions & 1 deletion
@@ -592,7 +592,8 @@ See our
 page for more information on setting up teams in your Unity scene. Also, read
 our
 [blog post on self-play](https://blogs.unity3d.com/2020/02/28/training-intelligent-adversaries-using-self-play-with-ml-agents/)
-for additional information.
+for additional information. Additionally, check [ELO Rating System](ELO-Rating-System.md) the method we use to calculate
+the relative skill level between two players.

 ### Training In Cooperative Multi-Agent Environments with MA-POCA

docs/images/elo_example.png

76.9 KB
103 KB
122 KB
