In all of the tournament result tables, you’ll see a rating for each player from before the tournament and after it. By far the most common questions I get from the kids are “What’s my rating?” and some version of “How many points will I gain/lose today?”
If you are also wondering these things, this post is for you.
Why have a rating system?
The first thing you might wonder is “why bother?” The answer is simple: if I track the kids’ ratings, they get more fun, more competitive games, because I can pair them against players with similar ratings. Doing that manually would be very difficult, because kids come and go, and kids improve (or go on streaks where they try dubious strategies and lose a lot of games). By keeping track of their ratings, I don’t have to hold a picture in my head of where everyone stands in terms of strength and manage the pairings myself.
The Rookery has a rating system based on the Elo rating system. Elo is commonly (and mistakenly) believed to be an acronym, E.L.O., but it is in fact named after its creator, Arpad Elo. The goal of The Rookery system, like the Elo system, is to estimate the current strength of a player in a zero-sum game, wherein a win for one player necessarily means a loss for the other. Chess is a zero-sum game: even a draw results in the players dividing the available 1 point equally.
For a system to do this effectively, it must not be “game-able”: players should not be able to artificially inflate or deflate their ratings. A rating should reflect ONLY a player’s skill relative to the other players in the rating system.
It must also be sensitive to the changing strength of players. After all, our players are constantly improving, so their ratings should reflect that.
A desirable bonus feature is that the system should not discourage strong players from playing weaker ones. If players believe that playing someone rated much lower than them poses a disproportionate risk to their rating, they might avoid those games. That fear rests on two misconceptions: that a rating is something other than a measure of one’s strength, and that it won’t all come out in the wash in the end. Still, it is a VERY common anxiety among players.
Ratings should not be coddled, protected, or worried about. If you are a player whose TRUE strength is 800, you’ll win about 50% of your games against other players rated 800, win most of your games against players rated 400, and lose most of your games against players rated 1200. In the end, your rating will stay near 800.
The ONLY way to durably change your rating is for your TRUE playing strength to improve relative to the other players in the system. You must work harder, practice more, and play people who are stronger than you.
It is intentionally not possible to “farm” weak players for rating points. The further below your own strength an opponent is, the fewer points you win from them for a single game; this is a deliberate design feature of the system.
System Design
Established ratings
For now, we’ll assume that both players in a given game are rated; how new players get a rating in the first place is discussed in a later section. The first step is to determine who “should win” the game.
The estimated win probability of player A is found by working out what proportion of the total rating points in the game player A brings: EA = RA / (RA + RB). The same formula gives player B’s estimated win probability, with player B’s rating in the numerator. These two probabilities always add up to 1. If 1 point is awarded for a win, 0 for a loss, and 0.5 for a draw, this value can also be thought of as the expected number of points a player will earn per game.
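To make this first, untransformed step concrete, here is a minimal Python sketch. The function name is illustrative only, and the logarithmic transform described later is deliberately left out:

```python
def expected_score_proportional(r_a: float, r_b: float) -> float:
    """Player A's share of the total rating points brought to the game.

    This is the simple, untransformed version; the logarithmic
    transform comes later.
    """
    return r_a / (r_a + r_b)


e_a = expected_score_proportional(1200, 800)
e_b = expected_score_proportional(800, 1200)
print(e_a, e_b, e_a + e_b)  # 0.6 0.4 1.0
```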
In order to determine how much a player’s rating should be adjusted, we compare this expected number of points to the ACTUAL number of points earned in the game.
Subtracting the expected points from the actual points, SA - EA, gives the number of points a player earned in excess of what was expected (a negative number if they earned fewer). Interestingly, the other player’s excess is always exactly the negative of this number, so whatever one player gains, the other loses. We could then add this number to the player’s pre-game rating.
This means that the difference between a player’s post-game rating and their pre-game rating will be directly proportional to the error in the EXPECTED outcome.
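The zero-sum property is easy to check numerically. In this small sketch (the helper name is hypothetical), player A is expected to score 0.6, so player B is expected to score 0.4:

```python
def excess_points(actual: float, expected: float) -> float:
    """Points earned beyond what was expected (negative if fewer)."""
    return actual - expected


e_a, e_b = 0.6, 0.4
for s_a in (1.0, 0.5, 0.0):  # A wins, draws, loses
    s_b = 1.0 - s_a          # the single available point is always split
    gain_a = excess_points(s_a, e_a)
    gain_b = excess_points(s_b, e_b)
    # gain_b is always -gain_a: one player's surplus is the other's deficit.
    print(s_a, round(gain_a, 2), round(gain_b, 2))
```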
But this solution is not very responsive. If tournaments occur once a week and there are three rounds in a tournament, a player could gain at most about 3 points per week, and in many cases much less, while their skill could realistically change much faster than that. So we scale this difference up by a “K-factor.” A commonly used K-factor in chess, and the one The Rookery uses, is 16.
With this formula, if players A and B were evenly matched, each would have an expected score of 0.5. They’d receive 8 points for a win (16 × 0.5) and lose 8 points for a loss, and a draw would see (nearly) no points change hands. So over a large number of games, assuming they remained evenly matched, their ratings would stay very near one another.
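A tiny sketch of the scaled update, using The Rookery’s K-factor of 16 (the function name is hypothetical):

```python
K = 16  # The Rookery's K-factor


def rating_change(actual: float, expected: float, k: float = K) -> float:
    """Scaled rating adjustment: K times the surprise in the result."""
    return k * (actual - expected)


# Two evenly matched players: each is expected to score 0.5.
print(rating_change(1.0, 0.5))  # win  -> 8.0
print(rating_change(0.0, 0.5))  # loss -> -8.0
print(rating_change(0.5, 0.5))  # draw -> 0.0
```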
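One last adjustment is necessary, though. With the mathematical pieces we’ve described so far, ratings would cover an extremely large range, and it would take a very long time for a given player’s rating to converge on their “true strength.” So a logarithmic transform is used: each rating R is first converted to a “linearized” rating Q = 10^(R/400), and the expected score is computed from the Q values instead of the raw ratings. This results in a system wherein every 400 rating points represents a ten-fold increase in player strength. For example, a player rated 400 points above their opponent is expected to score about 10 points for every 1 the opponent scores, or roughly 91% of the points.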
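Here is a sketch of the transform, assuming the usual Elo form Q = 10^(R/400), which is what makes every 400 points a ten-fold step. The function names are illustrative:

```python
def linearized(rating: float) -> float:
    """Q = 10 ** (R / 400): every 400 rating points multiplies Q by ten."""
    return 10 ** (rating / 400)


def expected_score(r_a: float, r_b: float) -> float:
    """Player A's expected score, computed from the linearized ratings."""
    q_a, q_b = linearized(r_a), linearized(r_b)
    return q_a / (q_a + q_b)


print(round(linearized(1200) / linearized(800), 6))  # 10.0
print(round(expected_score(1200, 800), 3))           # 0.909
print(round(expected_score(800, 800), 3))            # 0.5
```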
And finally, the system of equations that governs established ratings is below:
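Written out in the notation defined below, and assuming the standard Elo form described in this section (with K = 16), the equations are:

```latex
\begin{aligned}
Q_A &= 10^{R_A/400}, \qquad Q_B = 10^{R_B/400} \\
E_A &= \frac{Q_A}{Q_A + Q_B} \\
R_A' &= R_A + K\,(S_A - E_A), \qquad K = 16
\end{aligned}
```

Player B’s equations are the same with A and B swapped.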
Where RA = Player A’s pre-game rating, RA’ = Player A’s post-game rating, EA = Player A’s Expected points, SA = Player A’s actual points, QA = Player A’s linearized pre-game rating.
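Putting the pieces together, here is a minimal Python sketch of a single rated game under these equations. The function names are hypothetical, not The Rookery’s actual code. It also illustrates why “farming” weaker players doesn’t pay:

```python
K = 16  # The Rookery's K-factor


def expected_score(r_a: float, r_b: float) -> float:
    """E_A = Q_A / (Q_A + Q_B), with Q = 10 ** (R / 400)."""
    q_a = 10 ** (r_a / 400)
    q_b = 10 ** (r_b / 400)
    return q_a / (q_a + q_b)


def rate_game(r_a: float, r_b: float, s_a: float, k: float = K) -> tuple[float, float]:
    """Return (R_A', R_B') after one game; s_a is 1, 0.5, or 0 for player A."""
    e_a = expected_score(r_a, r_b)
    delta = k * (s_a - e_a)
    # Whatever A gains, B loses (and vice versa): the exchange is zero sum.
    return r_a + delta, r_b - delta


# Beating an equal opponent vs. beating one rated 400 points lower:
print(rate_game(800, 800, 1.0))  # (808.0, 792.0)
print(rate_game(800, 400, 1.0))  # A gains only about 1.45 points
# Losing to the weaker opponent costs about 14.5 points.
print(rate_game(800, 400, 0.0))
```

Beating a player rated 400 points lower earns only about 1.45 points, while losing that game costs about 14.5. Since the 800 is expected to win about 10 games out of 11 against the 400, those small gains and rare larger losses cancel out on average if both ratings are accurate.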
So if someone asks how ratings are calculated, or what your rating is, or how they can find out their own rating, now you can tell them!
Provisional ratings
So now we come to the part where we discuss what happens if a player does not already have a rating. There are a number of things we do to estimate a player’s rating, and I’ll start with the simplest:
If a player has a United States Chess Federation (USCF) regular rating, that rating is used as an established Rookery rating. This helps everyone covered by my rating system, because it causes all the ratings to converge on USCF ratings, which makes them more comparable.
If a player does not have a USCF rating, they’re assigned the average rating in the club for the purpose of pairing in their first game, but this average has no further impact on their rating. After their first game, their provisional rating is calculated as follows:
This is often referred to as a “performance rating,” and it is used in USCF and FIDE tournaments to measure how well a player performed in a given tournament, for comparison with their published rating. In The Rookery, I use it to estimate a player’s rating until they’ve accumulated enough games to establish their rating.
For a rating to be considered “established,” a player must have at least two different kinds of result (a win and a loss, a win and a draw, or a draw and a loss), and they must have played at least 10 rated games. Whatever R is when both requirements are met becomes the player’s established rating.
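Performance ratings can be computed in several ways. As an illustration only, the sketch below assumes the common linear “algorithm of 400” (each win counts as the opponent’s rating plus 400, each loss as the opponent’s rating minus 400, each draw as the opponent’s rating), which may differ from The Rookery’s exact formula. The function name is hypothetical:

```python
def performance_rating(opponent_ratings: list[float], scores: list[float]) -> float:
    """Linear 'algorithm of 400' performance rating (an assumed variant).

    Each win counts as the opponent's rating + 400, each loss as the
    opponent's rating - 400, and each draw as the opponent's rating.
    """
    if not opponent_ratings or len(opponent_ratings) != len(scores):
        raise ValueError("need at least one game and one score per opponent")
    wins = sum(1 for s in scores if s == 1.0)
    losses = sum(1 for s in scores if s == 0.0)
    total = sum(opponent_ratings) + 400 * (wins - losses)
    return total / len(opponent_ratings)


# A new player beats an 800, draws a 900, and loses to a 1000.
print(performance_rating([800, 900, 1000], [1.0, 0.5, 0.0]))  # 900.0
```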
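The two requirements for an established rating translate directly into a check like this one (a hypothetical helper, with each result recorded as 1 for a win, 0.5 for a draw, or 0 for a loss):

```python
def is_established(scores: list[float], min_games: int = 10) -> bool:
    """True once a provisional player has played at least `min_games`
    rated games AND has at least two different kinds of result."""
    return len(scores) >= min_games and len(set(scores)) >= 2


print(is_established([1.0] * 10))         # False: only one kind of result
print(is_established([1.0] * 9 + [0.5]))  # True
print(is_established([1.0, 0.0]))         # False: only two games played
```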
Some quirks
As you may have noticed, the rating you ended last week with doesn’t always match the rating you have this week, or your rating might move a bunch of points during a period when you didn’t play any games. The reason is that USCF tournaments are usually not rated right away; rating them depends on the event’s tournament director turning in the crosstables. When a previously unrated USCF event finally gets rated, my algorithm “re-rates” the Rookery events that occurred after it. Essentially, it treats the new information as if it had arrived immediately after that tournament ended, and rates all the events it knows about in chronological order. So if you played someone who played in a USCF event, their rating might change retroactively, and as a result *your* rating will change as well.
For example, prior to the tournament on May 6, 2025, both Matt T.W. and I played in a USCF-rated online tournament. I won that tournament with 4.5 points out of 5, and then on the 6th, lost a game to Grace. She gained a bunch of points for that win, but after USCF rates the tournament I won, my rating *prior to playing Grace* will go up, and as a result, she will gain *even more* of my points because of the increased rating disparity!
The only way to really know how strong you are is to keep playing, and to stay subscribed so that you get updated ratings as often as possible.
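The re-rating described above amounts to replaying every known event in date order whenever new information arrives. The sketch below assumes, purely for illustration, that a newly published USCF rating simply replaces a player’s rating as of that event’s date. The class names, function names, and players are hypothetical, and The Rookery’s real code may work differently:

```python
from dataclasses import dataclass
from datetime import date
from typing import Union

K = 16  # The Rookery's K-factor


@dataclass
class Game:
    day: date
    player_a: str
    player_b: str
    score_a: float  # 1 = player A wins, 0.5 = draw, 0 = player B wins


@dataclass
class PublishedRating:
    """Hypothetical: an outside (e.g. USCF) rating that takes effect on `day`."""
    day: date
    player: str
    rating: float


Event = Union[Game, PublishedRating]


def expected_score(r_a: float, r_b: float) -> float:
    q_a, q_b = 10 ** (r_a / 400), 10 ** (r_b / 400)
    return q_a / (q_a + q_b)


def replay(start: dict[str, float], events: list[Event]) -> dict[str, float]:
    """Re-rate every known event from scratch, oldest first."""
    ratings = dict(start)
    # sorted() is stable, so same-day events keep the order they were given in.
    for event in sorted(events, key=lambda e: e.day):
        if isinstance(event, PublishedRating):
            ratings[event.player] = event.rating  # assumed: replaces the old rating
        else:
            r_a, r_b = ratings[event.player_a], ratings[event.player_b]
            delta = K * (event.score_a - expected_score(r_a, r_b))
            ratings[event.player_a] = r_a + delta
            ratings[event.player_b] = r_b - delta
    return ratings


# Hypothetical players: "student" beats "coach" on January 10.
start = {"coach": 1200.0, "student": 800.0}
games = [Game(date(2025, 1, 10), "student", "coach", 1.0)]
before = replay(start, games)

# A late-arriving result, dated before that game, raises the coach's rating...
late = PublishedRating(date(2025, 1, 7), "coach", 1300.0)
after = replay(start, games + [late])

# ...so the student's win is now worth more than it was.
print(after["student"] > before["student"])  # True
```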
Which games are rated?
Only games you actually play are rated. If you take a look at How to Read a Tournament Result Table, you’ll see that over the course of a tournament, there are a few ways you might miss a round. Maybe you arrive late, need to leave early, take a break for a round, or are asked to sit out because there’s an odd number of players. Rest assured that none of this ever counts against you. Your rating is only affected by games you actually play. Remember, the whole point of your rating is to estimate your true strength as accurately as possible; it is not some kind of score.
Future work
FIDE and USCF both have rating systems that account for “effective games played,” which allows the *amount* of data used to estimate a player’s strength to affect how much their rating will change after a given game. Right now, my rating system doesn’t do anything like this. It has a strong “recency bias”: every game moves a player’s rating with the same weight (K = 16), no matter how many games they’ve already played, so older results are gradually washed out and a player’s rating mostly reflects the games they’ve played recently. This isn’t necessarily a bad thing. However, there is one feature I’m working on building in: a “staleness” measure.
Sometimes, players don’t play rated games for extended periods. Maybe they get busy and don’t practice, and their strength goes down relative to their rating, so when they come back, they’re “over-rated.” On the flip side of that coin, maybe they’re unhappy with a recent performance and go away for a while. They study hard, do puzzles every single day, get a coach, etc. Then they come back to rated play, and their strength is substantially higher than their rating suggests. Both of these situations would be well served if, after a long absence, a player’s rating were more sensitive to new data. A “stale” rating would move more than the rating of a player who comes every single week.
I have some ideas, and when I’m happy with them, they’ll probably be implemented in my system. Stay tuned to hear about that when it’s finished…
A request
A small favor: if you know someone who would enjoy playing with us, either online (The Rookery) or in person (Chelsea District Library), let them know about us. Maybe gift them a subscription to this Substack so they can get news about the club and results from tournaments. It has come to my attention that A LOT of people still don’t manage to get timely information about what’s going on. I’d love it if you could help me with that!