Ratings Structure

Though the first competition is not finished yet (and full results will not be known until all results are entered), I thought it might interest some to show intermediate values, as it is a good place to understand some of the nuances of Elo-type systems at first load, when the system does not know anything about the players. Warning - this will be a long post.

The System

Rating systems have been around for a while now and most modern systems are built off of the Elo system first standardized for chess. The systems are all fairly similar, and their intention is to provide an accurate snapshot of a player's skill at a single point in time. This allows players to see improvement and, at the same time, lets competition organizers seed competitions with players of a similar level. These structures have also been heavily used for competitive online games. Microsoft adapted Elo into their 'TrueSkill' system, which takes into account games with more than two players, and other games like Starcraft, COD etc. all use something similar. The attraction of these systems is that they allow match making and thus (hopefully) prevent a less skilled player getting repeatedly hammered by much better players and then deciding to drop.

Glicko

This application runs off of the Glicko2 variant of Elo. Glicko provides three main components to a player's rating. The first, and usually the most obvious, is the rating. This is a simple number, usually from 800 to 2400 ish, that indicates the skill level of the player. Glicko also adds a deviation value. This again is a number, usually between 50 and 350, and is basically the confidence the system has in the rating. A new player who has not played much, or an experienced player whose results are all over the place, will have a very high deviation, whilst if the system is confident that the player's rating is accurate then the deviation will shrink. The final component is a volatility factor. This is a fixed number (0.6 in our system) that controls how aggressively ratings can change. If the number is very low then a player's rating changes very slowly. If high then a player can shoot up and down the rankings.

Expected Results

All else being equal, if you have a fixed number of players in a single competition playing the same fixed number of games then the results should show a bell curve. This is something very common in human society, covering such things as height, ability etc. The bell curve basically says that a small number of people will be very good, a small number will be very bad, and the majority will be massed in the middle.
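As a rough illustration of where that shape comes from, here is a minimal sketch that assumes every player is equally skilled and every game is an independent coin flip (real Swiss pairings are not that simple). It uses six games, the number of Swiss rounds mentioned later in this post. The function name win_distribution is made up for this example.

```python
from math import comb

def win_distribution(games, p_win=0.5):
    """Share of players expected to finish on 0..games wins if every
    game is an independent coin flip (an idealised assumption)."""
    return [
        comb(games, wins) * p_win ** wins * (1 - p_win) ** (games - wins)
        for wins in range(games + 1)
    ]

# Crude text histogram: one '#' per 1/64 of the field.
for wins, share in enumerate(win_distribution(6)):
    print(f"{wins} wins: {share:6.1%} {'#' * round(share * 64)}")
```

The bars peak at three wins and tail off symmetrically on both sides, which is the bell shape described above.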

As an example, here is the bell curve chart for the Swiss rounds of the current Discord league.



As you can see, the bell curve exists nicely for all the results. It is not a perfect curve though, and this is simply due to drops. Since it is the first iteration of the league, lots of players started and either found they did not have the time or found they lost... a lot... so dropped. This means that the system does not have perfect information and the curve is more jagged than it otherwise would be.

I included this because once the stats start rolling, though the bell curve will still exist for a particular competition, the participants will no longer have identical start points, so it will be difficult to map.

Identical Start Results

This Discord league is very unusual in that all participants start at the same point. Glicko treats all games in any one competition as having started and been played simultaneously (some other systems recalculate after each round; this incidentally explains why top 16 cuts may differ from Glicko, as the league software may adjust strength of schedule during the competition). Therefore the league is measuring players it expects to be equal to each other, and wins and losses are the only things that affect the end result.

Take the lead player at the end of the Swiss rounds - BigBellyJarelli. Does he have a big belly? Who knows, but he certainly can play. Here are his base stats at the end of Swiss.


This player has the highest rating leaving Swiss, 1847, and is the only player with a rating this high. In chess terms this is a Class A, Category 1 player, which is only slightly below the expert and master levels. His deviation is a reasonable 166.

If we look at his matches we can see why


So he basically went undefeated.

This is probably a good time to mention the 95% confidence interval. This is Glicko's preferred way of presenting a rating. Basically, rather than state the simple number, i.e. 1847 for BigBelly, it states the range of values the system is 95% confident the player belongs to. In BigBellyJarelli's case this is [1515-2179], so the system is 95% confident that this player is AT WORST 1515 and possibly AT BEST 2179. As the deviation decreases, the range also shrinks.
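The range quoted above matches the rating plus and minus twice the deviation. A minimal sketch of that arithmetic follows. The function name confidence_band and the multiplier of 2 are assumptions inferred from the quoted figures (a strict normal 95% interval would use 1.96), not something taken from the application's code.

```python
def confidence_band(rating, deviation, width=2.0):
    """Return (low, high) as rating +/- width * deviation.

    width=2.0 is an assumption that reproduces the range quoted in
    the post; 1.96 would be the textbook 95% multiplier.
    """
    margin = width * deviation
    return rating - margin, rating + margin

low, high = confidence_band(1847, 166)
print(f"[{low:.0f}-{high:.0f}]")  # [1515-2179]
```

Halving the deviation to 83 would halve the width of the band, which is the shrinking described above.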

This is the figure usually used for match making in online games. If you take Starcraft 2 as an example, when searching for opponents it starts looking using the 95% confidence band. If your band overlaps any point of another searching player's band then you get matched. If it cannot find a match then it widens the band (to say 99% confidence), which increases the range and thus widens the potential pool of opponents, though it means you are less likely to get a balanced game and may also see little difference in rating change at the end.
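The matching idea described above can be sketched roughly as follows. This is only an illustration of the description in this post, not the actual match making code of any game. The function names, the tuple layout, the two hypothetical players and the band widths (2.0 for roughly 95%, 2.6 for roughly 99%) are all assumptions.

```python
def band(rating, deviation, width):
    """Rating band as (low, high) for a given multiple of the deviation."""
    return rating - width * deviation, rating + width * deviation

def bands_overlap(a, b):
    """True if two (low, high) bands share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def find_opponents(player, pool, widths=(2.0, 2.6)):
    """player and pool entries are (name, rating, deviation) tuples.

    Try the narrowest band first and only widen it when nobody in the
    pool overlaps. Returns (width_used, [names]) or (None, []).
    """
    for width in widths:
        mine = band(player[1], player[2], width)
        matches = [
            name for name, rating, deviation in pool
            if bands_overlap(mine, band(rating, deviation, width))
        ]
        if matches:
            return width, matches
    return None, []

# Hypothetical players with small deviations so the bands are narrow.
searcher = ("A", 1800, 60)
print(find_opponents(searcher, [("B", 1500, 60)]))
print(find_opponents(searcher, [("B", 1500, 60), ("C", 1750, 60)]))
```

In the first call nobody overlaps at the narrow width, so the search only succeeds after widening. In the second call player C overlaps straight away and B is never considered.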

The next player after swiss (in a group at the same level) can be looked at to show the base difference

refbot is a Crane player whose rating of 1731 puts him in the Class B, Category 2 level with a deviation of 166. Why the drop of over 100 points?

Because he dropped one game.


Next, look at my own average results as someone in the middle of the pack.

Crane again, but with a rating of 1500 and again a deviation of 166, with me sitting at the Class C, Category 3 level. The deviation is the same because all three of us described so far reported all six games. Someone who dropped would have a much higher deviation as the system has less information to work with.

The match results show why my rating is exactly the same as my start rating (new players typically get a rating of 1500 and a deviation of 300 to start, until the system works out where to put them). In this case I reported three wins and three losses against opponents who also had a rating of 1500, so the system happily leaves me there.
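To see how the three records above turn into ratings, here is a minimal sketch of a Glicko2-style update. It is not the application's code. It assumes everyone starts at 1500 with a deviation of 300, every opponent is also 1500/300, the volatility stays fixed at 0.6 (so the iterative volatility step of full Glicko2 is skipped), and 'basically undefeated' means six wins. The function names are made up for this example.

```python
import math

SCALE = 173.7178  # Glicko2 factor between rating points and its internal scale

def g(phi):
    """Weight that discounts a result by the opponent's uncertainty."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def update(rating, rd, results, volatility=0.6):
    """results is a list of (opp_rating, opp_rd, score), score 1 = win, 0 = loss.

    All games are treated as one batch, and volatility is held fixed
    (a simplifying assumption for this sketch).
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    inv_v = 0.0      # information gained from the games
    surprise = 0.0   # weighted sum of (actual - expected) scores
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        expected = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
        inv_v += g(phi_j) ** 2 * expected * (1.0 - expected)
        surprise += g(phi_j) * (score - expected)
    new_phi = 1.0 / math.sqrt(1.0 / (phi ** 2 + volatility ** 2) + inv_v)
    new_mu = mu + new_phi ** 2 * surprise
    return 1500.0 + SCALE * new_mu, SCALE * new_phi

def record(wins, losses):
    """Build a results list against identical 1500/300 opponents."""
    return [(1500.0, 300.0, 1.0)] * wins + [(1500.0, 300.0, 0.0)] * losses

for label, wins, losses in [("undefeated", 6, 0), ("one loss", 5, 1), ("even", 3, 3)]:
    rating, rd = update(1500.0, 300.0, record(wins, losses))
    print(f"{label}: rating {rating:.1f}, deviation {rd:.1f}")
```

Under those assumptions the three records come out within a point or so of the 1847, 1731 and 1500 quoted above, with a deviation close to 166 in each case, because the deviation here depends only on how many games were reported and not on who won them.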

This is a good point to cover drops, as the system is only measuring results and not how those results occurred. In my case I am probably actually much lower than this. Of my results, I 'gave' the first win to my opponent without playing, as he had a band of about 3 hours to play all week and I didn't want to get arsey about it. I lost the second, but the third opponent disappeared and never got in contact at all, so that was an auto-win. I was beaten online by the fourth and beat the fifth online. The sixth player could not get to the game (he was apparently playing a pre-arranged Destiny 2 game that I presume was a raid that ran over), so I also took that without playing. So in terms of actual games played I won 1 and lost 2. Long term it means little, as in the next league I am likely to play more games, since the people who aren't serious about playing will have been whittled out, and I will probably lose more, so my rating will drop.

Finally, let's look at someone at the bottom of the ratings. Sadly, my only played win.

The unusually named TroutNinja has a rating of 1268 and the common deviation of 166, putting him at Class D, Category 4. He was honorable enough to not drop until the end even though he was losing a bit.


So there you have a rough tour of the ranking structure as at the end of Swiss. These figures will change as the last 16 to finals results come in, as those players' deviations will drop and their ratings adjust.

In the next competition the application will be using the players' new ratings as start points, and that is where things get more interesting. If a player with a high rating beats a player with a low rating then both players' deviations will drop, as the result is entirely as expected. The players' actual ratings will not change much either, as the system will be relatively happy with where they are placed. If the opposite occurs then the deviations will increase, as the result is unexpected; the lower-ranked winner's rating will rise by much more and the higher-ranked loser's rating will also drop by more.
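The size of the rating change is driven by how far the actual result is from the expected one. Here is a minimal sketch of the Glicko2 expected score, using the top and bottom ratings from this post (1847 and 1268, both with a deviation of 166) as a hypothetical pairing. The function names are made up for this example and the application may compute things differently.

```python
import math

SCALE = 173.7178  # Glicko2 factor between rating points and its internal scale

def g(rd):
    """Weight that discounts a result by the opponent's deviation."""
    phi = rd / SCALE
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def expected_score(rating, opp_rating, opp_rd):
    """Chance (0..1) the system gives 'rating' of beating the opponent."""
    return 1.0 / (1.0 + math.exp(-g(opp_rd) * (rating - opp_rating) / SCALE))

favourite = expected_score(1847, 1268, 166)  # about 0.95
underdog = expected_score(1268, 1847, 166)   # about 0.05

# The rating change is proportional to (actual score - expected score).
print(f"favourite wins:  {1 - favourite:+.2f}")
print(f"favourite loses: {0 - favourite:+.2f}")
print(f"underdog wins:   {1 - underdog:+.2f}")
```

An expected win only moves the favourite by a small fraction of what an upset would cost, and the underdog gains the mirror image of that, which is why upsets shift both ratings so much more.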

It will be interesting seeing what happens and how long it takes for Grand Masters to appear.

Finally, here is the full list after Swiss. I will publish adjusted world rankings after every competition that applies (competitions have to have a minimum number of players, say 40, though I am open to argument either way; the results also have to be provided round by round, and eventually it would be nice if players had consistent numbers/IDs to track across competitions).




If anyone has any questions then feel free to PM me on the FFG forums (user Matrim) or on Discord - Matrim#6049 - or leave a comment here.

If we have any developers out there who might like to improve the app's appearance or translate it into some other languages then please shout as well. The application is a web app using Angular 2 on a Firebase back end and currently runs through VS Code on localhost. It could naturally go on a web site eventually, though the back end might first need to change to something which doesn't cost an arm and a leg.

Thanks for reading!
