A somewhat surprising Top 100 game list, based on BGG ratings… AND SCIENCE!
- Alex Wilson
The rankings listed here are based solely on BGG rating data, but they have nothing to do with the average rating, the Bayesian average, or any combination of the average rating with the number of ratings. They are based on the relative rankings of games by each user. From each user’s list of rated games, we can infer which games they think are better or worse than other games. From this, we can look at all the possible pairings of games and find the "most preferred" games. This is not a list of games determined by mass popularity or raw rating, but by how people compared a game to the other games they have played.
I think the results might be a little surprising -- and interesting! But before we get to the results, let me show a small example of why there can be so much more to rankings than just averages of ratings...
Let’s look at a tiny subset of games: Agricola, Brass, Caylus, and Dominion. Five imaginary gamers rank them relative to each other, from best to worst, with their ratings shown in brackets:
Ellie: Agricola (9) > Brass (6) > Caylus (5) > Dominion (4)
Fred: Dominion (9) > Brass (7) > Caylus (6) > Agricola (5)
Giles: Agricola (9) > Brass (6) > Dominion (5) > Caylus (2)
Hank: Brass (9) > Caylus (8) > Agricola (7) > Dominion (5)
June: Brass (8) > Agricola (7) > Caylus (6) > Dominion (5)
According to this group, which is the best game? Using this system (it’s the Schulze method, for the curious), the answer is Brass. All of them think it is better than Caylus, three-fifths think it is better than Agricola, and four-fifths think it is better than Dominion. We did use the rating to determine the order of preference, but after that, the number is not needed.
But if we went by the ratings, the averages would have been A (7.4), B (7.2), C (5.4), D (5.6) -- Agricola wins even though 60% of the gamers like Brass better than it. By looking at the relative "do I like game X better than game Y" for each gamer instead of the ratings themselves, we got something that is a bit more telling about how the games relate to each other.
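The post doesn’t include code, but the head-to-head counting above is easy to sketch. Here’s a minimal Python version (the variable names are mine) that tallies, for every pair of games, how many of the five gamers rated one above the other:

```python
from itertools import combinations

# Each gamer's ratings for the four games, taken from the example above.
ratings = {
    "Ellie": {"Agricola": 9, "Brass": 6, "Caylus": 5, "Dominion": 4},
    "Fred":  {"Agricola": 5, "Brass": 7, "Caylus": 6, "Dominion": 9},
    "Giles": {"Agricola": 9, "Brass": 6, "Caylus": 2, "Dominion": 5},
    "Hank":  {"Agricola": 7, "Brass": 9, "Caylus": 8, "Dominion": 5},
    "June":  {"Agricola": 7, "Brass": 8, "Caylus": 6, "Dominion": 5},
}

games = ["Agricola", "Brass", "Caylus", "Dominion"]

# prefer[(x, y)] = number of gamers who rate x strictly higher than y
prefer = {}
for x, y in combinations(games, 2):
    prefer[(x, y)] = sum(1 for r in ratings.values() if r[x] > r[y])
    prefer[(y, x)] = sum(1 for r in ratings.values() if r[y] > r[x])

# Brass beats every other game head-to-head -- a Condorcet winner --
# even though Agricola has the higher average rating.
for g in games:
    if g != "Brass":
        print(f"Brass vs {g}: {prefer[('Brass', g)]}-{prefer[(g, 'Brass')]}")
```

Running this prints `Brass vs Agricola: 3-2`, `Brass vs Caylus: 5-0`, and `Brass vs Dominion: 4-1`, matching the tallies in the text: once each ballot is reduced to pairwise "X over Y" counts, the rating numbers themselves are no longer needed.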
The above scenario is a good illustration of how a group of people sitting around a table might decide what game to play. But wouldn’t it be interesting if we could do that on a massive scale, like having all the BGGers sitting around the same giant table voting on all their favourite games? Well, in a sense, we can -- we can order the ratings of users already in the BGG database to get their relative preferences.
Since the system is always comparing pairs of games, it’s only looking at the scoring by users who have rated both of the games in question. This leads to a strong transitive property -- if a majority of the gamers think A is better than B, and B is better than C, the system will rank A better than C, even if there are few users directly comparing A and C and the averages work out a different way.
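To make the transitive effect concrete, here is a toy illustration using the Schulze strongest-path computation (a Floyd–Warshall-style pass). The pair counts are hypothetical numbers of my own, not BGG data, and I’m assuming raw "x over y" win counts as link strengths — the author’s actual weighting may differ:

```python
from itertools import permutations

# Hypothetical pairwise counts: many users compared A with B and B with C,
# but only five users rated both A and C.
d = {("A", "B"): 60, ("B", "A"): 40,
     ("B", "C"): 55, ("C", "B"): 45,
     ("A", "C"): 3,  ("C", "A"): 2}

games = ["A", "B", "C"]

# Keep only winning links, then widen each path Floyd-Warshall style:
# a path's strength is its weakest link, and we keep the strongest path.
p = {(x, y): d[(x, y)] if d[(x, y)] > d[(y, x)] else 0
     for x, y in permutations(games, 2)}
for i in games:
    for j in games:
        for k in games:
            if len({i, j, k}) == 3:
                p[(j, k)] = max(p[(j, k)], min(p[(j, i)], p[(i, k)]))

# The A -> B -> C route gives A a strength-55 path over C, dwarfing the
# thin direct comparison, so A ranks above C.
print(p[("A", "C")], p[("C", "A")])  # 55 0
```

Even though only five users directly compared A and C, the strong A-over-B and B-over-C majorities carry A above C — exactly the transitivity described above.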
The transitive property really shows up with games that might have a smaller number of ratings but do consistently better than "popular" titles -- which I believe is why there are a good number of war-games on the list. Or, it might be that the war-games just have a more consistent and agreed "X is better than Y", which strengthens their results. I’m not sure.
The data used to build this list consisted of almost 2 million current ratings for nearly a thousand games, resulting in over 800,000 pairings. From that, the preferences are calculated and the most preferred game is found. We remove it from the pool, calculate preferences again to find the second game, and so on. The algorithm’s running time scales with the cube of the number of games being compared, so a run on the full data set takes a few days or longer. To speed it up, I randomly picked smaller subsets of the data and took the top few from each to build a pool of 200 games for the final run-off. More specific details about the data collection and the ranking method are in the comments.
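The "find the winner, remove it, repeat" loop can be sketched end-to-end on the four-game example. This is my own compact Schulze implementation, not the author’s code, and it uses the pairwise counts derived from the five imaginary ballots earlier in the post:

```python
from itertools import permutations

# Pairwise counts d[(x, y)] = gamers rating x above y, from the
# five-gamer example earlier in the post.
d = {
    ("Agricola", "Brass"): 2,    ("Brass", "Agricola"): 3,
    ("Agricola", "Caylus"): 3,   ("Caylus", "Agricola"): 2,
    ("Agricola", "Dominion"): 4, ("Dominion", "Agricola"): 1,
    ("Brass", "Caylus"): 5,      ("Caylus", "Brass"): 0,
    ("Brass", "Dominion"): 4,    ("Dominion", "Brass"): 1,
    ("Caylus", "Dominion"): 3,   ("Dominion", "Caylus"): 2,
}

def schulze_winner(pool, d):
    # Strongest-path strengths via the standard Schulze pass.
    p = {(x, y): d[(x, y)] if d[(x, y)] > d[(y, x)] else 0
         for x, y in permutations(pool, 2)}
    for i in pool:
        for j in pool:
            for k in pool:
                if len({i, j, k}) == 3:
                    p[(j, k)] = max(p[(j, k)], min(p[(j, i)], p[(i, k)]))
    # Winner: at least as strong a path to every rival as they have back.
    return next(x for x in pool
                if all(p[(x, y)] >= p[(y, x)] for y in pool if y != x))

def rank(pool, d):
    # Repeatedly find the most preferred game and drop it from the pool.
    pool, ranking = list(pool), []
    while pool:
        w = schulze_winner(pool, d)
        ranking.append(w)
        pool.remove(w)
    return ranking

print(rank(["Agricola", "Brass", "Caylus", "Dominion"], d))
# → ['Brass', 'Agricola', 'Caylus', 'Dominion']
```

The triple loop inside `schulze_winner` is where the cubic cost comes from: each winner-finding pass does O(n³) work over the n games still in the pool, which is why running the full game list directly is so slow.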
Games with high averages will usually do pretty well since they are more likely to beat other games in their one-to-one pairings of preference, but the resulting list is much different than one just sorted by average or Bayesian average.
Is this supposed to be the definitive list of "what is the best game?" I don’t think so, but I was pretty surprised at a number of the titles that appeared -- not the usual suspects on most Top 100 lists, but now I will be giving a lot of them a closer look. It’s fun to dive into the data and come up with something unexpected.
Are you surprised by the games on the list? Are they hidden gems or niche games?
I've started a blog post (hopefully the first in a series) about mining and exploring BGG data: