Space Ghost
This post provides an updated list to my previous rankings based on the pairwise comparison Bradley-Terry-Luce model (see http://www.boardgamegeek.com/thread/221606).

Technique Summary
The BTL model is used for paired comparison data. This is ideal for the rating data most people provide because, regardless of the rating guidelines, individuals tend to use different underlying scales. Furthermore, the only viable assumption to make is that the rating data for each individual is ordinal (see here if you need a refresher on data types: http://faculty.chass.ncsu.edu/garson/PA765/datalevl.htm).

The model itself is based on fitting a logit model for paired evaluations, providing a parameter estimate for each game. At its most basic level, the only thing that matters when comparing two games is the proportion of times one game is preferred over the other. The current framework introduces an additional parameter, P, that weights the number of times game A was rated when game B was not. This value can range anywhere from 0 to 1: when P = .1, game A gets one "preference point" for every 10 times it was rated and game B was not; when P = 1, game A gets a point every time it is rated and game B is not.
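The following is a minimal sketch of how such a fit could work, not the code used for this analysis. It assumes ratings arrive as a dictionary of per-user scores, that tied ratings are simply skipped, and that the strengths are estimated with the standard minorization-maximization (Zermelo) update for Bradley-Terry models. The function names `pairwise_wins` and `fit_btl` and the toy ratings are hypothetical.

```python
from itertools import combinations


def pairwise_wins(ratings, P):
    """ratings: {user: {game: score}}.
    Returns wins[a][b], the weighted number of times a was preferred to b."""
    games = sorted({g for r in ratings.values() for g in r})
    wins = {a: {b: 0.0 for b in games if b != a} for a in games}
    for r in ratings.values():
        for a, b in combinations(games, 2):
            if a in r and b in r:
                if r[a] > r[b]:
                    wins[a][b] += 1.0
                elif r[b] > r[a]:
                    wins[b][a] += 1.0
                # Ties are skipped (an assumption of this sketch).
            elif a in r:
                wins[a][b] += P  # a rated, b not: fractional point for a
            elif b in r:
                wins[b][a] += P  # b rated, a not: fractional point for b
    return wins


def fit_btl(wins, iters=200):
    """Estimate Bradley-Terry strengths with the MM (Zermelo) update.
    Assumes at least one game has a nonzero win total."""
    games = list(wins)
    strength = {g: 1.0 / len(games) for g in games}
    for _ in range(iters):
        updated = {}
        for a in games:
            total_wins = sum(wins[a].values())
            denom = sum(
                (wins[a][b] + wins[b][a]) / (strength[a] + strength[b])
                for b in wins[a]
                if wins[a][b] + wins[b][a] > 0
            )
            updated[a] = total_wins / denom if denom > 0 else 0.0
        norm = sum(updated.values())
        strength = {g: v / norm for g, v in updated.items()}
    return strength


# Toy data, invented for illustration only.
ratings = {
    "u1": {"Descent": 8.5, "Monopoly": 6},
    "u2": {"Descent": 10, "Monopoly": 1, "Agricola": 9},
    "u3": {"Agricola": 8, "Monopoly": 7},
}
strength = fit_btl(pairwise_wins(ratings, P=0.1))
ranking = sorted(strength, key=strength.get, reverse=True)
print(ranking)
```

Only the order of each user's scores enters the win counts, so rescaling any one user's ratings leaves the fit unchanged.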


Advantages of BTL as compared to Bayesian Averaging

Problem: The current Bayesian averaging system is inappropriate for the type of data generated by individual users. It treats the ratings as interval-level data when they are actually ordinal-level data.
BTL Solution: The BTL model matches the measurement level of the data.

Problem: The Bayesian averaging system adds an arbitrary number of "dummy votes" of 5.5 to avoid overweighting games with high ratings but relatively few votes. The problem with this approach is that the number of dummy votes is an arbitrary heuristic, regardless of how well it is thought out.
BTL Solution: The dummy votes serve a similar role to P in the current modeling framework. Note that the number of dummy votes is unbounded -- it could be set to infinity, and all average ratings would converge to 5.5. In the BTL framework, on the other hand, P is bounded below by zero and above by one. Since P lies on a continuous interval, we can integrate it out and compute the expected value of the ranking of each game across the entire interval. In this situation, there is no researcher decision that biases the ranking in any direction -- just the calculus required to integrate the likelihood function.
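To make the dummy-vote mechanism concrete, here is a small sketch of a Bayesian average in its usual form: add a chosen number of votes at 5.5 to each game's real votes, then take the mean. The exact formula and dummy-vote count BGG uses are not given here, so the function `bayesian_average` and the two toy games are assumptions for illustration.

```python
def bayesian_average(ratings, dummy_votes, prior=5.5):
    """Mean of the real ratings plus `dummy_votes` extra votes at `prior`."""
    return (sum(ratings) + dummy_votes * prior) / (len(ratings) + dummy_votes)


# Toy data: a niche game with three very high ratings, and a popular
# game with 500 ratings of 8.
niche = [10, 10, 9]
popular = [8] * 500

for m in (0, 100, 10**9):
    print(m, round(bayesian_average(niche, m), 3), round(bayesian_average(popular, m), 3))
```

With no dummy votes the niche game comes out ahead; with 100 the popular game does; with a huge number both averages sit at essentially 5.5. Which game ranks higher depends entirely on the dummy-vote count chosen.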

Problem: The current system conflates the ideas of ranking and rating. Specifically, the average ratings are computed and then the rankings are derived by ordering the ratings.
BTL Solution: The model does not provide average ratings. Instead, it provides a full ranking of all games analyzed. Additionally, for any two games, one can compute the probability that one game is preferred over the other.
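As a sketch of that last point: in the standard logit form of the BTL model, if theta_A and theta_B are the fitted parameters for two games, the probability that A is preferred to B is exp(theta_A) / (exp(theta_A) + exp(theta_B)). The function name `prob_preferred` and the parameter values below are made up for illustration.

```python
import math


def prob_preferred(theta_a, theta_b):
    """Probability that game A is preferred to game B under the logit BTL form."""
    return 1.0 / (1.0 + math.exp(-(theta_a - theta_b)))


# Hypothetical parameter estimates, not values from this analysis.
print(prob_preferred(1.2, 0.4))
print(prob_preferred(0.4, 1.2))
```

Only the difference between the two parameters matters, and the two probabilities for a pair always sum to one.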

Data Stuff
The data were downloaded in early October and consist of all games that had more than 100 individual ratings at the time, resulting in 2,264 games being included in the analysis. The ranking is based on fitting the BTL model via maximum likelihood estimation (see http://en.wikipedia.org/wiki/Maximum_likelihood).

RESULTS TOP 100
1. (Puerto Rico)
2. (Power Grid)
3. (Tigris & Euphrates)
4. (El Grande)
5. (Caylus)
6. (Settlers of Catan, The)
7. (Ra)
8. (Princes of Florence, The)
9. (Ticket to Ride)
10. (Carcassonne)
11. (Agricola)
12. (San Juan)
13. (Lost Cities)
14. (Memoir '44)
15. (Ticket to Ride: Europe)
16. (Citadels)
17. (Samurai)
18. (Through the Desert)
19. (Acquire)
20. (Bohnanza)
21. (Race for the Galaxy)
22. (Goa)
23. (Modern Art)
24. (RoboRally)
25. (Tikal)
26. (Ingenious)
27. (Thurn and Taxis)
28. (Alhambra)
29. (Arkham Horror)
30. (BattleLore)
31. (Shadows over Camelot)
32. (War of the Ring)
33. (Twilight Struggle)
34. (Blokus)
35. (Amun-Re)
36. (Railroad Tycoon)
37. (Saint Petersburg)
38. (Magic: The Gathering CCG)
39. (Taj Mahal)
40. (Carcassonne: Hunters and Gatherers)
41. (Lord of the Rings)
42. (Pillars of the Earth, The)
43. (For Sale)
44. (Age of Steam)
45. (Game of Thrones, A)
46. (Twilight Imperium 3rd Edition)
47. (Pandemic)
48. (Coloretto)
49. (Notre Dame)
50. (Attika)
51. (Torres)
52. (Chess)
53. (Bang!)
54. (Settlers of Catan Card Game, The)
55. (Go)
56. (Shogun)
57. (Lord of the Rings - The Confrontation)
58. (Hive)
59. (Battle Line)
60. (Formula De)
61. (Descent: Journeys in the Dark)
62. (Diplomacy)
63. (Hey! That's My Fish!)
64. (Carcassonne: The Castle)
65. (HeroScape Master Set: Rise of the Valkyrie)
66. (Blue Moon City)
67. (Category 5)
68. (Ticket to Ride: Marklin Edition)
69. (Yspahan)
70. (Civilization)
71. (Scrabble)
72. (Louis XIV)
73. (Traders of Genoa, The)
74. (Imperial)
75. (Mr. Jack)
76. (Age of Empires III: The Age of Discovery)
77. (Colossal Arena)
78. (Liar's Dice)
79. (Apples to Apples)
80. (Die Macher)
81. (Hollywood Blockbuster)
82. (Medici)
83. (La Citta)
84. (Zooloretto)
85. (Tichu)
86. (Nexus Ops)
87. (Wallenstein)
88. (Fury of Dracula)
89. (No Thanks!)
90. (Reef Encounter)
91. (Cosmic Encounter)
92. (Commands & Colors: Ancients)
93. (Elfenland)
94. (1960: The Making of the President)
95. (Jambo)
96. (Vinci)
97. (Maharaja: Palace Building in India)
98. (Union Pacific)
99. (Vegas Showdown)
100. (Can't Stop)


Here we see that Agricola fell from a BGG ranking of 1 to a ranking of 11 under BTL. The following graph helps explain this.



Agricola is ranked 2 when P = 0, then moves to 3 at P = .05 and continues to descend afterwards. The truth is that the current BGG ranking uses a narrow range of dummy votes that results in Agricola being ranked first; however, that choice is completely arbitrary. In the current analysis, I use the expected value because it:

a. Accounts for an average across all feasible constructions of the popularity and availability of games that would impact people's preferences.

b. Is not arbitrary; it is well-defined and mathematically defensible.

c. If one really wanted to define their rankings at a particular popularity level, that is completely possible as well. For instance, while Agricola is ranked 11th on average, it is only ranked 11th or better in the analysis for about P < .15 -- thus, it is possible for a small portion of the range (in this case 15%) to outweigh the remaining portion of the range.
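One way to read "expected value across P" is to average each game's rank over a uniform grid of P values. The sketch below does that numerically. The original analysis may instead have integrated the likelihood analytically, so treat this as an approximation under that assumption. `expected_ranks` takes any function mapping P to a {game: rank} dictionary, and `toy_rank_at` is a hypothetical stand-in for refitting the model at each P.

```python
def expected_ranks(rank_at, n_grid=1000, lo=0.0, hi=1.0):
    """Approximate each game's mean rank for P uniform on [lo, hi] (midpoint rule)."""
    totals = {}
    for k in range(n_grid):
        p = lo + (k + 0.5) * (hi - lo) / n_grid
        for game, rank in rank_at(p).items():
            totals[game] = totals.get(game, 0.0) + rank
    return {game: total / n_grid for game, total in totals.items()}


def toy_rank_at(p):
    # Hypothetical two-game world: "Niche" leads only when P is small.
    if p < 0.15:
        return {"Niche": 1, "Popular": 2}
    return {"Niche": 2, "Popular": 1}


full_range = expected_ranks(toy_rank_at)          # P in [0, 1]
half_range = expected_ranks(toy_rank_at, hi=0.5)  # P in [0, 0.5]
print(full_range, half_range)
```

Passing a different `lo` and `hi` gives a ranking restricted to a particular popularity level, as described in point c.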

Here are the Top 20 based on P = 0

1. (Conflict of Heroes: Awakening the Bear! - Russia 1941-1942)
2. (Agricola)
3. (Napoleon's Triumph)
4. (Twilight Struggle)
5. (EastFront II)
6. (Power Grid)
7. (Grant Takes Command)
8. (Through the Ages: A Story of Civilization)
9. (Warriors of God)
10. (Brass)
11. (Puerto Rico)
12. (Dominion)
13. (Paths of Glory)
14. (Asia Engulfed)
15. (The Devil's Cauldron: The Battles for Arnhem and Nijmegen)
16. (Advanced Squad Leader (ASL) Starter Kit #3)
17. (DAK2)
18. (Die Macher)
19. (El Grande)
20. (Tigris & Euphrates)

As expected, this list contains games with fewer (but higher) ratings and many more wargames. This type of ranking could be done with any value of P; however, the expected value seems to be the most "fair" for a general ranking of the games.

EDIT: More rankings (200 and on) and more graphs can be included as requested. Or more lists at different levels of popularity -- all in all, it is a very flexible modeling system.

EDIT: Added List for 0 < P < .5
This edit was added to address some of the concerns about using the full range of P. It uses only the lower half of the range; here Agricola falls only to 7 instead of 11. In short, it shows that the top half of the range of P does not have too dramatic an effect on the overall outcome.

1. (Puerto Rico)
2. (Power Grid)
3. (Tigris & Euphrates)
4. (El Grande)
5. (Caylus)
6. (Princes of Florence, The)
7. (Agricola)
8. (Ra)
9. (Settlers of Catan, The)
10. (Ticket to Ride)
11. (Race for the Galaxy)
12. (San Juan)
13. (Goa)
14. (Memoir '44)
15. (Carcassonne)
16. (Ticket to Ride: Europe)
17. (Samurai)
18. (Twilight Struggle)
19. (Acquire)
20. (Through the Desert)
21. (Modern Art)
22. (Lost Cities)
23. (War of the Ring)
24. (Tikal)
25. (BattleLore)
26. (Ingenious)
27. (Citadels)
28. (Railroad Tycoon)
29. (Amun-Re)
30. (Age of Steam)
31. (Bohnanza)
32. (Thurn and Taxis)
33. (Taj Mahal)
34. (RoboRally)
35. (Arkham Horror)
36. (Pandemic)
37. (Blokus)
38. (Pillars of the Earth, The)
39. (Shadows over Camelot)
40. (Shogun)
41. (Saint Petersburg)
42. (Twilight Imperium 3rd Edition)
43. (Notre Dame)
44. (Alhambra)
45. (Game of Thrones, A)
46. (For Sale)
47. (Magic: The Gathering CCG)
48. (Carcassonne: Hunters and Gatherers)
49. (Battle Line)
50. (Torres)
51. (Go)
52. (Lord of the Rings - The Confrontation)
53. (Die Macher)
54. (Lord of the Rings)
55. (Ticket to Ride: Marklin Edition)
56. (Attika)
57. (Imperial)
58. (Descent: Journeys in the Dark)
59. (Hive)
60. (Age of Empires III: The Age of Discovery)
61. (Coloretto)
62. (Yspahan)
63. (Blue Moon City)
64. (Traders of Genoa, The)
65. (Wallenstein)
66. (Mr. Jack)
67. (Civilization)
68. (Carcassonne: The Castle)
69. (Louis XIV)
70. (Tichu)
71. (Hollywood Blockbuster)
72. (Commands & Colors: Ancients)
73. (1960: The Making of the President)
74. (La Citta)
75. (Medici)
76. (Settlers of Catan Card Game, The)
77. (Reef Encounter)
78. (Hey! That's My Fish!)
79. (Formula De)
80. (HeroScape Master Set: Rise of the Valkyrie)
81. (Diplomacy)
82. (Fury of Dracula)
83. (Chess)
84. (Union Pacific)
85. (Maharaja: Palace Building in India)
86. (Liar's Dice)
87. (Colossal Arena)
88. (Vinci)
89. (Nexus Ops)
90. (Bang!)
91. (Category 5)
92. (Thebes)
93. (Vegas Showdown)
94. (In the Year of the Dragon)
95. (Zooloretto)
96. (Stone Age)
97. (YINSH)
98. (Jambo)
99. (Santiago)
100. (PitchCar)


Last edited on 2008-11-19 11:52:26 CST (Total Number of Edits: 4)
Steven Duff
That first list looks way better than the official one.
T. Nomad
I don't understand much of it, but I suspect you meant to say BTL model.
Space Ghost
tommynomad wrote:
I don't understand much of it, but I suspect you meant to say BTL model.


Thanks for catching that -- have a Geek Nickel. It is a wonder I have had anything published :(
Hunga Dunga
I think wargames are awesome. Don't you?
Chris Ferejohn
So if I can attempt to explain what I *think* you are saying (and please do correct me if I am wrong), this model corrects for the fact that what is an "8" to one person doesn't necessarily mean the same thing as an "8" to someone else. So if someone ranks a lot of games quite low, then their high rankings carry more weight.

On the other hand, if someone is a giant game whore, like oh, say, me (average rating of just under 7), my high rankings won't mean as much while my low rankings will be relatively damning (take that On the Underground!).

Is that roughly what this is trying to accomplish translated into "See Dick. See Dick do statistical analysis. Analyze Dick, analyze!"?
Space Ghost
cferejohn wrote:
So if I can attempt to explain what I *think* you are saying (and please do correct me if I am wrong), this model corrects for the fact that what is an "8" to one person doesn't necessarily mean the same thing as an "8" to someone else. So if someone ranks a lot of games quite low, then their high rankings carry more weight.

On the other hand, if someone is a giant game whore, like oh, say, me (average rating of just under 7), my high rankings won't mean as much while my low rankings will be relatively damning (take that On the Underground!).

Is that roughly what this is trying to accomplish translated into "See Dick. See Dick do statistical analysis. Analyze Dick, analyze!"?


Not quite. The only thing that matters is a person's ordering of games. So, if I rate Descent 8.5 and Monopoly 6, then it is clear that I prefer Descent > Monopoly. If you rate Descent 10 and Monopoly 1, then you too prefer Descent > Monopoly. Those two preferences are given equal weight because my 8.5 might be the same as your 10 and your 1 might be the same as my 6.

So all that matters is the order of preferences -- like:

User #1: Descent > Agricola > Monopoly > Dominion
User #2: Agricola > Dominion > Descent > Monopoly
User #3: Dominion > Descent > Monopoly > Agricola

This is really the only consistent information we can extract from the ratings data. These preference orders are then amalgamated through the modeling process.
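As a rough sketch of that extraction step (the function names are hypothetical, and ties are not handled), each user's ratings can be reduced to an ordering, and the orderings tallied into counts of how often one game is placed above another:

```python
from collections import Counter
from itertools import combinations


def preference_order(user_ratings):
    """Sort a user's games from most to least preferred, discarding the scale."""
    return sorted(user_ratings, key=user_ratings.get, reverse=True)


def pairwise_counts(orders):
    """Count, for each ordered pair (better, worse), how many users agree."""
    counts = Counter()
    for order in orders:
        # combinations() keeps list order, so the first item is the preferred one.
        for better, worse in combinations(order, 2):
            counts[(better, worse)] += 1
    return counts


# The two raters from the example: different scales, same preference.
me = {"Descent": 8.5, "Monopoly": 6}
you = {"Descent": 10, "Monopoly": 1}
same_order = preference_order(me) == preference_order(you)

# The three users listed above.
orders = [
    ["Descent", "Agricola", "Monopoly", "Dominion"],
    ["Agricola", "Dominion", "Descent", "Monopoly"],
    ["Dominion", "Descent", "Monopoly", "Agricola"],
]
counts = pairwise_counts(orders)
print(same_order, counts[("Descent", "Monopoly")], counts[("Descent", "Agricola")])
```

Counts like these are the input to the model; the individual rating values are never used.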

Roger Leroux
Hungadunga wrote:
I think wargames are awesome. Don't you?


Yeah baby.
Roger Leroux
steinley wrote:
Not quite. The only thing that matters is a person's ordering of games. So, if I rate Descent 8.5 and Monopoly 6, then it is clear that I prefer Descent > Monopoly. If you rate Descent 10 and Monopoly 1, then you too prefer Descent > Monopoly. Those two preferences are given equal weight because my 8.5 might be the same as your 10 and your 1 might be the same as my 6.

So all that matters is the order of preferences -- like:

User #1: Descent > Agricola > Monopoly > Dominion
User #2: Agricola > Dominion > Descent > Monopoly
User #3: Dominion > Descent > Monopoly > Agricola

This is really the only consistent information we can extract from the ratings data. These preference orders are then amalgamated through the modeling process.


Nifty!

I have questions... (always with the questions)

How does the method account for equivalent preferences within a user?

For instance, let's say this example:

User #1: Descent > Agricola = Monopoly > Dominion
User #2: Agricola = Dominion > Descent > Monopoly
User #3: Dominion > Descent > Monopoly = Agricola

I'm just curious because, looking at my own collection, I have about 200 items and they're rated more or less according to a standard distribution. I have a lot of games rated a 7, for instance, so how would those preferences wash out? Without asking me which of my 7's I like best, I don't see how an algorithm would know that I preferred GameA to GameB.

Would the sheer number of equivalences smooth that out, or would secondary preferences be considered?

Finally, with this kind of methodology, how likely is it for a game to have a higher ranking than a game with a higher rating?

Last edited on 2008-11-19 01:40:45 CST (Total Number of Edits: 1)
Tom Chappelea
What's striking is how the older, more "classic" Euros have risen to the top. It looks like BGG when I first signed up.

Any guesses as to why this might be?
(The Artist formerly known as) Arnest R