Subject: On Ratings
Matthew Gray
Since I do a lot of statistical analysis on the geek, one category of
questions I get a lot is about the "validity" of BGG ratings. I
finally got around to writing up a bunch of my notes on this. Enjoy.

How much of a difference in rating/ranking is significant?

Well, that depends on what you mean, but I can answer the question I
think most people are really asking better than the literal one.
Tests for statistical "significance" are common, but most rest on
assumptions that are simply not valid for BGG ratings. That isn't to
say such measures are completely useless, but they shouldn't be
treated as the final word. The "ratings error" calculated in this
manner is somewhere in the ballpark of 0.2 points. For games with
thousands and thousands of ratings it is much lower, below 0.1; for
games with fewer ratings it's more like 0.5 or more. But because the
assumptions that go into these calculations don't hold for BGG, the
numbers are even more approximate.

Easier to evaluate is what is the chance you (a random BGG user) will
like a particular game better than another game, given their relative
ranks. For this, we don't need to make as many assumptions, as we can
look at the raw ratings distributions for those games. This still has
some issues with sample bias, but it's better. The answer is, knowing
nothing else, if the games are 50 ranks apart, there's a 60% chance
you like the higher ranked one better. If the games are 250 ranks
apart, 70%. 700 ranks, 80%. 2000 ranks, 90%. Now, games at the very
top of the chart (roughly top hundred) actually give higher
confidence. If the games being compared are near the top of the
chart, multiply the difference by a factor of about 2 to 5. So,
roughly speaking, there's about a 70% chance you'll like a game that's
50 to 125 ranks higher, if both games are near the top.
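
As a rough illustration of the comparison described above, here is a minimal Python sketch, assuming each game's raw ratings are available as a plain list (the toy data below is made up): it estimates the chance a random rater prefers one game by drawing one rating from each game's distribution.

```python
import random

def prob_prefer(ratings_a, ratings_b, samples=100_000, seed=0):
    """Estimate P(a random rater scores game A above game B) by drawing
    one rating from each game's raw distribution; ties split 50/50."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(samples):
        a, b = rng.choice(ratings_a), rng.choice(ratings_b)
        if a > b:
            wins += 1
        elif a == b:
            wins += 0.5
    return wins / samples

# Hypothetical rating lists; the "higher ranked" game skews a bit higher.
game_a = [8, 8, 7, 9, 6, 7, 8, 7, 10, 6]
game_b = [7, 6, 8, 5, 7, 6, 7, 8, 6, 7]
print(prob_prefer(game_a, game_b))  # about 0.70 for these toy lists
```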

In other words, rankings/ratings are a rough estimate. They're
far from meaningless. Between two games, with no other information,
you're more likely to like the one with a better rank. But, if one is
in a genre you like better, by a designer you like better, from a
publisher you like better or uses mechanics you like better, you'll
probably like it more, unless the other game outranks it by a few
hundred ranks.

Personally, I tend to look at game ranks in roughly 5 "star"
categories: 1-100 (5 stars), 101-500 (4 stars), 501-1000 (3 stars),
1001-2000 (2 stars) and 2001+ (1 star). If a game has a feature
(designer, publisher, mechanic, theme, etc.) I'm especially fond of, I
give it another star or two. Games with features I tend to dislike
get docked a star or two. Ratings/reviews from trusted users might
bump it up or down one star, but for me, I don't find many
reviewers/raters who I can consistently trust. Then, if a game has 6
or more stars, I probably buy it before playing. 5 stars, I actively
seek it out to try it. 4 stars, I'm happy to give it a try. 3 stars,
I'm willing to give it a try. 2 stars, I have to be convinced. 1
star, I avoid it. For me, it works.
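
Matthew's rank-to-star heuristic is mechanical enough to write down. A minimal Python sketch, with the star adjustments left as inputs since those are judgment calls rather than site data:

```python
def base_stars(rank):
    """Rank buckets: 1-100 -> 5 stars, 101-500 -> 4, 501-1000 -> 3,
    1001-2000 -> 2, 2001+ -> 1."""
    if rank <= 100:
        return 5
    if rank <= 500:
        return 4
    if rank <= 1000:
        return 3
    if rank <= 2000:
        return 2
    return 1

def adjusted_stars(rank, liked_features=0, disliked_features=0, trusted_bump=0):
    """Add stars for features you love, dock stars for features you hate,
    and optionally shift by +/-1 based on trusted raters."""
    return base_stars(rank) + liked_features - disliked_features + trusted_bump

# 6+: buy before playing; 5: seek out; 4: happy to try; 3: willing to try;
# 2: needs convincing; 1: avoid.
print(adjusted_stars(rank=350, liked_features=2))  # -> 6
```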

Wouldn't the ratings be better/more accurate if we ignored ratings from inactive users?

They wouldn't be much different. In fact, they'd be only about as
different as you'd expect from any arbitrary reduction of sample size.
I have not yet identified anything to suggest that older/inactive
users' ratings differ in any substantial way from those of
active/recent users.

What if we got rid of ratings that haven't been updated in a certain period?

No substantial change, until you make it a really recent cutoff, at
which point the "top" lists are all exclusively new games.

What if we just use the plain average instead of the Bayesian average, with a cutoff for minimum number of ratings?

No matter what value of cutoff you use, it introduces a large bias
toward games that have just barely enough to make the cutoff. In
fact, for any particular value of the cutoff, roughly 20% of the top
games (whether top 10, top 100, whatever) are very close to the
cutoff. What this means is if you were to lower the cutoff a little,
you add in a bunch of games that were arbitrarily removed by having
the cutoff higher. If you raise the cutoff a little, you cut out a
bunch of games, equally arbitrarily. The Bayesian average provides a
"soft" cutoff.

Actually, if you're willing to raise the cutoff up to about 500
ratings, minimum, the effect goes away. That would leave only 422
games ranked on the geek.
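
For readers who want to see the difference concretely, here is a minimal sketch of the two approaches: a plain average behind a hard cutoff versus a Bayesian average that shrinks toward a prior. The prior mean of 5.5 and the prior count of 100 dummy ratings are illustrative assumptions (Matthew notes later in the thread that the dummy count is "a bit over 100" and scales with the size of the site).

```python
def plain_average(ratings, cutoff=30):
    """Raw mean, but only for games that clear a hard minimum-ratings cutoff."""
    if len(ratings) < cutoff:
        return None
    return sum(ratings) / len(ratings)

def bayesian_average(ratings, prior_mean=5.5, prior_count=100):
    """Fold in prior_count dummy ratings of prior_mean; with few real
    ratings the game sits near prior_mean, with many the prior washes out."""
    n = len(ratings)
    return (prior_count * prior_mean + sum(ratings)) / (prior_count + n)

ratings = [8] * 40                 # a small game rated 8 by 40 users
print(plain_average(ratings))      # 8.0 -- leapfrogs much better-sampled games
print(bayesian_average(ratings))   # (100*5.5 + 40*8) / 140 ~ 6.21, the "soft cutoff"
```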

What if we restricted it to people who have played at least 3 times?

Well, the average rating of games would go up a ton because people
don't tend to play bad games that many times. Specifically the
average rating would go up by nearly a point.

It would also introduce a big bias against longer games, introduce a
bias toward 2-player games and reduce the sample size dramatically, as
many fewer people log plays than submit ratings. Other than those
shifts, many other results would remain very similar.

How about a "waiting period" before a game is rated/ranked?

Well, the Bayesian Average already has some of this effect. That
said, there is a distinct, early ratings bump many games get. That
is, when a game only has a few hundred ratings, it is often rated much
more highly than when it has many hundreds or over 1000 ratings. In
particular, it seems the average dropoff is about 0.3 points from 350
ratings to "steady state", which sometimes takes till 1000 ratings or
more. Before 350 ratings, there's a lot of variability in the
average.

What if we only count ratings from people who have rated, say, 300 games?

The top 11 games remain exactly the same, in slightly different order,
despite what would amount to a sample-destroying reduction in the number
of raters. Neat.

Wouldn't clusters somehow make this all so much better?

Oooh, probably.
Kevin H
mkgray wrote:
Personally, I tend to look at game ranks in roughly 5 "star"
categories: 1-100 (5 stars), 101-500 (4 stars), 501-1000 (3 stars),
1001-2000 (2 stars) and 2001+ (1 star). If a game has a feature
(designer, publisher, mechanic, theme, etc.) I'm especially fond of, I
give it another star or two. Games with features I tend to dislike,
get docked a star or two. Ratings/reviews from trusted users might
bump it up or down one star, but for me, I don't find many
reviewers/raters who I can consistently trust. Then, if a game has 6
or more stars, I probably buy it before playing. 5 stars, I actively
seek it out to try it. 4 stars, I'm happy to give it a try. 3 stars,
I'm willing to give it a try. 2 stars, I have to be convinced. 1
star, I avoid it. For me, it works.


I try to limit myself to the top 700. I find that's more or less where games start to get more 7's than 6's... which I think is important because, at least to me, a 7 means a game is pretty good and a 6 is more ho-hum. Also, I've tried so few of the top games that it wouldn't make any sense for me to try anything lower.

Quote:
What if we just use the plain average instead of the Bayesian average, with a cutoff for minimum number of ratings?

I like Bayesian averages over straight ratings. But I wonder if something can be done to help games compete with those at the top with 3000, 5000, 8000 ratings.

I wonder if, say, only the top 1,000 ratings for a game were thrown into a Bayesian average... would that help games with fewer ratings compete? Or maybe do the opposite?

Quote:
Wouldn't clusters somehow make this all so much better?

Oooh, probably.


I think people would just argue over what should be in what cluster...giving them something else to bitch about? Are you saying that there would be 5 different sets of ratings?

The only major "cluster" problem is really the wargames. What would the top, say, 25 of that cluster look like, and how would that work?
 
Eric
I think the best way to look at it is with the correlations. Sure, the rating gives you an idea, but if I like silly kids' games, the ratings here mean nothing for me unless I'm really into Eurogames and like the heavier games. So the addition of the correlation to each game's stats can help. I wish we could use your correlation for each user to find users with similar tastes; it would give us another idea of what games we might like.
Daniel Karp
Very interesting. For some numbers to go with those not-so-applicable statistical measures of error, check out this woefully outdated analysis I did some time ago:

http://www.boardgamegeek.com/article/426174#426174
Joe Grundy
A most excellent summary, Matthew. Two thumbs up! (Hey, I can give two thumbs up!)

MontyCircus wrote:
I think people would just argue over what should be in what cluster...giving them something else to bitch about? Are you saying that there would be 5 different sets of ratings?
Of course, you can already apply any filter set to the existing ratings and see either the Bayesian-softened or raw average ratings for only the games that match your own preferences.

And sometimes people forget that only about one in ten games has a ranking at all (i.e. 30+ ratings). While there are quality obscure titles, and there are games we love to proclaim how much we dislike, for the most part if a game got a ranking at all it's likely to be OK.
Jorge Arroyo
Great article

The thing is, when I look at my top games, almost none is rated higher than 7 on the geek (most hover around 6.5). So I can't guide my purchases by the BGG rank alone; I tend to look at reviews, comments and session reports to make my decisions.

(The Artist formerly known as) Arnest R
MontyCircus wrote:

I think people would just argue over what should be in what cluster...giving them something else to bitch about? Are you saying that there would be 5 different sets of ratings?


No, that's the beauty of clusters: once you have decided how many of them you want (five seems to be a reasonable number, with the added benefit that they have "representative" games that many are familiar with), they sort themselves out, i.e. they're defined by correlations only. And a game is not "in a cluster"; it is "liked by x% of a cluster"...
Jim Cote
Nice, but word wrap is thumbsdown
jgrundy wrote:
And sometimes people forget that only about one in ten games has a ranking at all (ie 30+ ratings). While there are quality obscure titles, and there are games we love to publish how much we don't like them, for the most part if it got a ranking at all it's likely to be ok.


See http://www.boardgamegeek.com/geeklist/19030
which I think makes for interesting reading
 
Clark Rodeffer
mkgray wrote:
What if we restricted it to people who have played at least 3 times?
Well, the average rating of games would go up a ton because people don't tend to play bad games that many times. Specifically the average rating would go up by nearly a point.

It would also introduce a big bias against longer games, introduce a bias toward 2-player games and reduce the sample size dramatically, as many fewer people log plays than submit ratings. Other than those shifts, many other results would remain very similar.

This has almost certainly been mentioned before. But what would happen if player ratings were weighted according to the number of plays they've logged? For example, say PlayerX has logged N plays for GameY. Instead of just adding in the rating once, add in N ratings. Of course the denominator for your average would have to be based on number of logged plays rather than number of players who rated the game. If this is too big an effect, how about a weighting scheme of 1+log(N) instead of just N?

For those who think older ratings should count less, how about letting ratings decay over time by applying a weighting based upon the amount of time that has passed since the last logged play? For example, if D is the number of days ago that PlayerX last logged a play for GameY, weight PlayerX's rating by multiplying it by 1/e^(D/90), remembering to use 1/e^(D/90) for that player in the denominator sum as well. Choose whatever number of days you want for the denominator, but 90 is about three months, and my gut says it seems like a good place to start.

Clark
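
A minimal Python sketch of the two weighting schemes described in this post, using hypothetical (rating, plays) and (rating, days-since-last-logged-play) pairs in place of real geek data; the function and parameter names are my own.

```python
import math

def play_weighted_average(entries, use_log=True):
    """entries: (rating, plays_logged) pairs. Weight each rating by N plays,
    or by 1 + log(N) to soften the effect, and divide by the summed weights."""
    num = den = 0.0
    for rating, plays in entries:
        w = 1 + math.log(plays) if use_log else plays
        num += w * rating
        den += w
    return num / den

def time_decayed_average(entries, scale_days=90):
    """entries: (rating, days_since_last_logged_play) pairs. Weight each
    rating by 1/e^(D/90), i.e. exp(-D/scale_days), in both the numerator
    and the denominator sums."""
    num = den = 0.0
    for rating, days in entries:
        w = math.exp(-days / scale_days)
        num += w * rating
        den += w
    return num / den

# Toy data only.
print(play_weighted_average([(8, 10), (5, 1), (7, 3)]))    # pulled toward the 8
print(time_decayed_average([(8, 10), (5, 400), (7, 90)]))  # the old 5 nearly vanishes
```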
 
Daniel Karp
If ratings were weighted by number of plays, all game ratings would shoot way up. People who play a game a lot presumably like it a lot, so you'd be more heavily weighting the high ratings.

Just because a game was rated long ago doesn't mean it wasn't validated recently. Many people regularly review their ratings. Should my rating be worth less because I decide that the game is still an 8?

If you want to count older ratings less, it would seem to make more sense just not to include ratings from people who haven't logged in for some long period of time.
David Molnar
CDRodeffer wrote:

This has almost certainly been mentioned before. But what would happen if player ratings were weighted according to the number of plays they've logged? For example, say PlayerX has logged N plays for GameY. Instead of just adding in the rating once, add in N ratings. Of course the denominator for your average would have to be based on number of logged plays rather than number of players who rated the game. If this is too big an effect, how about a weighting scheme of 1+log(N) instead of just N?

For those who think older ratings should count less, how about letting ratings decay over time by applying a weighting based upon the amount of time that has passed since the last logged play? For example, if D is the number of days ago that PlayerX last logged a play for GameY, weight PlayerX's rating by multiplying it by 1/e^(D/90), remembering to use 1/e^(D/90) for that player in the denominator sum as well. Choose whatever number of days you want for the denominator, but 90 is about three months, and my gut says it seems like a good place to start.

Clark


As long as it has e in it, I'd be happy.

Well, here was my idea, and apologies if I'm getting too far astray from Matthew's original thread. Let the individual user decide the weight that their rating should carry. That is, first time I played Samurai, I knew I loved it (it helped that I won), so I rated it a 9, but I realized that this rating was more subject to change than my rating for Through the Desert. So I might have given my Samurai rating a weight of .5 (on a scale from 0 to 1) and my 9 for Through the Desert a weight of .9. If I played a game once that was further away from what I usually play and enjoy, like something that involved running around the table, I might only give a weight of .2 to my initial rating. Whereas if I played Pass the Pigs Deluxe Edition and hated it so much that I would refuse to play that version again, I would give my rating a weight of 1. Then when I revisit my ratings later on, I might still think a particular game is an 8, but be more sure of that 8 and increase the weighting. If you look at Fawkes' detailed comments for games you can imagine how he might have changed the weights on his ratings after repeated plays.

David
 
Jorge Arroyo
I think letting the users decide the weight has a lot of potential for abuse...

As for ratings decaying over time, I wouldn't use the play logs. Not everybody logs their plays, and their ratings shouldn't count less... Maybe just have it use the days passed since the person last visited the site...

-Jorge
Matthew Gray
maka wrote:
I think letting the users decide the weight has a lot of potential for abuse...


I agree.

Quote:
As for ratings decaying over time, I wouldn't use the play logs. Not everybody logs their plays, and their ratings shouldn't count less...


I agree.

Quote:
Maybe just have it use the days passed since the person last visited the site...


Which has no perceivable effect on ratings or rankings other than reducing the number of ratings.
Matthew Gray
Here's another ratings experiment I tried a long time ago (a couple of years at least) whose result many might find surprising.

I took a user and randomly split their ratings into two groups. I temporarily "removed" one of those groups from the calculations. Using their remaining ratings, I found the other users most directly correlated with them (I think I picked the 25 most, but it might have been anything from 10 to 50). Call these people the "trusted raters". Then, I calculated what the average ratings for various games would be, according to the trusted raters. Then, I put the ratings from earlier that I removed back and compared those values to those from the trusted raters. Call the average error "T".

Then, I did the same procedure, but instead of picking a set of trusted raters, I just took everyone. All the ratings, indiscriminately. Call that calculated error "A".

You might (I certainly did) expect that T would be much lower than A. After all, those ratings were calculated using the ratings of people who we've established are highly correlated with the user's ratings. It turns out not to be the case at all. Depending on which user I looked at, T was often worse than A. On average, T was better than A but only very very slightly.
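
The procedure is concrete enough to sketch in Python. This is a toy reconstruction under assumptions, not Matthew's actual script: ratings are assumed to live in per-user {game: rating} dicts, the similarity measure is assumed to be a plain Pearson correlation over shared games, and k=25 follows the "I think I picked the 25 most" above.

```python
import random
import statistics

def mean_abs_error(preds, actuals):
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(actuals)

def trusted_vs_everyone(user_ratings, other_users, k=25, seed=0):
    """user_ratings: {game: rating} for the user under test.
    other_users: list of {game: rating} dicts for everyone else.
    Returns (T, A): prediction error using the k best-correlated raters
    versus using everyone."""
    rng = random.Random(seed)
    games = sorted(user_ratings)
    rng.shuffle(games)
    held_out = set(games[: len(games) // 2])          # temporarily "removed"
    visible = {g: r for g, r in user_ratings.items() if g not in held_out}

    def similarity(other):
        shared = [g for g in visible if g in other]
        if len(shared) < 3:
            return -1.0
        try:
            return statistics.correlation([visible[g] for g in shared],
                                          [other[g] for g in shared])
        except statistics.StatisticsError:            # constant ratings
            return -1.0

    trusted = sorted(other_users, key=similarity, reverse=True)[:k]

    def error(raters):
        preds, actuals = [], []
        for g in held_out:
            vals = [r[g] for r in raters if g in r]
            if vals:
                preds.append(statistics.mean(vals))   # raters' average for g
                actuals.append(user_ratings[g])
        return mean_abs_error(preds, actuals)

    return error(trusted), error(other_users)         # T, A
```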

The lessons I took from this, which have been heavily validated by additional analysis since then:

- Ratings from people "with similar tastes" are better than random people, but not as much better than you think, and...
- Many more ratings are better, even if you have to take some from people who don't have quite as similar tastes.

This is one of the reasons I find the clustering analysis so promising. It yields very large numbers of ratings, but still restricted to people "with similar tastes" to some degree.
Ylaine Gerardin
You ever go back and track which users have T>A versus A>T? Perhaps a certain cluster of users has one or the other?

Interesting to know what makes you a person for whom the mass of predictions is more accurate than the predictions of the people who are most similar to a subset of your preferences...
 
Matthew Gray
Snapper wrote:
You ever go back and track which users have T>A versus A>T? Perhaps a certain cluster of users has one or the other?

Interesting to know what makes you a person for whom the mass of predictions is more accurate than the predictions of the people who are most similar to a subset of your preferences...


The differences between the error measures were always so small that it's improbable any kind of class separation like that would be possible.

What it really says is that for the most part the difference between a "good game" and a "bad game" is far greater than the difference between "the games I like" and "the games you like".

I am a Eurogamer by preference, in general. That mostly means that among games of roughly "equal quality", I will probably prefer the Eurogame, but I'm not atypical in finding that a good wargame (say, We The People) or amerigame (say, Memoir '44) is better than a bad or mediocre Eurogame.

My recent clustering analysis shows there are some games where the clusters strongly disagree as to whether the game is any good, but those are the exception, not the rule. No cluster thinks Puerto Rico, E&T, Carcassonne, War of the Ring or ASL is bad; it's just a matter of "good" versus "great".
David Molnar
mkgray wrote:
maka wrote:
I think letting the users decide the weight has a lot of potential for abuse...


I agree.


I would argue that this potential would only be more overt, not necessarily greater in degree, than what we have now, where a user who gives ratings with a higher standard deviation effectively has a greater impact on the overall ratings than someone whose ratings are tightly clumped. Also, the default would have to be a weight of 1 (again, I'm talking about a 0 to 1 scale) for all the gazillions of ratings in the db, so by giving one of your own ratings a lower weight, the only one you'd be abusing would be yourself...

I think what I'm suggesting would have greater impact on something like a carefully-chosen geekbuddy list than the overall ratings.
 
Kaysville, Utah, United States
mkgray wrote:

Personally, I tend to look at game ranks in roughly 5 "star"
categories: 1-100 (5 stars), 101-500 (4 stars), 501-1000 (3 stars),
1001-2000 (2 stars) and 2001+ (1 star). If a game has a feature
(designer, publisher, mechanic, theme, etc.) I'm especially fond of, I
give it another star or two. Games with features I tend to dislike,
get docked a star or two. Ratings/reviews from trusted users might
bump it up or down one star, but for me, I don't find many
reviewers/raters who I can consistently trust. Then, if a game has 6
or more stars, I probably buy it before playing. 5 stars, I actively
seek it out to try it. 4 stars, I'm happy to give it a try. 3 stars,
I'm willing to give it a try. 2 stars, I have to be convinced. 1
star, I avoid it. For me, it works.


Thank you for posting your insight into both ratings and BGG rankings. The only part I take issue with is the discussion about rankings. Specifically, I am sure there are many obscure games that would climb much higher in the BGG rankings but for BGG's practice of throwing thirty or so lower ratings at a game before establishing its ranking. That could cost such games a few stars in your system that they likely deserve to have. The bias in the BGG system favors the better-distributed and better-known games ranking higher, not necessarily the better games.

I understand that in the long run the rankings usually work out, but in the short run I think the practice of throwing bogus ratings at games before the BGG ranking is determined makes it more difficult for users to find this type of game, and perhaps slows the process of innovative new games finding wider publication (which, in turn, slows the process of a new game garnering enough ratings to overcome the institutional bias against small-print-run games).
 
Douglas Buel
One system used on some Web sites where items are rated is to start each item with 1000 average ratings from nonexistent users.

For example, suppose a Web site rated videos on a scale of 0 to 10. Under this system, new videos are automatically given 1000 ratings of "5."

Ratings by actual users are then added in. Continuing the example, suppose five actual users now rate an item a "10." That item would now have a rating of 5.02.



Under such a system, it's not necessary to have a "cutoff" where items that haven't received ratings don't show up. A small number of users can't manipulate the rating of an item in a ridiculous way.
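
The arithmetic behind that 5.02, as a tiny sketch of the dummy-rating scheme described above (the function name is mine):

```python
def seeded_average(real_ratings, dummy_count=1000, dummy_value=5.0):
    """Start every item with dummy_count phantom ratings of dummy_value,
    then fold in the real ones."""
    n = len(real_ratings)
    return (dummy_count * dummy_value + sum(real_ratings)) / (dummy_count + n)

print(round(seeded_average([10] * 5), 2))  # (1000*5 + 5*10) / 1005 -> 5.02
```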
Matthew Gray
dbuel wrote:
One system that they use on some Web sites where items are rated is starting each item with 1000 average ratings from nonexistent users.

...

Under such a system, it's not necessary to have a "cutoff" where items that haven't received ratings don't show up. A small number of users can't manipulate the rating of an item in a ridiculous way.


This is exactly what's in place on BGG, except that instead of 1000 the number of dummy ratings is percentage-based, so it goes up as BGG grows. The number is a bit over 100 currently.

Scott has further decided to have a (very low) cutoff of 30 ratings to prevent having 20,000 games in the middle of the pack.

(edit: added clarification)
Douglas Buel
mkgray wrote:
This is exactly what's in place on BGG. Except, instead of 1000, it's percentage based so it goes up as BGG grows. The number is a bit over 100 currently.


Oh, I see. I guess I misunderstood how the current system works.
 
Eric Sanders
mkgray wrote:
What if we just use the plain average instead of the Bayesian average, with a cutoff for minimum number of ratings?

Actually, if you're willing to raise the cutoff up to about 500
ratings, minimum, the effect goes away. That would leave only 422
games ranked on the geek.

What if we only count ratings from people who have rated, say, 300 games?

The top 11 games remain exactly the same, in slightly different order,
despite what would amount to a sample destroying reduction in number
of raters. Neat.

I'd LOVE to see the sorting for these 422 - and the re-rankings of the top 11, if you have that info floating around...
Kevin H
maka wrote:
Great article

The thing is, when I look at my top games, almost none ranks higher than 7 (most hover around 6.5) on the geek. So I cannot guide my purchases only with the BGG rank. I tend to look at reviews, comments and sessions to make my decisions.


You rated 28 games an 8, 9 or 10 (half the games you've rated).
5 are not ranked.

Of the remaining 23:
87% are in the top 1000
70% are in the top 500

I don't think the games you like are as lowly rated as you think.
 
Texas, United States
e_sandrs wrote:
mkgray wrote:
What if we just use the plain average instead of the Bayesian average, with a cutoff for minimum number of ratings?

Actually, if you're willing to raise the cutoff up to about 500
ratings, minimum, the effect goes away. That would leave only 422
games ranked on the geek.

What if we only count ratings from people who have rated, say, 300 games?

The top 11 games remain exactly the same, in slightly different order,
despite what would amount to a sample destroying reduction in number
of raters. Neat.

I'd LOVE to see the sorting for these 422 - and the re-rankings of the top 11, if you have that info floating around...


From this run

Original top 11

1 Puerto Rico
2 Power Grid
3 Twilight Struggle
4 Tigris & Euphrates
5 Caylus
6 El Grande
7 Princes of Florence, The
8 Shogun
9 Age of Steam
10 BattleLore
11 Die Macher

With the Raw average and at least 500 ratings:

1 Agricola(13)
2 Puerto Rico(1)
3 Twilight Struggle(3)
4 1960: The Making of the President(17)
5 Power Grid(2)
6 Tigris & Euphrates(4)
7 Hannibal: Rome vs. Carthage(16)
8 Paths of Glory(19)
9 Caylus(5)
10 Combat Commander: Europe(27)
11 Shogun(8)