BGG» Forums » Gaming Related » General Gaming

Subject: Rating with confidence - going TOTALLY NUTS with statistics!

Dane Peacock
United States
Stansbury Park
Utah
Great stuff. I don't know what it all means, but there's graphs and everything. You just earned your way into my group of elite statistical geeks.
 
Darren M
Canada
High Level
AB
Nice work... and I do mean WORK. I know how much effort it takes to play with all that data until you squeeze an interesting story from it.

I think playing with the stats on BGG is almost as much fun as playing a good board game. We are sick stats puppies.

I've always felt something similar to what I think you've shown through your analysis... that in the grand scheme of things... slight skewness and biases don't really cloud the overall results of a ratings system. They (skewed and biased views) are part of our individual personalities and are thus part of any ratings based on our viewpoints as well.

The Bayesian average format on BGG certainly has its "heart" in the right place... helping to show which games are the most widely accepted to be "the" best for the widest cross-section of gamers, while penalizing niche games with fewer ratings. Whether it succeeds in the picture it then paints of the "top games" is up to each individual to decide.
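For what it's worth, the shrinkage idea behind a Bayesian average fits in a few lines of Python. BGG's actual prior parameters are not published, so the `prior_count` and `prior_mean` values below are illustrative guesses only:

```python
def bayesian_average(ratings, prior_count=100, prior_mean=5.5):
    """Pull a game's raw average toward a global prior mean.

    prior_count behaves like that many 'dummy votes' at prior_mean;
    both values are assumptions, not BGG's real (unpublished) numbers.
    """
    n = len(ratings)
    return (prior_count * prior_mean + sum(ratings)) / (prior_count + n)

# A niche game with 20 enthusiastic raters gets pulled hard toward
# the prior, while a widely rated game barely moves.
niche = bayesian_average([9.0] * 20)      # raw average 9.0 -> about 6.1
popular = bayesian_average([8.0] * 5000)  # raw average 8.0 -> about 7.95
```

This is exactly the "penalizing niche games with fewer ratings" effect: the fewer real votes a game has, the more the dummy votes dominate.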

You will likely never convince a party game lover that Die Macher or Roads & Boats is better than Time's Up or Apples to Apples just because it says so on BGG, and vice versa: tell a Paths of Glory or Europe Engulfed veteran that BattleLore is the greatest wargame ever made, so they may as well get rid of that complex trash and play the best... they'd knee you straight in the chits.

Relativity and subjectivity being what they are... every time we slash and hack at the stats we come up with a different view of what we think the overall world of boardgaming should look like. But at the personal level, where individuals actually decide which games best suit them and their gaming groups, those overall views mean little to nothing unless you are one of the "freaks" who sits right in the middle of the fence and likes everything exactly as much as the average of over 1 million opinions and viewpoints.

Personally I have always gravitated towards games with 1) a larger than average number of ratings, 2) a lower than average standard deviation, and 3) a higher than average RAW user rating (ignoring the Bayesian average).

The games this approach finds are the games that just seem to work best in a practical sense for the gaming group I play with. They are not necessarily the BEST games but they are generally very good games for US... so that ranking method works for us even though it has a relatively low correlation to the actual BGG rankings.
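As a sketch, those three filters are easy to express directly. The thresholds and the game records below are made up for illustration:

```python
def group_shortlist(games, min_ratings, max_stddev, min_raw_avg):
    """Keep games with many ratings, low disagreement, and a high
    raw (non-Bayesian) average - the three criteria above."""
    return [g["name"] for g in games
            if g["num_ratings"] >= min_ratings
            and g["stddev"] <= max_stddev
            and g["raw_avg"] >= min_raw_avg]

# Hypothetical data: a broad crowd-pleaser, a divisive title, and a
# highly rated but thinly rated niche game.
games = [
    {"name": "Broad hit", "num_ratings": 9000, "stddev": 1.1, "raw_avg": 7.8},
    {"name": "Divisive",  "num_ratings": 9000, "stddev": 2.0, "raw_avg": 7.8},
    {"name": "Obscure",   "num_ratings": 120,  "stddev": 1.0, "raw_avg": 8.9},
]
shortlist = group_shortlist(games, min_ratings=1000, max_stddev=1.4, min_raw_avg=7.5)
# Only "Broad hit" survives all three filters.
```

The low-standard-deviation filter is what makes this a "works for the whole group" heuristic: it screens out love-it-or-hate-it titles regardless of how high their average is.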

They seem to be the games that we generally are the most receptive to, which is the true test of a ranking system. As others have alluded to... for a ranking system to be perfect, most of the highest games on it should be games that "work" for you and your group... certainly this isn't the case with BGG... as there are many people (probably even a vast majority of people) who certainly wouldn't enjoy all the games in the Top 20, Top 50 or Top 100 on BGG.

It's the same though with any ranking system... IMDB.com, Gamerankings.com, Consumer Reports or any other ranking or selection scheme... none of them will likely give you a personal view of what's best for you specifically... but rather just a general guide to look through and help with some starting points on what to pick and choose.

I'd say BGG does that well... if you use the rankings as a guide to look at games you may be interested in, and search and read... and read some more until something interests you, then the ranking system has done its job well. If people blindly buy according to the ratings... there will be plenty of disappointed gamers... especially those new to the hobby.
Joe Grundy
Australia
Sydney
NSW
nexttothemoon wrote:
I'd say BGG does that well... if you use the rankings as a guide to look at games you may be interested in, and search and read... and read some more until something interests you, then the ranking system has done its job well. If people blindly buy according to the ratings... there will be plenty of disappointed gamers... especially those new to the hobby.
Absolutely. In this respect, I'm almost inclined to suggest perhaps the rankings should be based on the "total goodwill" figure which seems to marry up total popularity so well.

And then for personalised ratings searches you can use a correlation tool. Like this one:
http://www.boardgamegeek.com/article/1404145#1404145
Timothy Hunt
United States
St Louis
Missouri
Thanks Joe, very interesting.

I have to admit that the Condorcet system you mention seems rather similar to the Relative Placement system that swing dance competitions use. http://www.swingdancecouncil.com/library/relativeplacement.h...

Though of course, in that system, all the "raters" rate all the "games" (and it does allow for equality!).
 
Vernon Harmon
United States
Fairport
New York
As others have said: Wow, Joe. Nice work! Definitely some useful info in there. I do have to take issue with something though.

jgrundy wrote:
Since each user's quality-of-games actually rated will be different, we can only get quality rankings from averages if most of us rate each game independently... without considering any of our other ratings.
This I agree with. See my response to snicholson's thread.

However, you go on to say....

jgrundy wrote:
If someone rates four games (A,B,C,D) they're making fuzzy individual statements about each game, but very firm comparative statements about whether A is better than C or D is better than A.
This is not true, even if someone is consistent with their ratings and is consistent with the BGG rating guidelines. It goes a bit to the crux of snicholson's dilemma of quality vs. replayability. Beyond that, however, you have made a plea to "rate each game independently" which by extension means it is NOT a comparative statement.

For example, if I am rating a light filler game, I am likely to rate it independently on a different "feel" than I will rate a heavy-weight Euro. Of course, you can make the case that I *am* still making a comparative statement, but it's a qualified statement of comparison against only a subset of all of the games I rate. However, if, as snicholson is now doing, you rate more based on replayability, the only comparative statement you are making is "I am more likely to play A than B."

At any rate, I believe your assumptions regarding the "Assertions" and "Pairwise" rankings to be flawed. This does not, however, diminish my awe and appreciation for the work you have done, nor does it appreciably diminish your conclusions, IMO. The sensitivity to the Bayesian parameters should be a concern, I think, but how individual users choose to make their ratings should be a non-factor, aside from using a consistent rating style/strategy/approach.

Oh, one last thing: why would it be such an awful thing to have "niche" games float higher in the rankings? I would assert that it would actually be a Good Thing to increase visibility of games that others might like but never ever know about. In addition, time would provide a balance: as these "niche" games spent time high on the rankings, more people would be inclined to try them out and subsequently rate them, thereby establishing them "correctly" within the rating hierarchy -- a truly good game will maintain a presence in the rankings, while a poorer game with truly niche appeal will sink off the list. This is no different than having a hot new Euro explode onto the rankings but drop over time, no?
Michael Leuchtenburg
United States
Cambridge
Massachusetts
vernicus wrote:
This is no different than having a hot new Euro explode onto the rankings but drop over time, no?
This is actually something which the rankings have been tweaked to prevent. The idea is that having stable rankings is desirable.
 
Marshall P.
United States
Wichita
Kansas
flag msg tools
"Nothing in Biology Makes Sense Except in the Light of Evolution" - Theodosius Dobzhansky
badge
There is grandeur in this view of life, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.
Avatar
mbmbmbmbmb
jgrundy,

(bunny with pancake picture)
Ray
United States
Carpentersville
Illinois
I love the charts more than thumbs can measure and they definitely drive home what is happening with the numbers on the geek. Following through on the scientific practice of debating the straw-man, I wouldn't mind seeing some better research as to the cause.

Quote:
btw on reflection it makes perfect sense that on average users who rate more games have lower ratings averages. A player with only a few games under their belt is likely to have mostly been exposed to "quality" games, whereas to get to rate hundreds of games you have to reach a little deeper.

I don't buy that "a few games rated" has as its cause "a few games tried". There are many types of gamers (from fanboys to flavor-of-the-week gamers) with personality types that don't take the time to enter anything other than their favorite game, and yet have played much. Perhaps we need to study the type of game that gets highly rated? Perhaps we need to interview the type of gamer that rates so few (for instance, I found http://www.boardgamegeek.com/thread/51948 quite insightful).

A stat I would be interested in seeing is 'single vote spreading'. If each user gets one vote that gets evenly spread between all the games they rate (e.g. a user with 100 ratings would score .01 votes for each game rated), then what is the new top game list calculated from such a spread? It would emphasize the single voter over the mass voter. It would even withstand Bayesian votes being added to adjust for the most popular games.
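The 'single vote spreading' idea is straightforward to prototype. The ratings map below is invented for illustration:

```python
from collections import defaultdict

def spread_votes(ratings_by_user):
    """Give each user one vote, split evenly across every game they
    rated; a game's score is the sum of the shares it receives."""
    score = defaultdict(float)
    for rated_games in ratings_by_user.values():
        share = 1.0 / len(rated_games)
        for game in rated_games:
            score[game] += share
    return dict(score)

# Two single-rating users outweigh one prolific rater's quarter-votes.
votes = spread_votes({
    "user_a": ["Game X"],
    "user_b": ["Game X"],
    "user_c": ["Game X", "Game Y", "Game Z", "Game W"],
})
# Game X scores 2.25; Y, Z, and W each score 0.25.
```

Note this scores popularity-among-selective-raters, not quality: the rating values themselves never enter the calculation, only which games each user bothered to rate.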
 
Vernon Harmon
United States
Fairport
New York
dyfrgi wrote:
This is actually something which the rankings have been tweaked to prevent. The idea is that having stable rankings is desirable.
"Stable" as in "don't fluctuate wildly" not "stable" as in "carved in stone."

At any rate, a niche game wouldn't have enough rankings to qualify in the current system, so we'd already be talking about some kind of change in the system, right? I was just making a point about the apparent attitudinal bias.
 
Vernon Harmon
United States
Fairport
New York
wtrollkin2000 wrote:
I don't buy that "a few games rated" has as its cause "a few games tried".
No, not across the board, but a newer gamer, or even a new BGG user, will begin by rating games that they are playing or have played. If they are still in the hobby, then they have OBVIOUSLY played some good games (or at least good enough to hook them), so why wouldn't you expect them to have higher averages? In fact, I'd be surprised to see anyone rate a small number of games and have a lower-than-expected average rating. Why would such a person still be playing games??

I have played a fair number of games, but I have only rated (mostly) what is in my collection because that is what I have played since I began rating games here. I don't trust myself to accurately rate a game I played 3-10 years ago. The games I own are going to tend to be games I like, and will hence rate highly. There's nothing under-handed about it.

wtrollkin2000 wrote:
A stat I would be interested in seeing is 'single vote spreading'. If each user gets one vote that gets evenly spread between all the games they rate (e.g. a user with 100 ratings would score .01 votes for each game rated) then what is the new top game list calculated from such a spread? It would emphasis the single voter over the mass voter. It would even withstand Bayesian votes being added to adjust for the most popular games.
Just to be clear, you're not advocating this as a valid rating system, merely saying that you'd like to see the results so that you could see what kind of skew would result, right?
 
Ray
United States
Carpentersville
Illinois
vernicus wrote:
wtrollkin2000 wrote:
A stat I would be interested in seeing is 'single vote spreading'. If each user gets one vote that gets evenly spread between all the games they rate (e.g. a user with 100 ratings would score .01 votes for each game rated) then what is the new top game list calculated from such a spread? It would emphasis the single voter over the mass voter. It would even withstand Bayesian votes being added to adjust for the most popular games.
Just to be clear, you're not advocating this as a valid rating system, merely saying that you'd like to see the results so that you could see what kind of skew would result, right?
Exactly. As the article points out:

Quote:
I thought to show you a chart of this effect, but it turns out it's only small.

we really can't see this effect in the data, since by their nature single voters are so few. As we've all pointed out, so far we have only been making conjecture as to why; the only way to really understand this is to actually study the data rather than mask it as uninformed outliers.
 
Joe Grundy
Australia
Sydney
NSW
Hi Vernon, thanks for the thoughtful post.

vernicus wrote:
jgrundy wrote:
If someone rates four games (A,B,C,D) they're making fuzzy individual statements about each game, but very firm comparative statements about whether A is better than C or D is better than A.
This is not true, even if someone is consistent with their ratings and is consistent with the BGG rating guidelines. It goes a bit to the crux of snicholson's dilemma of quality vs. replayability. Beyond that, however, you have made a plea to "rate each game independently" which by extension means it is NOT a comparative statement.

For example, if I am rating a light filler game, I am likely to rate it independently on a different "feel" than I will rate a heavy-weight Euro.
If a user is rating each game individually and independently, then whatever the "measure" is that their guts are using to rate the games is "comparable" across genres. If they're grouping games together in their head, such as "this is a filler so I'll rate it compared to other fillers", then their assessment is no longer independent of other games.

Users don't need to separate game subcategories on behalf of the system. The "advanced search" is an easy way to then find the top ranked fillers, or the top ranked war games etc. etc. (However the one-click list of ranked games could use a one-click system to filter for some major subcategories, just to make this clearer.)

On this point I have only one personal dilemma... children's games. The "problem" of children's games being that we ought to be able to see what the children think, BUT most children's games are (a) played by adults with the children and (b) only rated by those adults. This is different from every other specialist group, where the games are played and rated primarily by their target audience (plus a minority of ring-ins).

vernicus wrote:
However, if, as snicholson is now doing, you rate more based on replayability, the only comparative statement you are making is "I am more likely to play A than B."
Precisely. Which would make the rankings "the most replayable games" list. Not a bad thing when that's the specified rating metric. Bear in mind, the rankings can only rank on a single metric. What metric would we choose? This is a subjective decision we could debate 'til the cows retire. Like this...

vernicus wrote:
Oh, one last thing: why would it be such an awful thing to have "niche" games float higher in the rankings? I would assert that it would actually be a Good Thing to increase visibility of games that others might like but never ever know about. In addition, time would provide a balance: as these "niche" games spent time high on the rankings, more people would be inclined to try them out and subsequently rate them, thereby establishing them "correctly" within the rating hierarchy
It depends what we want the rankings to achieve. New users to BGG are more likely (not everyone of course) just to start at #1 and work their way down until they hit some snag. (I did this. I bought #1 and hit a snag.) Then they realise they need to dig a bit deeper.

For the sake of the biggest consumer group of rankings, I like the idea that the rankings would show the games most likely for a random buyer to like. In this respect, the current system does reasonably well, but I'm almost inclined to suggest the "total goodwill" metric does better, even if it's "less interesting" to us to see Carc, TtR, and Settlers right back up near the top of the rankings.
http://www.boardgamegeek.com/geeklist/20420

Conversely, and here's what niggles for me in your suggestion, if the ratings system pushes niche games to the top and therefore more people try them and thereby the games get to find their "true" position as the people trying them don't like them... then the rankings are relying on getting people to spend time and money they didn't really want to. I'd rather "find the true position" from as little data as required in the first place, and have a ranking system that if used in isolation causes as little grief as possible.

If you have a known niche interest, it isn't too hard to isolate games you are more likely to be interested in. If you don't have a niche interest, such games are "in the way" if they're too visible. By definition, most of the rankings consumers don't share any specific niche interest.
Joe Grundy
Australia
Sydney
NSW
Ray, thanks for joining the discussion.

wtrollkin2000 wrote:
I don't buy that "a few games rated" has as its cause "a few games tried". There are many types of gamers (from Fanboys to flavor-of-the-week gamers) with personality types that don't take the time to enter anything other than their favorite game and yet have played much.
Precisely. "A few games rated" is likely to indicate you are rating a higher quality of game, either by exposure or by self selection of what you bother to rate. As I say in the main article, I'm now more willing to believe that most of the one-rating-wonders are legit stand alone ratings.

wtrollkin2000 wrote:
A stat I would be interested in seeing is 'single vote spreading'. If each user gets one vote that gets evenly spread between all the games they rate
I did this exercise a while back but I've lost it. Hang on...

(type type type. click click. type type swear type type. click click.)

There's a furphy in here... picking parameters for a new Bayesian adjustment, since now there are 25,000 "votes" instead of a million. I picked 30 dummy votes at 5.5.
Here's the top 20:
Puerto Rico
Caylus
Catan
Tigris & Euphrates
Go
War of the Ring
Twilight Imperium (Third Edition)
BattleLore
Power Grid
Carcassonne
A Game of Thrones
Memoir '44
Ticket to Ride
Heroscape Master Set: Rise of the Valkyrie
The Princes of Florence
El Grande
Arkham Horror
Ticket to Ride: Europe
Diplomacy
Railways of the World

Of course I could instead just filter for users with five or fewer ratings (and add 30 dummy votes at 5.5). Now there's only 18,000 votes:
Puerto Rico
Caylus
Catan
War of the Ring
Tigris & Euphrates
Twilight Imperium (Third Edition)
Go
Carcassonne
A Game of Thrones
Ticket to Ride
BattleLore
Memoir '44
Heroscape Master Set: Rise of the Valkyrie
Arkham Horror
Power Grid
Advanced Squad Leader
Diplomacy
El Grande
The Princes of Florence
Titan
Joe Grundy
Australia
Sydney
NSW
mdp4828 wrote:
jgrundy,

(bunny with pancake picture)
Thanks.
Brandon Clarke
New Zealand
Auckland
Auckland
First of all, you're definitely quite mad.

I don't mean that in a critical way at all. In fact it brings joy to my life to realise that there actually ARE people out there more anal and more obsessive than I am. Makes me feel rather normal after all.

Superb thread. Well, threads, actually, what with all the links to other similarly, tragically well-thought-out and researched geeklists et al. Given it was such an overboardly dry subject - one that ought to see you nominated for king geek of the year (and that's a good thing, in my mind, not a criticism) - I really enjoyed it.

BC
Chris R.
United States
Unspecified
Missouri
First, I think I'll most definitely second the bunny-with-pancake thought.

However, there are a few things that I've never understood about the rating system. First, I've always wondered why so few people rate games using decimal points to indicate tenths and hundredths between the integers. Not doing this would seem to conflict with specifying a Top Ten list. Otherwise shouldn't your Top Ten list be exactly ten games rated 10, 9, 8, 7, 6, 5, 4, 3, 2, 1? If your #6 and #7 games are both rated a 9, please explain how one can be ranked higher on the list unless you simply use an arbitrary method such as alphabetizing to break the ties. Shouldn't more exact ratings be encouraged?

I didn't know that there were special guidelines for determining "MY" ratings until after I had already rated my games by essentially duplicating information that I used for something else. Under the aforementioned guidelines, a 9 or 10 rated game means that you ALWAYS want to play the game. The word "always" seems a bit drastic. One "problem" seems to be that some people don't adjust their ratings downward when the "hot games" cool down.

It seems to me that some games that are quite unique deserve to be rated somewhat highly whether you like them or not, and that it can be difficult to determine if such unique games are actually quite brilliant or have some broken feature until you've actually played them several times. Apparently, uniqueness as a rating feature is not as appreciated by others. Why else would the #5 game BattleLore - a repackaging of Command and Colors, which is a repackaging of multiple Memoir '44 stuff, which is a repackaging of Battle Cry - be ranked so highly?

I've never thought that many older games have ever gotten a fair rating here. That's only my opinion. I tried to correct this with my own geeklist http://www.boardgamegeek.com/geeklist/13253. However, the numbers started to get a bit "weird" with the older games, and as I was told, the creation date for some older games could be off by several decades or even centuries, depending on what rules are used.

Think how much differently the Top 20 games would look, if you only rated games that had been around for more than 10 years.

1. El Grande
2. Die Macher
3. Hannibal: Rome vs. Carthage
4. Go
5. Settlers of Catan
6. Modern Art
7. Up Front
8. Tichu
9. Crokinole
10. Advanced Squad Leader (ASL)
11. Dune
12. Acquire
13. Civilization
14. RoboRally
15. 1830
16. EastFront
17. Republic of Rome
18. Space Hulk
19. Medici
20. Blood Bowl - Third Edition

As someone who doesn't like the terms "meeple" and "Ameritrash," I think this is what has been behind some of the recent debate between styles of board games. I think part of it has to do with disrespecting one's elders -- the games, that is. Didn't Isaac Newton say, "If I have seen further it is by standing upon the shoulders of giants"? (Actually, I thought it was Galileo.) Many hall of fame selection committees state that a person has to be retired for five years before selection. The Rock and Roll Hall of Fame even requires a 25-year cooling off period. Would a (fill-in-the-blank)-year delay past date of publication for a game to be considered a "true" top game be more biased than a Bayesian probability system which appears to be more of a "hot games" categorization method? I think that would probably depend upon what games are your personal favorites.
 
Philip Thomas
United Kingdom
London
London
Quote:
First, I've always wondered why so few people rate games using decimal points to indicate tenths and hundredths between the integers. Not doing this would seem to conflict with specifying a Top Ten list. Otherwise shouldn't your Top Ten list be exactly ten games ranked 10, 9, 8, 7, 6, 5, 4, 3, 2, 1? If both your #6 and #7 games are both ranked a 9, please explain how one can be ranked higher on the list unless you simply use an arbitrary method such as alphabetizing to break the ties. Shouldn't more exact ratings be encouraged?
Ok. There is really no contradiction between having a top ten list and having a series of grade boundaries which is not infinite. My top 10 consists of games all of which are rated 9 (or 8 at the bottom, I forget). Why? Because they meet the criteria for a '9'. Nobody said that two games that meet the same criteria have to be exactly as good. In the academic world, university degrees are given out as Firsts, Upper Seconds, Lower Seconds, Thirds, Passes, Fails. Are you going to tell the examiner that just because she gave two scripts a First, they must get an identical ranking? Of course not; many universities also display the ranking for the top 100 Firsts, or whatever. They are all Firsts, but some of them are better than others.

Of course, with academic scripts there is a precise underlying data set to fall back on: the marks. Games are more subjective. But when I rate a game I'm giving it a grade, not an exact quantification. This has several advantages: I don't have to constantly change the ratings to cater for minor fluctuations in preference, and I'm using criteria that are publicly available. I think the system works if people do that and more exact rating is allowed to emerge from averages etc. If everybody used a thousand-point scale there would be chaos - what would a given rating mean?

 
Arcadian Del Sol
United States
Unspecified
Unspecified
After breezing casually through the original post, I *think* we're supposed to put in a sell order?
Russell Webb
New Zealand
Christchurch
Excellent!

I'd love to see the average rating vs. number of games rated, k, plotted with the average of the top k games' raw ratings. If you played only the top k games and had exactly the average impression of them, presumably your average would come out the same.

I suspect the average of the top k will be much higher, so the obvious follow-up is: if each player plays a completely random selection of the top M games, what M gives an average that matches the average rating of people who rate k games?
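One way to read that question as code: walk down a ladder of raw averages, best first, until the running top-M mean first falls to the observed average for k-game raters. The ladder below is invented for illustration:

```python
def matching_top_m(raw_averages_desc, target_average):
    """Smallest M whose top-M mean raw average is no higher than the
    target; raw_averages_desc must be sorted best-first.
    Returns None if even the whole list averages above the target."""
    running_total = 0.0
    for m, avg in enumerate(raw_averages_desc, start=1):
        running_total += avg
        if running_total / m <= target_average:
            return m
    return None

# Hypothetical ladder of raw averages; a target of 7.8 is matched at M = 6.
ladder = [8.5, 8.3, 8.0, 7.6, 7.1, 6.5]
m = matching_top_m(ladder, target_average=7.8)
```

Because the running mean only ever falls as M grows (the list is sorted descending), the first M that reaches the target is the unique answer.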

Finally, why don't we have a Netflix-style analyser to suggest games based on our ratings compared to other people's preferences? A "You might also enjoy" list. Maybe we have that and I just don't see where (seems like an obviously useful feature).

Again, great work!
Joe Grundy
Australia
Sydney
NSW
rwebb wrote:
Excellent!
Thanks!

Russell, this is the number of games rated on the X axis and the top M games to get the same average on the Y axis. Two lines are shown...
Pink: where the average of the raw averages (sic) from ranks 1 to Y equals the average rating given by people who rated X games.
Blue: where the raw averages at ranks Y equals the average rating given by people who rated X games.

When counting how many games people rated for this exercise, I only counted ranked games, i.e. no expansions and at least 30 ratings.

Note that the number of data points falls off above a few hundred rated games, and the number of ranked games sets a maximum limit on the pink points on the right.



Interesting. The first few points (up to ten rated games) are linear. Then the rest are logarithmic.

Cheers
Joe

Edit: Uploaded the chart into my BGG personal gallery.
Pee di Moor
Netherlands
Rotterdam
Really enjoyed reading it.

I'm really curious whether you consider (or have considered) a place for Cohen's kappa (http://en.wikipedia.org/wiki/Cohens_kappa) or Fleiss' kappa in the ratings.

 
Joe Grundy
Australia
Sydney
NSW
Thanks for that link; I hadn't seen that calculation, and no, I hadn't previously considered it. It's an interesting technique, but on first read I don't think it has a place here.

It appears it would only be appropriate to use if we feel that people who said a game was a "6" while others said "7" are disagreeing just as much as people who said "2" while others said "9".

EDIT: I may be able to adjust the technique to consider this.

Also, 11% of ratings are fractional, and 2.4% are fractional and not "X.5". Although we could just round these without materially affecting the ranking results.
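The adjustment hinted at in the edit is essentially what weighted kappa does: score disagreement by distance rather than all-or-nothing. A minimal sketch of the quadratic weight (the weighting choice here is mine, not anything the thread specifies):

```python
def quadratic_disagreement(r1, r2, scale_min=1.0, scale_max=10.0):
    """Quadratic disagreement weight on a 1-10 rating scale:
    0 for exact agreement, 1 for the widest possible split."""
    span = scale_max - scale_min
    return ((r1 - r2) / span) ** 2

# A 6-vs-7 split barely registers; a 2-vs-9 split counts 49x more.
mild = quadratic_disagreement(6, 7)    # 1/81
severe = quadratic_disagreement(2, 9)  # 49/81
```

Under this weighting, adjacent ratings contribute almost nothing to measured disagreement, which addresses the "6 vs. 7 counts the same as 2 vs. 9" objection above.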
Vernon Harmon
United States
Fairport
New York
Quote:
Also, 11% of ratings are fractional. 2.4% are frational and not "X.5". Although we could just round these without materially affecting the ranking results.
Ugh! It already irks me that the system allows fractional ratings but only shows discrete rankings in the user's rating distribution. If it allows me to rate something 8.5 -- better than an 8, but not quite a 9 -- then it shouldn't obscure that fact by forcing it to be a 9 in my distribution. The rating is what it is for a reason, and "adjusting" it in any way invalidates the user's rating(s). If the system is only supposed to support integer ratings, then it's a pretty simple thing to force compliance on (unless there's some funky backward-compatibility concern I'm not aware of -- was the rating system changed from reals to integers at some point in the past?).

 
Joe Grundy
Australia
Sydney
NSW
vernicus wrote:
If it allows me to rate something 8.5 -- better than an 8, but not quite a 9 -- then it shouldn't obscure that fact by forcing it to be a 9 in my distribution. The rating is what it is for a reason and "adjusting" it in any way...
Having a histogram is a convenient way to display things, but it requires you to set up "buckets" or "pigeon holes". (Noting especially that even people who use fractions use many more integers than fractions in their ratings.)

Your fractions are still wound into the averages and rankings. It's just difficult to have a meaningful and simple graphical representation of your distribution curve which includes every fractional vote at its precise value.
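A toy version of the bucketing, assuming the display rounds halves up (which would match 8.5 showing as 9; the site's actual rounding rule isn't documented in this thread):

```python
def bucket_ratings(ratings):
    """Round each rating half-up into integer buckets 1-10 for
    histogram display; the raw fractional values would still feed
    the averages and rankings untouched."""
    buckets = {b: 0 for b in range(1, 11)}
    for r in ratings:
        b = min(10, max(1, int(r + 0.5)))  # half-up rounding, clamped to 1-10
        buckets[b] += 1
    return buckets

counts = bucket_ratings([8.5, 8.0, 8.4, 9.0, 6.75])
# 8.5 and 9.0 land in bucket 9; 8.0 and 8.4 in bucket 8; 6.75 in bucket 7.
```

(Note that Python's built-in `round` uses banker's rounding, where `round(8.5)` gives 8, so the explicit `int(r + 0.5)` is needed to reproduce the "8.5 shows as 9" behavior.)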
Vernon Harmon
United States
Fairport
New York
Yeah, I get the histogram justification, but it doesn't seem like it would be that difficult to just show a bar for every distinct rating entry. But then again, I only use halves. If someone tried to create distinct rankings with their ratings (going back to your comparative statement analysis -- see how I tie it all together?) by going so far as to rate 8.01, 8.02, etc., that could be a bit taxing as a histogram, eh?

So do you know about my other question? Have the BGG ratings always been integral but allowing reals?
 