
Thread: Statistics help

  1. - Top - End - #1
    Surgebinder in the Playground Moderator
     
    Douglas's Avatar

    Join Date
    Aug 2005
    Location
    Mountain View, CA
    Gender
    Male

    Default Statistics help

    Any expert statisticians in the Playground? I'm trying to do something that goes a bit beyond what my high school statistics class covered, and I want a second opinion on whether I got it right. Details here.
    Like 4X (aka Civilization-like) gaming? Know programming? Interested in game development? Take a look.

    Avatar by Ceika.

    Archives:
    Saberhagen's Twelve Swords, some homebrew artifacts for 3.5 (please comment)
    Isstinen Tonche for ECL 74 playtesting.
    Team Solars: Powergaming beyond your wildest imagining, without infinite loops or epic. Yes, the DM asked for it.
    Arcane Swordsage: Making it actually work (homebrew)

  2. - Top - End - #2
    Titan in the Playground
     
    Brother Oni's Avatar

    Join Date
    Nov 2007
    Location
    Cippa's River Meadow
    Gender
    Male

    Default Re: Statistics help

    Quote Originally Posted by Douglas View Post
    Any expert statisticians in the Playground? I'm trying to do something that goes a bit beyond what my high school statistics class covered, and I want a second opinion on whether I got it right. Details here.
    warty goblin is the main professional statistician (and expert woodworker) I know of on this forum. Maybe you can use your mod powers to summon him?

  3. - Top - End - #3
    Titan in the Playground
     
    Kato's Avatar

    Join Date
    Apr 2008
    Location
    Germany
    Gender
    Male

    Default Re: Statistics help

    My statistics is not good enough (anymore) to reliably make a comment on the procedure (though I see no obvious problem).
    But I wonder what you mean by 'first' and 'last' cards in the deck. The order in which you constructed your deck? Or the order Arena displays the cards when looking at it? I feel unless it was a weird intended bug it's unlikely the client remembers in what order you constructed the deck.
    "What's done is done."

    Pony Avatar thanks to Elemental

  4. - Top - End - #4
    Titan in the Playground
    Join Date
    May 2007
    Location
    Tail of the Bellcurve
    Gender
    Male

    Default Re: Statistics help

    I'm not sure just from glancing at it whether your method will work or not. I'll have a harder look this weekend; right now I have to finish my cornflakes so I can head back to the statistics mines...

    Quote Originally Posted by Brother Oni View Post
    warty goblin is the main professional statistician (and expert woodworker) I know of on this forum. Maybe you can use your mod powers to summon him?
    Thanks, but I'm hardly an expert woodworker. If I was, I'd have figured out how to carve faces by now.
    Blood-red were his spurs i' the golden noon; wine-red was his velvet coat,
    When they shot him down on the highway,
    Down like a dog on the highway,
    And he lay in his blood on the highway, with the bunch of lace at his throat.


    Alfred Noyes, The Highwayman, 1906.

  5. - Top - End - #5
    Barbarian in the Playground
     
    NecromancerGuy

    Join Date
    Apr 2010
    Location
    Night Vale
    Gender
    Male

    Default Re: Statistics help

    My gut says you should first test whether you can reject the null hypothesis that the shuffling is fair and random.

    I may be mistaken, but I think a multivariate binomial distribution is more appropriate for what you're doing, or the small population expansion.
    Avatar by TheGiant
    Long-form Sig

  6. - Top - End - #6
    Troll in the Playground
     
    BardGuy

    Join Date
    Jan 2009

    Default Re: Statistics help

    This sounds awesome.

    I'm doing a Master's in statistics currently, so I'll try commenting on a couple things. I definitely defer to the more experienced, though.

    I think you are going about it in the right way, if I'm reading you right.

    Quote Originally Posted by from your link
    My plan:

    1. Run my implementation of the bug one billion times, recording frequency distributions for the first and last 24 cards before shuffling showing up in the first 7 after shuffling.
    2. Use a chi-square two sample test to compute two p-values - one for the first 24 cards in the deck, the other for the last 24, in both cases comparing data from the game vs data from my simulation. As I understand it, I need the two sample variation rather than the more common Pearson's version because my predicted distribution is itself generated by a random sample rather than derived theoretically.
    3. Use Fisher's method to combine these p-values into one.
    4. Compare the result with the chosen significance level of 0.05.
    In step #1, you are simulating what you expect is happening during shuffling. That is, a bug is making it non-random.
    You use these simulations to make up two distributions: for the first 24 cards, and for the last 24 cards.

    In step #2, you run two tests to see if "first 24 real" is similar to "first 24 simulated", and same for last 24. I'd defer to warty goblin about the correctness of steps 2-4, but that sounds reasonable.

    Note that, instead of using a p-value of 0.05 as the straight cut-off, you could take the attitude of degrees of evidence. < 0.05 is "strong evidence"; < 0.01 is "very strong evidence".
    If you plan on presenting this to Wizards of the Coast, they might find it intriguing even if you have a p-value < .1.

    As a side note, if you wanted to simply prove non-randomness, I'd recommend simulating truly random (well, computer pseudo-random RNG) shuffling and comparing its distribution to your real data, to see if there's a difference.
    But here you're trying to prove a specific sort of non-randomness.
    On the other hand, it might be worthwhile for you to test just if the shuffling is non-random in general as well as for what you expect is the particular bug.

    Quote Originally Posted by from your link
    I have 8 bins and unequal sample sizes, so 8 degrees of freedom.
    Usually, the degrees of freedom are the number of bins - 1. (It can be less if you have to estimate any of your parameters.)

    Quote Originally Posted by Astral Avenger View Post
    My gut says you should first test whether you can reject the null hypothesis that the shuffling is fair and random.

    I may be mistaken, but I think a multivariate binomial distribution is more appropriate for what you're doing, or the small population expansion.
    I'd agree with his gut response. At least, that would make folk take the "now I'm testing for this specific sort of non-randomness" more seriously, since they'd already be fairly convinced something is off.

    Hypergeometric (or some sort of multivariate hypergeometric?) might also be more appropriate, although for large samples the difference between it and binomial mostly disappears.
    Last edited by JeenLeen; 2019-04-05 at 08:03 AM.

  7. - Top - End - #7
    Surgebinder in the Playground Moderator
     
    Douglas's Avatar

    Join Date
    Aug 2005
    Location
    Mountain View, CA
    Gender
    Male

    Default Re: Statistics help

    Quote Originally Posted by Kato View Post
    My statistics is not good enough (anymore) to reliably make a comment on the procedure (though I see no obvious problem).
    But I wonder what you mean by 'first' and 'last' cards in the deck. The order in which you constructed your deck? Or the order Arena displays the cards when looking at it? I feel unless it was a weird intended bug it's unlikely the client remembers in what order you constructed the deck.
    You can export a deck from Arena, which works by putting a list of the cards in it into your clipboard so you can paste it somewhere. The order of that list is always the same, and is determined by the order you added cards to the deck when you built it. When a deck is written to the game logs, the same order is used. I'm pretty sure this is the order that the game uses internally to store the deck, and most likely is also the order that gets input to the shuffler.

    Quote Originally Posted by Astral Avenger View Post
    My gut says you should first test whether you can reject the null hypothesis that the shuffling is fair and random.
    Already did that.

    Quote Originally Posted by Astral Avenger View Post
    I may be mistaken, but I think a multivariate binomial distribution is more appropriate for what you're doing, or the small population expansion.
    On looking it up, it looks like that is indeed the type of distribution I'm working with, but I didn't find anything on how to test whether two samples are from the same such distribution.

    Quote Originally Posted by JeenLeen View Post
    Note that, instead of using a p-value of 0.05 as the straight cut-off, you could take the attitude of degrees of evidence. < 0.05 is "strong evidence"; < 0.01 is "very strong evidence".
    If you plan on presenting this to Wizards of the Coast, they might find it intriguing even if you have a p-value < .1.
    A low p-value would actually be an indication that I'm wrong, not that I'm right.

    Quote Originally Posted by JeenLeen View Post
    As a side note, if you wanted to simply prove non-randomness, I'd recommend simulating truly random (well, computer pseudo-random RNG) shuffling and comparing its distribution to your real data, to see if there's a difference.
    But here you're trying to prove a specific sort of non-randomness.
    On the other hand, it might be worthwhile for you to test just if the shuffling is non-random in general as well as for what you expect is the particular bug.
    Already did that, as linked above. The distribution a correct shuffler is supposed to have is easy to derive from pure theory, and that's the first thing I tested. My choice of hypothesis for the specific bug is based on the patterns I observed in that first test.

    Quote Originally Posted by JeenLeen View Post
    Usually, the degrees of freedom are the number of bins - 1. (It can be less if you have to estimate any of your parameters.)
    According to the source I'm using for how to do the two sample chi-squared test, if the sample sizes are different then it's just the number of bins. If you have a more authoritative source that says otherwise, please tell me where I can find it. When I try searching for such things, the overwhelming majority of results are about a regular Pearson's chi-squared test, which has an assumption that doesn't match my situation - that the predicted distribution is a theoretical one, known exactly with no variance.
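    The two-sample statistic discussed here can be sketched as follows. This is a minimal illustration, not the code used for the study: the class and method names are hypothetical, the statistic is the form commonly given for binned samples of unequal size, and the p-value helper uses a closed form that only holds for an even number of degrees of freedom (which covers the 8 used in the plan). The toy counts in main are made up and are not data from the thread.

```java
final class TwoSampleChiSquare {
    /** Chi-square statistic for two binned samples whose totals may differ. */
    static double statistic(long[] r, long[] s) {
        if (r.length != s.length) throw new IllegalArgumentException("bin counts differ");
        double totalR = 0, totalS = 0;
        for (int i = 0; i < r.length; i++) { totalR += r[i]; totalS += s[i]; }
        double k1 = Math.sqrt(totalS / totalR);  // rescales sample r
        double k2 = Math.sqrt(totalR / totalS);  // rescales sample s
        double chi2 = 0;
        for (int i = 0; i < r.length; i++) {
            if (r[i] + s[i] == 0) continue;      // empty bins contribute nothing
            double d = k1 * r[i] - k2 * s[i];
            chi2 += d * d / (r[i] + s[i]);
        }
        return chi2;
    }

    /** Upper-tail chi-square probability; this closed form is valid for even df only. */
    static double upperTailEvenDf(double x, int df) {
        if (df <= 0 || df % 2 != 0) throw new IllegalArgumentException("df must be positive and even");
        double half = x / 2, term = 1, sum = 1;
        for (int j = 1; j < df / 2; j++) { term *= half / j; sum += term; }
        return Math.exp(-half) * sum;
    }

    public static void main(String[] args) {
        // Made-up toy counts for 8 bins (0..7 cards in hand); NOT data from the thread.
        long[] game = {3, 20, 61, 92, 78, 37, 8, 1};
        long[] simulation = {93, 687, 2017, 3061, 2592, 1223, 298, 29};
        double chi2 = statistic(game, simulation);
        // df = number of bins, following the unequal-sample-size rule cited in the post.
        System.out.printf("chi2 = %.4f, p = %.4f%n", chi2, upperTailEvenDf(chi2, 8));
    }
}
```

    If any bin is empty in both samples, the degrees of freedom would presumably need to be reduced to the number of non-empty bins; the sketch leaves that adjustment to the caller.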

    Quote Originally Posted by JeenLeen View Post
    Hypergeometric (or some sort of multivariate hypergeometric?) might also be more appropriate, although for large samples the difference between it and binomial mostly disappears.
    Hypergeometric is what a correct shuffle is supposed to have. This bug results in something different.
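    For reference, the distribution a correct shuffle should produce can be computed directly. The sketch below assumes a 60-card deck, a 7-card opening hand, and a group of 24 tracked cards, as in this thread; the class and method names are hypothetical.

```java
final class FairShuffleHand {
    /** Binomial coefficient C(n, k) as a double; exact for deck-sized inputs. */
    static double choose(int n, int k) {
        if (k < 0 || k > n) return 0;
        double c = 1;
        for (int i = 1; i <= k; i++) c = c * (n - k + i) / i;
        return c;
    }

    /** P(exactly k of the `marked` cards appear in a `hand`-card draw from `deck` cards). */
    static double hypergeometric(int deck, int marked, int hand, int k) {
        return choose(marked, k) * choose(deck - marked, hand - k) / choose(deck, hand);
    }

    public static void main(String[] args) {
        // Probability of drawing k of 24 tracked cards in the opening 7 of a 60-card deck.
        for (int k = 0; k <= 7; k++) {
            System.out.printf("%d in hand: %.6f%n", k, hypergeometric(60, 24, 7, k));
        }
    }
}
```

    Under a fair shuffle the first 24 and the last 24 cards share this same distribution, so any difference between the two groups in real data already points away from it.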

    Thanks for the comments! I'm hoping to post my detailed study plan on reddit today, but I want to be reasonably sure I'm doing the analysis right first.

    Incidentally, the numbers for the simulation row in my example are real. Running the billion shuffles and tabulating the results took somewhere around an hour, I think.
    Last edited by Douglas; 2019-04-05 at 01:31 PM.

  8. - Top - End - #8
    Titan in the Playground
    Join Date
    May 2007
    Location
    Tail of the Bellcurve
    Gender
    Male

    Default Re: Statistics help

    You're using the two sample chi square correctly so far as I can tell. However, combining the two p-values via Fisher's Method isn't quite right here, because the tests aren't independent. Since all the cards have to end up somewhere in the deck, the location of the first card in the shuffled deck is not independent of the location of the second.

    The most obvious solution is to do one test from the beginning by working directly with the simulated joint distribution. So when you simulate/tabulate the data, record the probabilities for all 60 cards ending up in your hand, then calculate the marginal distribution of cards from the first 24 and last 24 by summing over that. This directly gets you a Monte Carlo approximation to the appropriate distribution, so you can calculate a single p-value via the chi square test.


    Because the deck is fairly large, however, the dependence between the cards will be fairly weak, so this won't change your answer very much. Further, because the dependence is by necessity negative, your current method is very slightly conservative, which, if you're going to be wrong, is the direction to be wrong in.
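    One way to picture the joint approach is to tabulate both counts from the same simulated hand. The sketch below is a simplified variant of the suggestion: instead of recording all 60 per-card probabilities, it records the joint table of (cards from the first 24, cards from the last 24) and recovers each marginal by summing over it. It assumes the suspected swap-with-any-position shuffle discussed in this thread, uses java.util.Random as a stand-in generator, and all names are hypothetical.

```java
import java.util.Random;

final class JointHandTabulation {
    /** Suspected buggy shuffle from this thread: swap target drawn from the whole deck. */
    static void naiveShuffle(int[] deck, Random random) {
        for (int i = 0; i < deck.length; i++) {
            int swapIndex = random.nextInt(deck.length);
            int temp = deck[i];
            deck[i] = deck[swapIndex];
            deck[swapIndex] = temp;
        }
    }

    /** joint[a][b] = number of hands with a cards from the first 24 and b from the last 24. */
    static long[][] tabulate(int trials, long seed) {
        Random random = new Random(seed);
        long[][] joint = new long[8][8];
        int[] deck = new int[60];
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < deck.length; i++) deck[i] = i;  // reset to decklist order
            naiveShuffle(deck, random);
            int a = 0, b = 0;
            for (int i = 0; i < 7; i++) {                       // opening hand = top 7 cards
                if (deck[i] < 24) a++;
                else if (deck[i] >= 36) b++;
            }
            joint[a][b]++;
        }
        return joint;
    }

    /** Marginal counts for the first 24 (axis 0) or the last 24 (axis 1). */
    static long[] marginal(long[][] joint, int axis) {
        long[] m = new long[8];
        for (int a = 0; a < 8; a++)
            for (int b = 0; b < 8; b++)
                m[axis == 0 ? a : b] += joint[a][b];
        return m;
    }

    public static void main(String[] args) {
        long[][] joint = tabulate(100_000, 1L);  // trial count and seed are arbitrary
        System.out.println(java.util.Arrays.toString(marginal(joint, 0)));
        System.out.println(java.util.Arrays.toString(marginal(joint, 1)));
    }
}
```

    Cells with a + b > 7 can never be filled, which is the dependence between the two counts made visible.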

  9. - Top - End - #9
    Surgebinder in the Playground Moderator
     
    Douglas's Avatar

    Join Date
    Aug 2005
    Location
    Mountain View, CA
    Gender
    Male

    Default Re: Statistics help

    How is the non-independence of positions of different cards relevant to my use of Fisher's method? That non-independence affects the results for how many early/late cards are in the opening hand, which is what goes into the two sample chi square test. Fisher's method doesn't come in until that's already done.

  10. - Top - End - #10
    Titan in the Playground
     
    Kato's Avatar

    Join Date
    Apr 2008
    Location
    Germany
    Gender
    Male

    Default Re: Statistics help

    Huh, I never felt like MGA's shuffling was askew... but then again I'm not too attentive or whatever counts here. If it was really bad I guess I would have noticed.

    So, I know I cannot really be of much help but if I may ask anyway...(assuming this isn't just an exercise to improve your statistics background)
    what do you think the bug is and how much does it diverge from the 'proper' distribution? (going by your simulation)

  11. - Top - End - #11
    Titan in the Playground
    Join Date
    May 2007
    Location
    Tail of the Bellcurve
    Gender
    Male

    Default Re: Statistics help

    Quote Originally Posted by Douglas View Post
    How is the non-independence of positions of different cards relevant to my use of Fisher's method? That non-independence affects the results for how many early/late cards are in the opening hand, which is what goes into the two sample chi square test. Fisher's method doesn't come in until that's already done.
    Fisher's method combines p-values from independent tests. Your tests aren't independent because your p-values are derived from tests of dependent random variables. If A is the number of first 24 cards in your hand, and B is the number of the last 24, then you necessarily have that A + B <= 7.
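    For two p-values, Fisher's method has a simple closed form, sketched below. The statistic is -2(ln p1 + ln p2), which follows a chi-square distribution with 4 degrees of freedom only when the two tests are independent, which is the assumption in question here. The class and method names are hypothetical, and the inputs in main are arbitrary.

```java
final class FisherCombine {
    /** Combines two p-values with Fisher's method, assuming the tests are independent. */
    static double combineTwo(double p1, double p2) {
        if (!(p1 > 0 && p1 <= 1 && p2 > 0 && p2 <= 1)) {
            throw new IllegalArgumentException("p-values must lie in (0, 1]");
        }
        double x = -2 * (Math.log(p1) + Math.log(p2));  // Fisher's test statistic
        // Upper tail of chi-square with 4 degrees of freedom: exp(-x/2) * (1 + x/2).
        return Math.exp(-x / 2) * (1 + x / 2);
    }

    public static void main(String[] args) {
        // Arbitrary illustrative inputs, not results from the thread.
        System.out.println(combineTwo(0.30, 0.60));
    }
}
```

    Algebraically this equals p1*p2*(1 - ln(p1*p2)), so the combined value depends only on the product of the two p-values.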

  12. - Top - End - #12
    Surgebinder in the Playground Moderator
     
    Douglas's Avatar

    Join Date
    Aug 2005
    Location
    Mountain View, CA
    Gender
    Male

    Default Re: Statistics help

    Quote Originally Posted by Kato View Post
    Huh, I never felt like MGA's shuffling was askew... but then again I'm not too attentive or whatever counts here. If it was really bad I guess I would have noticed.

    So, I know I cannot really be of much help but if I may ask anyway...(assuming this isn't just an exercise to improve your statistics background)
    what do you think the bug is and how much does it diverge from the 'proper' distribution? (going by your simulation)
    The correct way to do a Fisher-Yates shuffle (which is what lead developer Chris Clay has said they're using) goes like this:
    Code:
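    // Correct version: position i is swapped with a position chosen uniformly from i..deck.length-1,
    // so cards already placed at positions before i are never moved again.
    for (int i = 0; i < deck.length; i++) {
        int swapIndex = random.nextInt(deck.length - i) + i;
        int temp = deck[i];
        deck[i] = deck[swapIndex];
        deck[swapIndex] = temp;
    }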
    I think the bug is that Arena is actually doing this:
    Code:
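    // Suspected bug: the swap target is drawn from the whole deck (0..deck.length-1) on every step,
    // so cards already placed can be moved again and the resulting orderings are not equally likely.
    for (int i = 0; i < deck.length; i++) {
        int swapIndex = random.nextInt(deck.length);
        int temp = deck[i];
        deck[i] = deck[swapIndex];
        deck[swapIndex] = temp;
    }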
    For a 60 card deck, counting how many of the first 24 cards in the decklist get drawn and how many of the last 24, the distributions from my simulation look like this. Each value is the probability of drawing that many of those cards in the opening 7 card hand.
                0 in hand  1 in hand  2 in hand  3 in hand  4 in hand  5 in hand  6 in hand  7 in hand
    first 24    0.009336   0.068686   0.201692   0.306143   0.259227   0.122308   0.029739   0.002869
    last 24     0.046986   0.194165   0.319792   0.271807   0.128615   0.033814   0.004575   0.000245
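    A simulation along these lines can be sketched as below. It is not the code that produced the numbers above: the class and method names are hypothetical, java.util.Random stands in for whatever generator Arena uses, the opening hand is taken to be the top 7 cards after shuffling, and the trial count is far smaller than one billion.

```java
import java.util.Random;

final class ShuffleComparison {
    /** Shuffles in place; `buggy` selects the suspected variant quoted above. */
    static void shuffle(int[] deck, Random random, boolean buggy) {
        for (int i = 0; i < deck.length; i++) {
            int swapIndex = buggy
                    ? random.nextInt(deck.length)           // whole deck on every step
                    : random.nextInt(deck.length - i) + i;  // only positions i..end
            int temp = deck[i];
            deck[i] = deck[swapIndex];
            deck[swapIndex] = temp;
        }
    }

    /** Estimated P(k of the decklist cards in [from, to) land in the opening 7), k = 0..7. */
    static double[] handDistribution(int from, int to, boolean buggy, int trials, long seed) {
        Random random = new Random(seed);
        long[] counts = new long[8];
        int[] deck = new int[60];
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < deck.length; i++) deck[i] = i;  // reset to decklist order
            shuffle(deck, random, buggy);
            int k = 0;
            for (int i = 0; i < 7; i++) {
                if (deck[i] >= from && deck[i] < to) k++;
            }
            counts[k]++;
        }
        double[] p = new double[8];
        for (int k = 0; k < 8; k++) p[k] = (double) counts[k] / trials;
        return p;
    }

    /** Mean number of tracked cards in hand under distribution p. */
    static double mean(double[] p) {
        double m = 0;
        for (int k = 0; k < p.length; k++) m += k * p[k];
        return m;
    }

    public static void main(String[] args) {
        int trials = 1_000_000;  // arbitrary; the thread's run used one billion
        String[] names = {"first 24", "last 24"};
        int[][] ranges = {{0, 24}, {36, 60}};
        for (boolean buggy : new boolean[] {false, true}) {
            for (int g = 0; g < 2; g++) {
                double[] p = handDistribution(ranges[g][0], ranges[g][1], buggy, trials, 1L);
                System.out.printf("%s, %s: mean %.3f%n",
                        buggy ? "suspected bug" : "correct", names[g], mean(p));
            }
        }
    }
}
```

    Comparing the means is only a quick summary; the full 8-bin distributions returned by handDistribution are what would feed a chi-square comparison.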

    Quote Originally Posted by warty goblin View Post
    Fisher's method combines p-values from independent tests. Your tests aren't independent because your p-values are derived from tests of dependent random variables. If A is the number of first 24 cards in your hand, and B is the number of the last 24, then you necessarily have that A + B <= 7.
    I see. I expect the effect of that on the aggregate statistics is really tiny, especially over a large sample size, and weakened even further by the fact that some games will only be counted in one or the other because, for example, the 24th and 25th cards are the same (and I can't distinguish which copy got drawn).

    If I really want to be rigorous about this detail, it would be far simpler to separate games into two groups, and check only the first 24 cards for one group and the last 24 for the other. I actually did that for the simulation results, doing a separate set of 1 billion shuffles for each distribution.

    Sounds like you're saying this is ok to ignore because of how small it is and what direction it's in?

  13. - Top - End - #13
    Titan in the Playground
     
    Kato's Avatar

    Join Date
    Apr 2008
    Location
    Germany
    Gender
    Male

    Default Re: Statistics help

    Wow, it took me way too long to code this (mostly because I made stupid mistakes, not because I couldn't figure it out, but still... and I used Octave). Also, I'm not sure my implementation is the most efficient, but it works.

    Okay, the difference seems really obvious now, so if you have the raw data it should be clear whether they use the wrong algorithm, which would be a bit embarrassing... Also, it seems weird that MGA doesn't sort cards alphabetically or something. But apparently not. So let me know if I should abuse this bug in the future.
