My first statistics textbook was a friend’s inexpertly photocopied “Dungeon Masters Guide,” by Gary Gygax (piracy took more effort back then). In those brutal days, you generated the attributes of your character by rolling three six-sided dice and adding them. I clearly recall the diagram illustrating the expected distribution:

While this is an admittedly flippant example, it illustrates a powerful tool for teaching and learning. This sketch indelibly seared itself onto my teenage brain forever. Never underestimate the motivation of gamers and gamblers!

In fact, the very field of statistics and probability theory was ploughed and sown from attempts to better understand games of chance[1]. A French writer, Antoine Gombaud, Chevalier de Méré[2], once asked: “is it worth betting even money on the chance of getting at least one double-six upon throwing a pair of dice 24 times?” This led to a series of lively arguments between the conspicuous mathematicians Blaise Pascal and Pierre de Fermat. These arguments became the formal foundations of probability theory.

The answer to de Méré’s question? Close… but no. It’s marginally (at 50.86%) better to bet the other way:

However, despite these frivolous foundations, there are quite serious applications. Mathematically speaking, rolling a die is remarkably similar to the act of, *e.g.*, getting into a car accident (it’s virtually a coin flip in some cities!) Insurance companies use the same family of curves in Figure 1 to calculate what is a ‘sure bet’ that your automobile will win a prize in the fender-bender lottery.

This is because, at its heart, insurance is a gamble. A calculated risk. And studies have consistently shown[3] that most of us do not harbor a great lust for uncertainty — choosing to *minimize *risk, rather than *optimize *for it. And this is part of the reason why most insurance companies are not going bust anytime soon.

**The Hypothesis Test**

Perhaps unsurprisingly, people’s tolerance to risk relates directly to how suspicious they are when presented with a fishy tale or a rigged game. After all, our ability to accurately assess uncertain danger has been ruthlessly honed by the process of evolution.

The finely tuned tension between ‘fortune favors the bold’ and ‘look before you leap’ means that we innately possess uncannily similar ranges and appetites for risk. This balancing act forms the heart of the rather formal sounding *hypothesis test*. Here, we ask “what is the likelihood that the story being presented is, within reasonable doubt, what we had expected?” We compare some statement or measure to our default — usually the ‘simplest’ — assumption (our *null hypothesis*).

Hypothesis tests are used anywhere a statement or assertion is to be checked statistically. They are great for quantitatively testing statements such as “will my pizza *really *be delivered in thirty minutes?” or “this diet helps you lose weight”. Your favorite websites almost certainly have made design choices based on A/B tests, which are specific applications of hypothesis tests.

An interesting point about the standard hypothesis test is that, ever since the great Sir Ronald Fisher first promoted the concept in 1925, the default threshold value for the likelihood we reject our null hypothesis — the statistical *significance level*— has been 5%[4].

In other words, we are generally willing to accept that two things are equal until the likelihood this occurred by pure luck is less than one in twenty. After which we can state, with an air of superiority, that “**these two things are significantly different.”**

If this level of significance of 5% seems arbitrary…

It is.

Yet…

When we teach our Data Science and Analytics course, we will gently probe our learners as to why this default significance level is 5%. Does this appear to be a reasonable value?

We then follow this up with the question: “if I flip a coin ten times, how many heads would it take before you would start to question that it’s a fair coin?” I ask you, gentle time-strapped reader, to briefly ponder this very question, as it may make the following discussion more fun.

Now there is always some population spread (along with a roughly 5% smart-alec factor). Yet, invariably, most people state that they would reject the notion that we were flipping a fair coin at eight or more heads (of ten flips). Is this close to what you would have said?

Now if you add the probabilities of obtaining eight or more heads, the chance of this occurring is **5.5%**. So, while the default 5% threshold is admittedly arbitrary, most people agree that it is very reasonable. The learners in our course always respond well to this game, proving once again that games are a great way to teach! Learning is hard. That doesn’t mean it can’t be fun too.

All the Python code to produce the figures you see here (and bonus material) is provided here:

#!/usr/bin/python # ########################################### # # File: games_statistics.py # Author: Ra Inta, for Big Head Analytics, LLC # Description: # Created: August 10, 2018 # Last Modified: August 10, 2018 # # MIT License (see blurb at the end) # Copyright (c) 2018, Ra Inta # ########################################### import random import matplotlib as mpl import matplotlib.pyplot as plt import seaborn as sns import numpy as np mpl.rcParams['font.family'] = 'TeX Gyre Pagella' # Get OK-looking fonts for plots ################################################## ## Character statistics (3d6): ################################################## # We could create the histogram of 3d6 rolls a priori. # The probability of getting any particular combination # (a, b, c) from a three dice roll is (1/6)^3 ~ 0.00463 # There are three ways to obtain a 4 (or 17): # (1, 1, 2) = (1, 2, 1) = (2, 1, 1) = 4, while there is only # one way to obtain a 3 (or 18). And so on. So we could take # every possible combination of a, b and c, tally these up # and multiply by the chance of getting any given combination. # However, this can get tricky when we add some of the arbitrary # constraints used in many games. It can be more fun to simulate # the dice rolls. # So let's create some dice! random.seed(42) def dn(n=1, d=6, b=0): """Simulates the roll of a d-sided die n times, with an optional additive factor, b. For example, you want to roll a 3d4 + 2: dn(3, 4, 2) """ sum = 0 for Idx in range(n): sum += random.randint(1, d) return sum + b def d20(n=1, advantage=0): """A specialized version of d20, taking account of (dis-)advantage; i.e. rolling two d20s and taking the maximum (minimum)""" if advantage == 0: return dn(n,20) elif advantage == 1: return max([dn(1,20), dn(1,20)]) elif advantage == -1: return min([dn(1,20), dn(1,20)]) # Simulate 100,000 rolls of 3d6 to get smooth distribution # (Because Variance ~ 1/sqrt(N)) char_stats = [dn(3,6) for x in range(100000)] fig, ax = plt.subplots() ax.set_title('Frequency distribution for 3d6', size=16) ax.set_xlabel('Character statistic', size=14) ax.set_ylabel('Relative frequency', size=14) sns.distplot(char_stats, bins=16, kde_kws={'bw' : 0.7}, hist_kws={'edgecolor' : 'black'}) plt.savefig("character_stats.png") plt.clf() # One modern way is to roll 4d6, dropping the lowest die: def d6DropLowest(): """docstring for d6DropLowest""" d6_array = [dn(1,6) for x in range(4)] return sum(d6_array) - min(d6_array) # Let's compare the distributions: drop_lowest = [d6DropLowest() for x in range(100000)] fig, ax = plt.subplots() ax.set_title('Comparison of distributions of 3d6 and 4d6 (drop lowest)', size=16) ax.set_xlabel('Character statistic', size=14) ax.set_ylabel('Relative frequency', size=14) sns.distplot(char_stats, bins=16, kde_kws={'bw' : 0.7}, hist_kws={'edgecolor' : 'black'}, label="3d6") sns.distplot(drop_lowest, bins=16, kde_kws={'bw' : 0.7}, hist_kws={'edgecolor' : 'black'}, color="red", label="4d6 (drop lowest)") plt.legend() plt.savefig("comparison_3d6_to_4d6DropLowest.png") plt.clf() ################################################## ## Advantage on d20 rolls ################### ################################################## # Simulate 100,000 rolls each of d20s with (dis-) advantage: d20_disadv = [d20(1,-1) for x in range(100000)] d20_noadv = [d20(1,0) for x in range(100000)] d20_adv = [d20(1,1) for x in range(100000)] plt.hist(d20_disadv, bins=20, color="red", edgecolor='black', density=True, alpha=0.5, label="Disadvantage") plt.hist(d20_noadv, bins=20, color="blue", edgecolor='black', density=True, alpha=0.5, label="No advantage") plt.hist(d20_adv, bins=20, color="green", edgecolor='black', density=True, alpha=0.5, label="Advantage") plt.title('Effect of advantage on d20 rolls', size=16) plt.xlabel('Result', size=14) plt.ylabel('Relative frequency', size=14) plt.legend() ## Note the expected mean for no advantage is pretty close to 5%, so it's a fair d20! plt.savefig("advantage_d20.png") plt.clf() ################################################## # de Méré's game: Is it worth betting even money on the chance of getting at # least one double-six upon throwing a pair of dice 24 times? ################################################## # We can calculate this a priori: chance of getting double-six: p66 = 1/36. # Chance of _not_ getting one double-six after 24 rolls: # (1 - p66)^24 = 0.50860. # So it's close, but not worth it in the long run. # You're very slightly better off betting even money _against_ this outcome. def deMere(): """Simple function to simulate a single play of de Méré's game""" return sum([dn(2, 6) == 12 for x in range(24)]) deMere_array = np.zeros(100000) for Idx in range(len(deMere_array)): deMere_array[Idx] = deMere() fig, ax = plt.subplots() ax.set_title("Frequency distribution for de Méré's game", size=16) ax.set_xlabel('Number of double sixes rolled (of 24 rolls)', size=14) ax.set_ylabel('Relative frequency', size=14) plt.hist(deMere_array, bins=6, color="blue", edgecolor='black', density=True, alpha=0.3) plt.savefig("de_Meres_game.png") plt.clf() ################################################## # The likelihood of getting r heads (or tails) from n flips of a fair coin: ################################################## from math import factorial # Create the coefficients for binomial distribution (a.k.a. Pascal's Triangle) def nCr(n,r): """Generate binomial coefficients. Note that, for high n and moderate r, to reduce overflow, calculation of factorials should make use of Stirling's Approximation. This exercise is left for the reader.""" return factorial(n)/(factorial(r)*factorial(n-r)) def P(r, n=10): """Calculates the probability of obtaining exactly r heads from n flips of a fair coin.""" return nCr(n, r)*pow(0.5, n) # Calculate probability of eight or more of ten: print("Probability of eight or more heads from ten coin flips: " + str(P(8) + P(9) + P(10))) # 5.5% # What is the 5% threshold for 100 such coin flips? P_x = [P(x, 100) for x in range(101)] P_x_cum = np.cumsum(P_x) p_thresh = min(np.where(P_x_cum >= 0.95)[0]) # 58 # At 58 or more heads, you should be very suspicious you are being taken for a ride! # What is the 5% threshold for N such coin flips? def flipNThresh(N): """Returns the 5% threshold for N flips of a fair coin. Note that this is just the above code slightly cleaned up!""" P_x = [P(x, N) for x in range(N+1)] P_x_cum = np.cumsum(P_x) return min(np.where(P_x_cum >= 0.95)[0]) # MIT License # # Copyright (c) 2018, Ra Inta # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. ################################################## # End of games_statistics.py ##################################################

[1]As was the invention of the sandwich— butter we’ll save that for another time…

[2]Confusingly, not a knight at all. But that’s what his friends called him.

[3]Kahneman, D., & Tversky, A.: “Choices, values, and frames,” *American Psychologist*, **39**(4):341-350 (1984)

[4]Although other thresholds are used, particularly 1% and occasionally 10%, 5% is the most common in statistical publications.

Accelebrate offers Python training onsite and online.

## By Cliftonguest August 22, 2018 - 2:08 pm

Ra Inta, thanks so much for the post.Really thank you! Great.

## By Ra Inta September 17, 2018 - 10:34 pm

No problem Clifton! I am glad you enjoyed it!

Ra