In my experience, many students learned some or all of this material without realizing how useful it was, and then forgot it. Computer science students find simple Markov chains natural (although they may find the notation annoying) and will suggest simulating a chain before the instructor does.
Notation and Conventions
Fairness: Each face of a fair coin or die has the same probability of landing upmost on a flip or roll. If a roulette wheel is properly balanced, the ball has the same probability of landing in each slot.
Describing Datasets
First Tools for Looking at Data
Datasets
Throughout the book we will see many datasets downloaded from various internet sources because people are so generous in publishing interesting datasets on the internet. In the next chapter, we'll look at two-dimensional data, and in Chapter 10, we'll look at high-dimensional data.
What’s Happening? Plotting Data
- Bar Charts
- Histograms
- How to Make Histograms
- Conditional Histograms
In this case, the height of the box is determined by the number of data items in the box. Each entry represents the number of data items that lie in that interval.
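The counting behind a histogram with equal-width intervals can be sketched as follows. This is a minimal illustration, not the book's code; the function name `histogram_counts` and the sample values are assumptions made for the example.

```python
def histogram_counts(data, lo, hi, n_bins):
    """Count the data items that fall in each of n_bins equal-width intervals on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if lo <= x <= hi:
            # An item exactly at the right edge hi is placed in the last interval.
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


# Hypothetical data, chosen only to illustrate the counting.
values = [1.2, 1.9, 2.1, 2.4, 3.7, 3.9, 4.0]
counts = histogram_counts(values, lo=1.0, hi=4.0, n_bins=3)
print(counts)
```

Each entry of `counts` is the height of one box when the intervals all have the same width.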
Summarizing 1D Data
- The Mean
- Standard Deviation
- Computing Mean and Standard Deviation Online
- Variance
- The Median
- Interquartile Range
- Using Summaries Sensibly
Similarly, after viewing the elements, you will have an estimate of the standard deviation based on those elements. The properties of the variance derive from the fact that it is the square of the standard deviation.
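One way to keep a running mean and standard deviation as elements arrive is Welford's update, sketched below. This is a substitute technique named plainly, not necessarily the recursion the text derives; the class name `RunningStats` is hypothetical, and the standard deviation divides by the number of items seen.

```python
import math


class RunningStats:
    """Running mean and standard deviation using Welford's update."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Divides by n (not n - 1), matching a standard deviation of the data seen so far.
        return math.sqrt(self._m2 / self.n) if self.n else 0.0


stats = RunningStats()
for item in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(item)
print(stats.mean, stats.std)
```

After each call to `update`, `stats.mean` and `stats.std` are the estimates based on the elements seen so far, so the whole dataset never has to be stored.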
Plots and Summaries
- Some Properties of Histograms
- Standard Coordinates and Normal Data
- Box Plots
This means that the right tail of the histogram is longer, so the histogram is skewed to the right. Recall the definition of the median (form an ordered list of the data points, and find the point halfway along the list).
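The recalled definition of the median can be sketched directly. The data below are hypothetical, and the even-length case (averaging the two middle points) is a common convention assumed here.

```python
def median(data):
    """Form an ordered list of the data and return the point halfway along it."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even number of points: average the two middle points (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data with a long right tail.
skewed = [1, 2, 2, 3, 3, 4, 20]
skewed_median = median(skewed)
skewed_mean = sum(skewed) / len(skewed)
print(skewed_median, skewed_mean)
```

For this right-skewed example the mean is pulled toward the long tail while the median is not, which is one quick check for skew.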
Whose is Bigger? Investigating Australian Pizzas
There is a vertical box whose height corresponds to the interquartile range of the data (the width is just to make the figure easy to interpret). One possible explanation is that EagleBoys has tighter control over the size of the final pizza.
You Should
- Remember These Definitions
- Remember These Terms
- Remember These Facts
- Be Able to
Yet another possibility is that Dominos controls portions by mass of dough (so thin crust diameters tend to be larger), but EagleBoys controls by crust diameter. The fact that Dominos and EagleBoys seem to follow different strategies successfully suggests that more than one strategy may work.
Problems
Programming Exercises
Use conditional histogram plots to investigate whether students from small families drink more alcohol on the weekend than those from large families. Use box plots to investigate whether gender, education, or marital status has any effect on the amount of debt (again, use X1 for debt).
Looking at Relationships
Plotting 2D Data
- Categorical Data, Counts, and Charts
Then I made the bar chart on the left, which shows the number of children of each gender selecting each goal. The height of the bar is given by the number of elements of that type, and the bar is divided into sections corresponding to the number of elements of each subtype.
[Figure: bar charts of goals and gender. Panels: Gender by goals; Goals by gender, relative frequencies; Goals by gender; Gender by goals, relative frequencies.]
- Series
- Scatter Plots for Spatial Data
- Exposing Relationships with Scatter Plots
- Correlation
- The Correlation Coefficient
- Using Correlation to Predict
- Confusion Caused by Correlation
- Sterile Males in Wild Horse Herds
- You Should
- Remember These Definitions
- Remember These Terms
- Be Able to
Figure 2.10 shows a scatter plot of the shipping data, where I have plotted the number of skins against price. Fortunately, scaling or translating data does not change the value of the correlation coefficient (although it may change the sign if a scale is negative).
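The invariance of the correlation coefficient under scaling and translation can be checked numerically. The sketch below computes the coefficient as the mean product of standard coordinates; the function name and the data are illustrative assumptions.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient: the mean of the products of standard coordinates."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum(((x - mean_x) / std_x) * ((y - mean_y) / std_y)
               for x, y in zip(xs, ys)) / n


# Hypothetical data, roughly linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)
r_shifted = correlation([10 * x + 3 for x in xs], ys)  # positive scale plus translation
r_flipped = correlation([-2 * x for x in xs], ys)      # negative scale flips the sign
print(r, r_shifted, r_flipped)
```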
Probability
Experiments, Outcomes and Probability
- Outcomes and Probability
Worked example 3.2 (Find the queen, twice) We play find the queen twice, replacing the card we have chosen. This number indicates the relative frequency of the result of interest when an experiment is repeated a very large number of times. The probability is one because each experiment must have one of the outcomes in the sample space.
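The relative-frequency reading of probability can be illustrated by simulation. The sketch below assumes a version of the game with three cards, exactly one of which is the queen, and a card chosen uniformly at random each time; these details, the event of interest (finding the queen both times), and the function name are assumptions for illustration, not the book's worked solution.

```python
import random


def found_queen_twice(rng, n_cards=3):
    """Play twice with replacement; card 0 is the queen (assumed setup)."""
    first = rng.randrange(n_cards) == 0
    second = rng.randrange(n_cards) == 0
    return first and second


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 200_000
hits = sum(found_queen_twice(rng) for _ in range(trials))
estimate = hits / trials
print(estimate, 1 / 9)
```

With many repetitions the relative frequency settles near 1/9, the product of the two independent 1/3 chances under the assumed setup.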
Events
- Computing Event Probabilities by Counting Outcomes
- The Probability of Events
- Computing Probabilities by Reasoning About Sets
The number of outcomes in the event comes from noting that each outcome in the event is an order of the cards, with the first seven cards being 2-8 of hearts, in that order. The number of outcomes in the event is the number of 30-day lists, all different. The total number of lists is the number of lists of 29 days per year.
Independence
- Example: Airline Overbooking
This coin is flipped seven times, and we are interested in the probability that there are seven T's. Solution: Now we flip the coin eight times and are interested in the probability of getting more than six T's. Solution: Now we flip the coin eight times and are interested in the probability of getting exactly six T's.
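Because the flips are independent, these probabilities follow from counting the arrangements of T's. The sketch below assumes a coin with P(T) = 0.5 purely for illustration (the coin in the text may be biased); the function names are hypothetical.

```python
from math import comb


def prob_exactly(k, n, p_tail):
    """Probability of exactly k tails in n independent flips."""
    return comb(n, k) * p_tail ** k * (1 - p_tail) ** (n - k)


def prob_more_than(k, n, p_tail):
    """Probability of strictly more than k tails in n independent flips."""
    return sum(prob_exactly(j, n, p_tail) for j in range(k + 1, n + 1))


p_tail = 0.5  # assumed value; replace with the coin's actual P(T)
p_seven_of_seven = prob_exactly(7, 7, p_tail)
p_more_than_six_of_eight = prob_more_than(6, 8, p_tail)
p_exactly_six_of_eight = prob_exactly(6, 8, p_tail)
print(p_seven_of_seven, p_more_than_six_of_eight, p_exactly_six_of_eight)
```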
Conditional Probability
- Evaluating Conditional Probabilities
- Detecting Rare Events Is Hard
- Conditional Probability and Various Forms of Independence
- Warning Example: The Prosecutor’s Fallacy
- Warning Example: The Monty Hall Problem
What is the conditional probability of getting a royal flush, conditioned on the event that this card is the nine of spades? If the test says you do have the disease, what is the probability that you actually have the disease? This means that knowing that A has occurred tells you nothing about B: the probability that B will occur is the same whether you know that A has occurred or not.
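The disease-test question is a direct application of Bayes' rule. The sketch below uses made-up numbers (a prevalence of 0.001, a test that detects the disease 99% of the time, and a 1% false positive rate); none of these values come from the text.

```python
def prob_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' rule."""
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1 - prevalence))
    return sensitivity * prevalence / p_positive


# Assumed, illustrative numbers for a rare disease and a fairly accurate test.
posterior = prob_disease_given_positive(
    prevalence=0.001, sensitivity=0.99, false_positive_rate=0.01)
print(posterior)
```

Under these assumed numbers a positive result still leaves the probability of disease at about 9%, because false positives from the large healthy population outnumber true positives.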
Extra Worked Examples
- Outcomes and Probability
- Events
- Independence
- Conditional Probability
Worked example 3.45 (Birthdays in a row) We randomly stop three people and ask them the day of the week they were born. Worked example 3.53 (Which disease do you have?) Disease A occurs with probability 0.1 (i.e., it is present in 10% of the population) and disease B occurs with probability 0.2. Worked example 3.54 (Fraud or psychic powers?) You want to investigate the powers of a supposed psychic.
You Should
- Remember These Definitions
- Remember These Terms
- Remember and Use These Facts
- Remember These Points
- Be Able to
What is the probability that the ball will end up in a red slot with an even number? What is the conditional probability that the card you draw is a red king, conditional on the card drawn being a king? What is the conditional probability that the card you draw is a red king, conditional on the removed card being a red king?
Random Variables and Expectations
Random Variables
- Joint and Conditional Probability for Random Variables
A function that takes a discrete random variable into a set of numbers is also a discrete random variable. Worked example 4.2 (Coin bets) One way to obtain a random variable is to think about the reward for a bet. We will write P(X) to denote the probability distribution of the random variable, and P(x) or P(X = x) to denote the probability that the random variable takes a particular value.
Bayes’ Rule
- Just a Little Continuous Probability
- Expectations and Expected Values
- Expected Values
- Mean, Variance and Covariance
- Expectations and Statistics
- The Weak Law of Large Numbers
- IID Samples
- Two Inequalities
- Proving the Inequalities
- The Weak Law of Large Numbers
- Using the Weak Law of Large Numbers
- Should You Accept a Bet?
- Odds, Expectations and Bookmaking: A Cultural Diversion
- Ending a Game Early
- Making a Decision with Decision Trees and Expectations
- Utility
- You Should
- Remember These Definitions
- Remember These Terms
- Use and Remember These Facts
- Remember These Points
- Be Able to
This is not the actual revenue from a single game (which would be different, depending on what the coin did). Now X_N is a random variable (the samples are IID, and for a different set of samples you will get a different, random X_N). Otherwise, X gets the value of the four-sided die and Y gets the value of the six-sided die. Z always gets the value of the sum of the dice. (a) What is P(X), the probability distribution of this random variable?
Useful Probability Distributions
Discrete Distributions
- The Discrete Uniform Distribution
- Bernoulli Random Variables
- The Geometric Distribution
- The Binomial Probability Distribution
- Multinomial Probabilities
- The Poisson Distribution
Note that the number of heads in N tosses can be obtained by adding the number of heads in each toss. For example, you can take the length of a road, divide it into equal intervals, and then count the number of animals killed on the road in each interval. A Poisson point process with intensity λ is a set of random points with the property that the number of points in an interval of length s is a Poisson random variable with parameter λs.
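The count in an interval of a Poisson point process can be evaluated with the Poisson probability distribution. The sketch below assumes an intensity of 2 points per unit length and an interval of length 0.5; both numbers and the function name are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """P(k events) for a Poisson random variable with parameter lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


intensity = 2.0       # assumed points per unit length
length = 0.5          # assumed interval length
parameter = intensity * length

p_none = poisson_pmf(0, parameter)
p_total = sum(poisson_pmf(k, parameter) for k in range(50))
print(p_none, p_total)
```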
Continuous Distributions
- The Continuous Uniform Distribution
- The Beta Distribution
- The Gamma Distribution
- The Exponential Distribution
Figure 5.2 shows plots of the probability density function of the Gamma distribution for a variety of different values of α and β. We assume that failures form a Poisson process over time; then the time to the next failure is exponentially distributed. The time between calls will be exponentially distributed with parameter λ, and the expected time until the next call is 1/λ (in hours).
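The claim that the expected waiting time is the reciprocal of the rate can be checked by simulation. The sketch below assumes a rate of 4 calls per hour, which is not a value from the text.

```python
import random

rate = 4.0  # assumed calls per hour
rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw exponentially distributed waiting times and average them.
waits = [rng.expovariate(rate) for _ in range(100_000)]
mean_wait = sum(waits) / len(waits)
print(mean_wait, 1 / rate)
```

The simulated mean wait comes out close to 1/4 hour, in line with the expected value of an exponential random variable.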
The Normal Distribution
- The Standard Normal Distribution
- The Normal Distribution
- Properties of the Normal Distribution
From this (and tables for the error function, or your favorite math package) we get that, for a standard normal random variable. About 95% of the time, a normal random variable takes a value within two standard deviations of the mean. About 99% of the time, a normal random variable takes a value within three standard deviations of the mean.
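The fractions quoted here can be reproduced from the error function in a math package. The sketch below uses Python's `math.erf`; the function name `prob_within` is hypothetical.

```python
import math


def prob_within(k):
    """P(|Z| <= k) for a standard normal random variable Z."""
    return math.erf(k / math.sqrt(2))


within_one = prob_within(1)
within_two = prob_within(2)
within_three = prob_within(3)
print(within_one, within_two, within_three)
```

Because any normal random variable becomes standard normal after subtracting its mean and dividing by its standard deviation, the same fractions apply to "within k standard deviations of the mean".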
Approximating Binomials with Large N
- Large N
- Getting Normal
- Using a Normal Approximation to the Binomial Distribution
The main problem with Figure 5.4 (and with the argument above) is that the mean and standard deviation of the binomial distribution tend to infinity as the number of coin tosses tends to infinity. Then, for sufficiently large N, the probability distribution P(x) can be approximated by the probability density function $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. We know, for example, that a standard normal random variable has a value between -1 and 1 about 68% of the time.
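The quality of the normal approximation can be checked by comparing an exact binomial probability with the normal one. The sketch below assumes 100 tosses of a fair coin and adds a continuity correction of 0.5 at each end of the interval, which is a common refinement and not something the text specifies.

```python
import math


def binomial_interval(n, p, lo, hi):
    """Exact P(lo <= h <= hi) for h heads in n tosses with P(heads) = p."""
    return sum(math.comb(n, h) * p ** h * (1 - p) ** (n - h)
               for h in range(lo, hi + 1))


def normal_cdf(x):
    """Cumulative distribution of a standard normal random variable."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def normal_approx_interval(n, p, lo, hi):
    """Normal approximation to the same probability, with continuity correction."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    a = (lo - 0.5 - mean) / std
    b = (hi + 0.5 - mean) / std
    return normal_cdf(b) - normal_cdf(a)


exact = binomial_interval(100, 0.5, 40, 60)
approx = normal_approx_interval(100, 0.5, 40, 60)
print(exact, approx)
```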
You Should
- Remember These Definitions
- Remember These Terms
- Remember These Points
Toss a coinN once and countN. Show that the probability distribution for is the same as the probability distribution for hN. What is the probability that the plane will travel with one or more empty seats? 5.20 Show that the multinomial distribution. Use this and sample matching to show that the variance of the Poisson distribution with intensity parameter λ is λ.
Inference
Samples and Populations
The Sample Mean
- The Sample Mean Is an Estimate of the Population Mean
- The Variance of the Sample Mean
- When The Urn Model Works
- Distributions Are Like Populations
Under our sampling model, the expected value of the sample mean is the population mean. It is random because different samples from the population will have different values of the sample mean. Knowing the variance of X^(N) will tell us how accurate our estimate of the population mean is.
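The randomness of the sample mean can be seen by drawing many samples from a small population. The sketch below assumes sampling with replacement from a made-up population of ten numbers and compares the observed variance of the sample mean with the population variance divided by N, the standard result for this sampling model.

```python
import random

population = list(range(1, 11))  # hypothetical population
pop_mean = sum(population) / len(population)
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

N = 25            # items per sample
n_samples = 20_000
rng = random.Random(1)  # fixed seed so the run is repeatable

# Draw many samples with replacement and record each sample mean.
sample_means = []
for _ in range(n_samples):
    sample = rng.choices(population, k=N)
    sample_means.append(sum(sample) / N)

mean_of_means = sum(sample_means) / n_samples
observed_var = sum((m - mean_of_means) ** 2 for m in sample_means) / n_samples
predicted_var = pop_var / N
print(mean_of_means, observed_var, predicted_var)
```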
Confidence Intervals
- Constructing Confidence Intervals
- Estimating the Variance of the Sample Mean
- The Probability Distribution of the Sample Mean
- Confidence Intervals for Population Means
- Standard Error Estimates from Simulation
Our estimate of the unknown number popmean({X}) is the mean of the sample we have, which we write mean({x}). Assume the sample is large enough that (mean({x}) - popmean({X})) / stderr({x}) is a standard normal random variable. Suppose we want to estimate the standard error of a statistic S({x}), which is a function of our dataset {x} of N data items.
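A standard error can be estimated by simulation with the bootstrap: resample the dataset with replacement, recompute the statistic, and take the standard deviation of the results. The sketch below is one plain version of that idea; the function name, the number of replicates, and the synthetic dataset are all assumptions.

```python
import math
import random
import statistics


def bootstrap_stderr(data, statistic, n_replicates=2000, seed=0):
    """Estimate the standard error of statistic(data) by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_replicates)]
    return statistics.pstdev(replicates)


# Synthetic dataset, generated only for illustration.
data_rng = random.Random(42)
data = [data_rng.gauss(10.0, 2.0) for _ in range(50)]

se_median = bootstrap_stderr(data, statistics.median)
se_mean = bootstrap_stderr(data, statistics.fmean)

# For the mean, the bootstrap estimate should be close to std / sqrt(N).
analytic_se_mean = statistics.pstdev(data) / math.sqrt(len(data))
print(se_median, se_mean, analytic_se_mean)
```

The comparison for the mean is a sanity check; the bootstrap is most useful for statistics such as the median, where no simple formula for the standard error is at hand.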
You Should
- Remember These Definitions
- Remember These Terms
- Remember These Facts
- Use These Procedures
Use the reasoning and data above to construct a 99% confidence interval for the likelihood of a male birth. Use each sample to calculate a centered 90% confidence interval for the population mean, using the t-distribution. Use each sample to make a bootstrap estimate of a centered 90% confidence interval for the population median.
The Significance of Evidence
Significance
- Evaluating Significance
- P-Values
We estimate how unusual the sample would have to be to give the value we actually see, if the hypothesis is true. You should think of the p-value as the fraction of samples that would give a test statistic with a larger absolute value than the one observed, if the hypothesis were true. Sometimes, the p-value is even smaller, and this can be interpreted as very strong evidence that the null hypothesis is wrong.
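For a test statistic that is standard normal under the hypothesis, this fraction can be computed from the complementary error function. The sketch below assumes a large sample (so the normal model applies) and uses made-up numbers for the observed mean, the hypothesized mean, and the standard error.

```python
import math


def two_sided_p_value(observed_mean, hypothesized_mean, std_error):
    """Fraction of samples giving a larger |test statistic| under a standard normal model."""
    z = (observed_mean - hypothesized_mean) / std_error
    return math.erfc(abs(z) / math.sqrt(2))


# Assumed, illustrative numbers: the test statistic here is 2.
p_value = two_sided_p_value(observed_mean=10.5, hypothesized_mean=10.0, std_error=0.25)
print(p_value)
```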
Comparing the Mean of Two Populations
- Assuming Known Population Standard Deviations
- Assuming Same, Unknown Population Standard Deviation
- Assuming Different, Unknown Population Standard Deviation
If popmean({X}) = popmean({Y}), then mean({x}) - mean({y}) is the value of a random variable whose mean is 0 and whose variance is $\frac{\text{popsd}(\{X\})^2}{k_x} + \frac{\text{popsd}(\{Y\})^2}{k_y}$. Calculate the p-value using the recipe in Procedure 7.2; the number of degrees of freedom is $\frac{\left(\text{stdunbiased}(\{x\})^2/k_x + \text{stdunbiased}(\{y\})^2/k_y\right)^2}{\left(\text{stdunbiased}(\{x\})^2/k_x\right)^2/(k_x - 1) + \left(\text{stdunbiased}(\{y\})^2/k_y\right)^2/(k_y - 1)}$.
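When the two population standard deviations are unknown and may differ, the degrees of freedom are usually taken from the Welch-Satterthwaite approximation. The sketch below computes that quantity; the function name and the example inputs are illustrative assumptions.

```python
def welch_degrees_of_freedom(std_x, k_x, std_y, k_y):
    """Welch-Satterthwaite degrees of freedom from unbiased standard deviations and sample sizes."""
    a = std_x ** 2 / k_x
    b = std_y ** 2 / k_y
    return (a + b) ** 2 / (a ** 2 / (k_x - 1) + b ** 2 / (k_y - 1))


# Illustrative inputs: equal spreads and sizes, then unequal ones.
df_equal = welch_degrees_of_freedom(100.0, 20, 100.0, 20)
df_unequal = welch_degrees_of_freedom(100.0, 20, 50.0, 35)
print(df_equal, df_unequal)
```

With equal spreads and equal sample sizes k the result reduces to 2(k - 1); in general it lies between the smaller of k_x - 1 and k_y - 1 and the pooled value k_x + k_y - 2.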
Other Useful Tests of Significance
- F-Tests and Standard Deviations
This means that the distribution depends on the number of degrees of freedom for each dataset (i.e., Nx - 1 and Ny - 1). Once we have the best estimate of the intensity, we still want to know if the model is consistent with the data. If we estimate p parameters from the data, then the number of degrees of freedom will be k - p - 1 (because there are k numbers, they must yield p parameter values, and they must add to 1).
P-Value Hacking and Other Dangerous Behavior
You Should
- Remember These Definitions
Weigh 20 fatty Zucker rats and get a mean weight of 1000 grams with a standard deviation of 100 grams. Weigh 35 fatty Zucker rats and get a mean weight of 1000 grams with a standard deviation of 100 grams. You will step by step evaluate the evidence against the claim that a fat Zucker rat weighs exactly twice as much as a thin Zucker rat.
A Simple Experiment: The Effect of a Treatment
- Randomized Balanced Experiments
- Decomposing Error in Predictions
- Estimating the Noise Variance
- The ANOVA Table
Finding out whether the groups are different requires careful consideration, as the subjects will differ for various irrelevant reasons (body weight, sensitivity to medication, and so on), and we must decide whether the differences we see are due to the treatment or to these irrelevant reasons. This means that it is helpful if there are the same number of subjects in each group (we say the experiment is balanced) so that the error due to chance effects is the same in each group. This means that MSB would be greater than it would be if the treatment had no effect.
Evaluating Whether a Treatment Has Significant Effects with a One-Way ANOVA for Balanced Experiments
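The comparison of between-group and within-group variation can be sketched for a balanced experiment. The code below computes the between-group mean square (MSB), the within-group mean square (called MSW here, a name assumed for this sketch), and their ratio F; the data are made up.

```python
def one_way_anova_f(groups):
    """Return (MSB, MSW, F) for a balanced one-way layout given a list of groups."""
    n_groups = len(groups)
    n_per_group = len(groups[0])
    assert all(len(g) == n_per_group for g in groups), "experiment must be balanced"

    n_total = n_groups * n_per_group
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / n_per_group for g in groups]

    # Between-group sum of squares: variation of group means about the grand mean.
    ssb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    # Within-group sum of squares: variation of each subject about its group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    msb = ssb / (n_groups - 1)
    msw = ssw / (n_total - n_groups)
    return msb, msw, msb / msw


# Hypothetical responses for three treatment levels, three subjects each.
msb, msw, f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(msb, msw, f_stat)
```

A large F suggests the group means differ by more than chance variation among subjects would explain; judging significance still requires comparing F with the F-distribution for the corresponding degrees of freedom.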
- Unbalanced Experiments
- Significant Differences
- Two Factor Experiments
- Decomposing the Error
This is the result of the weighting terms in the expressions for the mean squared error. With luck, the treatment effect will be so strong that significance testing is a formality. As in the case of a single factor, we assume that noise is independent of treatment level.