
Probability and Statistics for Computer Science

Nguyễn Gia Hào

Academic year: 2023


In my experience, many learned some or all of this material without realizing how useful it was and then forgot about it. In my experience, computer science students find simple Markov chains natural (although they may find the notation annoying) and will suggest simulating a chain before the instructor does.

Notation and Conventions

Fairness: Each face of a fair coin or die has the same probability of landing face up on a flip or roll. If a roulette wheel is properly balanced, the ball has the same probability of landing in each slot.

Describing Datasets

First Tools for Looking at Data

Datasets

Throughout the book we will see many datasets downloaded from various internet sources because people are so generous in publishing interesting datasets on the internet. In the next chapter, we'll look at two-dimensional data, and in Chapter 10, we'll look at high-dimensional data.

What’s Happening? Plotting Data

  • Bar Charts
  • Histograms
  • How to Make Histograms
  • Conditional Histograms

In this case, the height of the box is determined by the number of data items in the box. Each entry represents the number of data items that lie in that interval.
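The counting step behind a histogram can be sketched in a few lines. This is a minimal illustration, assuming equal-width intervals on a range chosen by the caller; the function name `histogram_counts` and the sample values are hypothetical and do not come from the book.

```python
def histogram_counts(data, lo, hi, n_bins):
    """Count the data items that fall in each of n_bins equal-width intervals on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if lo <= x <= hi:
            # Items equal to the right edge hi are placed in the last interval.
            i = min(int((x - lo) / width), n_bins - 1)
            counts[i] += 1
    return counts


# Illustrative data only; each entry of counts is the height of one box.
counts = histogram_counts([1.0, 1.5, 2.2, 3.9, 4.0], lo=0.0, hi=4.0, n_bins=4)
print(counts)
```

Items outside the chosen range are ignored here; a different convention (for example, widening the end intervals) would be equally reasonable.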

Table 1.2 Chase and Dunner, in a study described in the text, collected data on what students thought made other students popular

Summarizing 1D Data

  • The Mean
  • Standard Deviation
  • Computing Mean and Standard Deviation Online
  • Variance
  • The Median
  • Interquartile Range
  • Using Summaries Sensibly

Similarly, after viewing the elements, you will have an estimate of the standard deviation based on those elements. The properties of the variance derive from the fact that it is the square of the standard deviation.
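One way to keep running estimates of the mean and standard deviation as items arrive is sketched below. It uses Welford's update, which is a standard technique swapped in for illustration and may differ from the recursion given in the book. It assumes the standard deviation divides by N (not N - 1); the class name `RunningStats` is hypothetical.

```python
import math


class RunningStats:
    """Running mean and standard deviation, updated one item at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the new mean.
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Assumption: population form, dividing by N.
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
print(stats.mean, stats.std())
```

After each call to `update`, `mean` and `std()` describe exactly the items seen so far, so no item needs to be stored.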

Fig. 1.3 On top, a histogram of body temperatures, from the dataset published at http://www2.stetson.edu/~jrasp/data.htm

Plots and Summaries

  • Some Properties of Histograms
  • Standard Coordinates and Normal Data
  • Box Plots

This means that the right tail of the histogram is longer, so the histogram is skewed to the right. Recall the definition of the median (form an ordered list of the data points, and find the point halfway along the list).

Fig. 1.5 On the top, an example of a symmetric histogram, showing its tails (relatively uncommon values that are significantly larger or smaller than the peak or mode)

Whose is Bigger? Investigating Australian Pizzas

There is a vertical box whose height corresponds to the interquartile range of the data (the width is just to make the figure easy to interpret). One possible explanation is that EagleBoys has tighter control over the size of the final pizza.
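The numbers a box plot is drawn from can be computed directly. The sketch below is an illustration under stated assumptions: quartiles use the "inclusive" method of Python's `statistics.quantiles` (other quartile conventions give slightly different values), and outliers are taken to be points more than 1.5 interquartile ranges beyond the box, a common convention. The function name and data are hypothetical.

```python
import statistics


def box_plot_summary(data, whisker=1.5):
    """Return the quartiles, interquartile range and outliers behind a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - whisker * iqr
    hi_fence = q3 + whisker * iqr
    outliers = [x for x in data if x < lo_fence or x > hi_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Illustrative values only, not the pizza dataset.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 20])
print(summary)
```

Running this on two datasets and comparing the `median` and `iqr` entries is the numerical counterpart of comparing two boxes side by side.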

Fig. 1.8 A box plot showing the box, the median, the whiskers and two outliers. Notice that we can compare the two datasets rather easily; the next section explains the comparison

You Should

  • Remember These Definitions
  • Remember These Terms
  • Remember These Facts
  • Be Able to

Yet another possibility is that Dominos controls portions by mass of dough (so thin crust diameters tend to be larger), but EagleBoys controls by crust diameter. The fact that Dominos and EagleBoys seem to follow different strategies with success suggests that more than one strategy may work.

Problems

Programming Exercises

Use conditional histogram plots to investigate whether students from small families drink more alcohol on the weekend than those from large families. Use box plots to investigate whether gender, education, or marital status have any effect on the amount of debt (again, use X1 for debt).

Looking at Relationships

Plotting 2D Data

  • Categorical Data, Counts, and Charts

Then I made the bar chart on the left, which shows the number of children of each gender selecting each goal. The height of each bar is given by the number of elements of that type, and the bar is divided into sections corresponding to the number of elements of each subtype.

Fig. 2.1 I sorted the children in the Chase and Dunner study into six categories (two genders by three goals), and counted the number of children that fell into each cell

Gender by goals

Goals by gender, relative frequencies

Goals by gender

Gender by goals, relative frequencies

  • Series
  • Scatter Plots for Spatial Data
  • Exposing Relationships with Scatter Plots
  • Correlation
    • The Correlation Coefficient
    • Using Correlation to Predict
    • Confusion Caused by Correlation
  • Sterile Males in Wild Horse Herds
  • You Should
    • Remember These Definitions
    • Remember These Terms
    • Be Able to

Figure 2.10 shows a scatter plot of the shipping data, where I have plotted the number of skins against price. Fortunately, scaling or translating data does not change the value of the correlation coefficient (although it may change the sign if a scale is negative).
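The invariance claim can be checked numerically. The sketch below assumes the correlation coefficient is the mean of the products of the two variables in standard coordinates (with the standard deviation dividing by N); the function names and data are hypothetical.

```python
import math


def standardize(xs):
    """Convert data to standard coordinates: subtract the mean, divide by the std."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


def correlation(xs, ys):
    """Mean of the products of the standardized coordinates."""
    xh, yh = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(xh, yh)) / len(xs)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = correlation(xs, ys)
r_shifted = correlation([3 * x + 7 for x in xs], ys)  # positive scale plus translation
r_flipped = correlation([-2 * x for x in xs], ys)     # negative scale
print(r, r_shifted, r_flipped)
```

Here `r_shifted` agrees with `r`, while `r_flipped` has the same magnitude and the opposite sign.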

Fig. 2.3 A heat map of the Chase and Dunner data. The color of each cell corresponds to the count of the number of elements of that type

Probability

Experiments, Outcomes and Probability

  • Outcomes and Probability

Worked example 3.2 (Find the queen, twice) We play find the queen twice, replacing the card we have chosen. This number indicates the relative frequency of the result of interest when an experiment is repeated a very large number of times. The probability is one because each experiment must have one of the outcomes in the sample space.

Events

  • Computing Event Probabilities by Counting Outcomes
  • The Probability of Events
  • Computing Probabilities by Reasoning About Sets

The number of outcomes in the event comes from noting that each outcome in the event is an order of the cards, with the first seven cards being 2-8 of hearts, in that order. The number of outcomes in the event is the number of 30-day lists, all different. The total number of lists is the number of lists of 29 days per year.

Fig. 3.1 If you think of the probability of an event as measuring its “size”, many of the rules are quite straightforward to remember

Independence

  • Example: Airline Overbooking

This coin is flipped seven times, and we are interested in the probability that there are seven T's. Now we flip the coin eight times and are interested in the probability of getting more than six T's. Now we flip the coin eight times and are interested in the probability of getting exactly six T's.
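These three probabilities follow from counting sequences of independent flips. The sketch below assumes the flips are independent and, purely for illustration, that the coin is fair (`p_tail = 0.5`); the excerpt does not state the coin's actual bias, and the function name is hypothetical.

```python
from math import comb


def prob_k_tails(n, k, p_tail):
    """Probability of exactly k tails in n independent flips."""
    return comb(n, k) * p_tail**k * (1 - p_tail) ** (n - k)


p_tail = 0.5  # assumption: a fair coin

seven_of_seven = prob_k_tails(7, 7, p_tail)
more_than_six_of_eight = prob_k_tails(8, 7, p_tail) + prob_k_tails(8, 8, p_tail)
exactly_six_of_eight = prob_k_tails(8, 6, p_tail)
print(seven_of_seven, more_than_six_of_eight, exactly_six_of_eight)
```

"More than six" is the sum over the two disjoint events "exactly seven" and "exactly eight".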

Fig. 3.2 On the left, $A$ and $B$ are independent. $A$ spans $1/4$ of $\Omega$, and $A \cap B$ spans $1/4$ of $B$

Conditional Probability

  • Evaluating Conditional Probabilities
  • Detecting Rare Events Is Hard
  • Conditional Probability and Various Forms of Independence
  • Warning Example: The Prosecutor’s Fallacy
  • Warning Example: The Monty Hall Problem

What is the conditional probability of getting a royal flush, conditioned on the event that this card is the nine of spades? If the test says you do have the disease, what is the probability that you actually have the disease? This means that knowing that A has occurred tells you nothing about B: the probability that B will occur is the same whether you know that A has occurred or not.
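The disease-test question is an application of Bayes' rule. The sketch below uses made-up numbers (a prevalence of 0.1%, a test that detects 99% of true cases and gives a false positive 1% of the time); these values, and the function name, are assumptions for illustration and do not come from the book.

```python
def prob_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' rule."""
    p_positive = (
        sensitivity * prevalence
        + false_positive_rate * (1 - prevalence)
    )
    return sensitivity * prevalence / p_positive


# Assumed illustrative values, not taken from the text.
posterior = prob_disease_given_positive(
    prevalence=0.001, sensitivity=0.99, false_positive_rate=0.01
)
print(posterior)
```

With these assumed values the posterior is about 9%: most positive results come from the much larger group of healthy people, which is why rare events are hard to detect.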

Extra Worked Examples

  • Outcomes and Probability
  • Events
  • Independence
  • Conditional Probability

Worked example 3.45 (Birthdays in a row) We randomly stop three people and ask them the day of the week they were born. Worked example 3.53 (Which disease do you have?) Disease A occurs with probability 0.1 (i.e., it is present in 10% of the population) and disease B occurs with probability 0.2. Worked example 3.54 (Fraud or psychic powers?) You want to investigate the powers of a supposed psychic.

You Should

  • Remember These Definitions
  • Remember These Terms
  • Remember and Use These Facts
  • Remember These Points
  • Be Able to

What is the probability that the ball will end up in a red slot with an even number? What is the conditional probability that the card you draw is a red king, conditional on the card drawn being a king? What is the conditional probability that the card you draw is a red king, conditional on the removed card being a red king?

Random Variables and Expectations

Random Variables

  • Joint and Conditional Probability for Random Variables

A function that takes a discrete random variable into a set of numbers is also a discrete random variable. Worked example 4.2 (Coin bets) One way to obtain a random variable is to think about the prize for a bet. We will write $P(X)$ to denote the probability distribution of the random variable, and $P(x)$ or $P(X = x)$ to denote the probability that the random variable assumes a particular value.

Bayes’ Rule

  • Just a Little Continuous Probability
  • Expectations and Expected Values
    • Expected Values
    • Mean, Variance and Covariance
    • Expectations and Statistics
  • The Weak Law of Large Numbers
    • IID Samples
    • Two Inequalities
    • Proving the Inequalities
    • The Weak Law of Large Numbers
  • Using the Weak Law of Large Numbers
    • Should You Accept a Bet?
    • Odds, Expectations and Bookmaking: A Cultural Diversion
    • Ending a Game Early
    • Making a Decision with Decision Trees and Expectations
    • Utility
  • You Should
    • Remember These Definitions
    • Remember These Terms
    • Use and Remember These Facts
    • Remember These Points
    • Be Able to

This is not the actual revenue from a single game (which would be different, depending on what the coin did). Now $X^{(N)}$ is a random variable (the samples are IID, and for a different set of samples you will get a different, random $X^{(N)}$). Otherwise, $X$ gets the value of the four-sided die and $Y$ gets the value of the six-sided die. $Z$ always gets the value of the sum of the dice. (a) What is $P(X)$, the probability distribution of this random variable?

Table 4.1 A table of the joint probability distribution of S (vertical axis; scale 2, ..., 12) and D (horizontal axis; scale

Useful Probability Distributions

Discrete Distributions

  • The Discrete Uniform Distribution
  • Bernoulli Random Variables
  • The Geometric Distribution
  • The Binomial Probability Distribution
  • Multinomial Probabilities
  • The Poisson Distribution

Note that the number of heads in N tosses can be obtained by adding the number of heads in each toss. For example, you can take the length of a road, divide it into equal intervals, and then count the number of animals killed on the road in each interval. A Poisson point process with intensity $\lambda$ is a set of random points with the property that the number of points in an interval of length $s$ is a Poisson random variable with parameter $\lambda s$.

Continuous Distributions

  • The Continuous Uniform Distribution
  • The Beta Distribution
  • The Gamma Distribution
  • The Exponential Distribution

Figure 5.2 shows plots of the probability density function of the Gamma distribution for a variety of different values of $\alpha$ and $\beta$. We assume that failures form a Poisson process over time; then the time to the next failure is exponentially distributed. The time between calls will be exponentially distributed with parameter $\lambda$, and the expected time until the next call is $1/\lambda$ (in hours).
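The relation between the rate of an exponential distribution and its expected value can be checked by simulation. The sketch below assumes a rate of 2 calls per hour purely for illustration; the function name, rate and sample size are hypothetical choices.

```python
import random


def mean_waiting_time(rate, n_samples, seed=0):
    """Average of n_samples exponential waiting times with the given rate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.expovariate(rate) for _ in range(n_samples))
    return total / n_samples


rate = 2.0  # assumed: 2 calls per hour
estimate = mean_waiting_time(rate, n_samples=100_000)
print(estimate, 1 / rate)
```

The simulated average should lie close to `1 / rate`, here half an hour, with the gap shrinking as `n_samples` grows.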

Fig. 5.1 Probability density functions for the Beta distribution with a variety of different choices of $\alpha$ and $\beta$

The Normal Distribution

  • The Standard Normal Distribution
  • The Normal Distribution
  • Properties of the Normal Distribution

From this (and tables for the error function, or your favorite math package) we get that, for a standard normal random variable. About 95% of the time, a normal random variable takes a value within two standard deviations of the mean. About 99% of the time, a normal random variable takes a value within three standard deviations of the mean.
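These fractions can be reproduced with the error function from a standard math library. The sketch below relies on the identity that a normal random variable lies within $k$ standard deviations of its mean with probability $\operatorname{erf}(k/\sqrt{2})$; the function name is hypothetical.

```python
import math


def prob_within(k):
    """Probability that a normal random variable is within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The values for one, two and three standard deviations come out near 68%, 95% and 99.7%.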

Approximating Binomials with Large N

  • Large N
  • Getting Normal
  • Using a Normal Approximation to the Binomial Distribution

The main problem with Figure 5.4 (and with the argument above) is that the mean and standard deviation of the binomial distribution tend to infinity as the number of coin tosses tends to infinity. Then, for sufficiently large $N$, the probability distribution $P(x)$ can be approximated by the probability density function $\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$. We know, for example, that a standard normal random variable has a value between $-1$ and $1$ about 68% of the time.
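The quality of the normal approximation can be checked by comparing it with the exact binomial sum. The sketch below assumes $N = 1000$ tosses of a fair coin and uses a continuity correction of 0.5 on each side, which is a common refinement rather than something stated in the excerpt; the function names are hypothetical.

```python
import math


def binomial_prob_between(n, p, lo, hi):
    """Exact probability that the number of heads h satisfies lo <= h <= hi."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1)
    )


def normal_prob_between(n, p, lo, hi):
    """Normal approximation to the same probability, with continuity correction."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))

    def cdf(x):
        return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))

    return cdf(hi + 0.5) - cdf(lo - 0.5)


# Heads counts within about one standard deviation (about 15.8) of the mean 500.
exact = binomial_prob_between(1000, 0.5, 485, 515)
approx = normal_prob_between(1000, 0.5, 485, 515)
print(exact, approx)
```

Both numbers should be close to each other and to the 68% figure for one standard deviation.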

Fig. 5.5 Plots of the distribution for the normalized variable $x$, with $P(x)$ given in the text, obtained from the binomial distribution with $p = q = 0.5$ for different values of $N$

You Should

  • Remember These Definitions
  • Remember These Terms
  • Remember These Points

Toss a coin $N$ times and count $h_N$. Show that the probability distribution for is the same as the probability distribution for $h_N$. What is the probability that the plane will travel with one or more empty seats? 5.20 Show that the multinomial distribution. Use this and sample matching to show that the variance of the Poisson distribution with intensity parameter $\lambda$ is $\lambda$.

Inference

Samples and Populations

The Sample Mean

  • The Sample Mean Is an Estimate of the Population Mean
  • The Variance of the Sample Mean
  • When The Urn Model Works
  • Distributions Are Like Populations

Under our sampling model, the expected value of the sample mean is the population mean. It is random because different samples from the population will have different values of the sample mean. Knowing the variance of $X^{(N)}$ will tell us how accurate our estimate of the population mean is.

Confidence Intervals

  • Constructing Confidence Intervals
  • Estimating the Variance of the Sample Mean
  • The Probability Distribution of the Sample Mean
  • Confidence Intervals for Population Means
  • Standard Error Estimates from Simulation

Our estimate of the unknown number $\text{popmean}(\{X\})$ is the mean of the sample we have, which we write $\text{mean}(\{x\})$. Assume the sample is large enough so that $(\text{mean}(\{x\}) - \text{popmean}(\{X\}))/\text{stderr}(\{x\})$ is a standard normal random variable. Suppose we want to estimate the standard error of a statistic $S(\{x\})$, which is a function of our dataset $\{x\}$ of $N$ data items.
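A centered confidence interval for the population mean follows from the standard normal assumption above. The sketch below assumes the standard error is the unbiased standard deviation divided by $\sqrt{N}$ and that the sample is large enough for the normal approximation to hold; the function name and data are hypothetical, and the tiny sample is only there to keep the example short.

```python
import math
import statistics


def mean_confidence_interval(xs, level=0.95):
    """Centered confidence interval for the population mean, normal approximation."""
    n = len(xs)
    mean = statistics.fmean(xs)
    stderr = statistics.stdev(xs) / math.sqrt(n)  # stdev divides by N - 1
    # z such that a standard normal variable lies in [-z, z] with probability level.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * stderr, mean + z * stderr


low, high = mean_confidence_interval([1, 2, 3, 4, 5], level=0.95)
print(low, high)
```

For small samples a t-distribution would be the more appropriate choice; the normal form is used here because it needs only the standard library.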

Fig. 6.1 A simple demonstration that sample means behave as described, by computing sample means from the heights dataset

You Should

  • Remember These Definitions
  • Remember These Terms
  • Remember These Facts
  • Use These Procedures

Use the reasoning and data above to construct a 99% confidence interval for the likelihood of a male birth. Use each sample to calculate a centered 90% confidence interval for the population mean, using the t-distribution. Use each sample to make a bootstrap estimate of a centered 90% confidence interval for the population median.

The Significance of Evidence

Significance

  • Evaluating Significance
  • P-Values

We estimate how odd the sample would have to be to give the value we actually see, if the hypothesis is true. You should think of the p-value as the fraction of samples that would give a value larger in absolute value than the observed one if the hypothesis were true. Sometimes, the p-value is even smaller, and this can be interpreted as very strong evidence that the null hypothesis is wrong.
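When the test statistic is a standard normal random variable under the hypothesis, this fraction has a closed form. The sketch below assumes a two-sided test on a statistic $z$ already expressed in standard coordinates; the function name is hypothetical.

```python
import math


def two_sided_p_value(z):
    """Fraction of standard normal samples with absolute value larger than |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (0.5, 1.96, 3.0):
    print(z, two_sided_p_value(z))
```

A statistic near 1.96 gives a p-value near 0.05, and larger statistics give smaller p-values, which is stronger evidence against the hypothesis.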

Comparing the Mean of Two Populations

  • Assuming Known Population Standard Deviations
  • Assuming Same, Unknown Population Standard Deviation
  • Assuming Different, Unknown Population Standard Deviation

If $\text{popmean}(\{X\}) = \text{popmean}(\{Y\})$, then $\text{mean}(\{x\}) - \text{mean}(\{y\})$ is the value of a random variable whose mean is 0 and whose variance depends on the population standard deviations and the sample sizes. Calculate the p-value using the recipe in Procedure 7.2; the number of degrees of freedom is $\frac{\left(\text{stdunbiased}(\{x\})^2/k_x + \text{stdunbiased}(\{y\})^2/k_y\right)^2}{\left(\text{stdunbiased}(\{x\})^2/k_x\right)^2/(k_x - 1) + \left(\text{stdunbiased}(\{y\})^2/k_y\right)^2/(k_y - 1)}$.
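For the case of different, unknown population standard deviations, the test statistic and degrees of freedom can be sketched as follows. The degrees-of-freedom expression is the usual Welch-Satterthwaite approximation, which is assumed here to match the book's procedure; the function name and data are hypothetical, and the final p-value lookup (which needs a t-distribution) is left out because the standard library does not provide one.

```python
import math
import statistics


def welch_statistic(xs, ys):
    """Test statistic and approximate degrees of freedom for comparing two means."""
    kx, ky = len(xs), len(ys)
    vx = statistics.variance(xs) / kx  # stdunbiased({x})^2 / k_x
    vy = statistics.variance(ys) / ky  # stdunbiased({y})^2 / k_y
    t = (statistics.fmean(xs) - statistics.fmean(ys)) / math.sqrt(vx + vy)
    dof = (vx + vy) ** 2 / (vx**2 / (kx - 1) + vy**2 / (ky - 1))
    return t, dof


t, dof = welch_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, dof)
```

When the two samples have equal size and equal variance, the degrees of freedom reduce to $2(k - 1)$, which is a quick sanity check on the formula.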

Other Useful Tests of Significance

  • F-Tests and Standard Deviations

This means that the distribution depends on the number of degrees of freedom for each dataset (i.e., $N_x - 1$ and $N_y - 1$). Once we have the best estimate of the intensity, we still want to know if the model is consistent with the data. If we estimate $p$ parameters from the data, then the number of degrees of freedom will be $k - p - 1$ (because there are $k$ numbers, they must yield $p$ parameter values, and they must add to 1).

P-Value Hacking and Other Dangerous Behavior

You Should

  • Remember These Definitions

Weigh 20 fatty Zucker rats and get a mean weight of 1000 grams with a standard deviation of 100 grams. Weigh 35 fatty Zucker rats and get a mean weight of 1000 grams with a standard deviation of 100 grams. You will step by step evaluate the evidence against the claim that a fat Zucker rat weighs exactly twice as much as a thin Zucker rat.

A Simple Experiment: The Effect of a Treatment

  • Randomized Balanced Experiments
  • Decomposing Error in Predictions
  • Estimating the Noise Variance
  • The ANOVA Table

Finding out whether the groups are different requires careful consideration, as the subjects will differ for various irrelevant reasons (body weight, sensitivity to medication, and so on), so we must decide whether differences between groups are due to the treatment or to the irrelevant reasons. This means that it is helpful if there are the same number of subjects in each group (we say the experiment is balanced) so that the error due to chance effects is the same in each group. This means that MSB would be greater than it would be if the treatment had no effect.
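The comparison between MSB and the within-group noise can be sketched for a balanced experiment. The code below assumes the usual one-way ANOVA definitions: MSB is the between-group sum of squares divided by (groups - 1), and the within-group mean square (called `msw` here, a hypothetical name) is the within-group sum of squares divided by (items - groups). The data are made up.

```python
def one_way_anova_f(groups):
    """Return (MSB, MSW, F) for a list of treatment groups."""
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: how far each group mean is from the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: noise around each group's own mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    msb = ssb / (n_groups - 1)
    msw = ssw / (n_total - n_groups)
    return msb, msw, msb / msw


msb, msw, f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(msb, msw, f)
```

A large ratio `f` indicates that the group means differ by more than the within-group noise would explain; turning it into a p-value requires the F-distribution with (groups - 1, items - groups) degrees of freedom.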

Evaluating Whether a Treatment Has Significant Effects with a One-Way ANOVA for Balanced Experiments

  • Unbalanced Experiments
  • Significant Differences
  • Two Factor Experiments
    • Decomposing the Error

This is the result of the weighting terms in the expressions for the mean squared error. With luck, the treatment effect will be so strong that significance testing is a formality. As in the case of a single factor, we assume that noise is independent of treatment level.

Fig. 8.1 On the left, a boxplot of concentration of Aldrin at three different depths (see Worked example 8.1)

Figures

Fig. 1.2 On the left, a histogram of net worths from the dataset described in the text and shown in Table 1.1
Fig. 1.7 Data is standard normal data when its histogram takes a stylized, bell-shaped form, plotted above
