3.1 The binomial distribution and the standard normal distribution
3.1.2 Methods of inference based on the ‘success’ probability
Collet (1991) has extensively discussed inference about the success probability. Statistical inference here is concerned with three issues. First, the precision of an estimate: how precise, or how reliable, is the observed sample statistic as an estimate of the population parameter? Second, how confident are we about the estimate? Third, how significant is the difference between the observed statistic and a preconceived parameter, or the measure of association between the variables?
In answering the first question, we show how the parameter estimate (denoted $\hat{p}$) of the success probability, p, is obtained. In principle, as the number of observations, n, becomes large, the estimate $\hat{p}$ tends to become more reliable. This reliability is ascertained if we know the standard error (or the standard deviation) of the estimate. If the random variable Y has a binomial distribution, denoted B(n, p), the variance of Y is np(1-p). Since $\hat{p} = y/n$, it follows that the variance of the estimate of p is

$$\mathrm{var}(\hat{p}) = \frac{p(1-p)}{n}$$

which can be estimated as

$$\widehat{\mathrm{var}}(\hat{p}) = \frac{\hat{p}(1-\hat{p})}{n}.$$

Taking the square root of this expression gives the standard error (denoted s.e.) of $\hat{p}$:

$$\mathrm{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
If the standard error of the estimate $\hat{p}$ is large, the estimate is less precise; if the s.e. is small, the estimate is more precise and statistically more reliable.
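As a quick illustration, the estimate and its standard error can be computed directly. The counts y and n below are hypothetical, chosen only to show the arithmetic:

```python
# Point estimate and standard error of the binomial success probability.
# y and n are illustrative values, not data from the text.
import math

y, n = 24, 80          # observed successes out of n trials (hypothetical)
p_hat = y / n          # estimate of the success probability p
se = math.sqrt(p_hat * (1 - p_hat) / n)   # s.e.(p_hat)

print(p_hat)           # 0.3
print(round(se, 4))    # 0.0512
```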
Secondly, we derive the confidence interval for the true success probability p, that is, the range of values around the estimate within which we expect the "true" population probability of success, p, to lie with a given level of confidence or certainty (StatSoft, 2007). The level of confidence is also described by Collet (1991) as "the probability of including p" and is usually given as 0.90, 0.95 or 0.99. Setting the level of confidence at 1-α, where α is a small positive value, gives a 100(1-α)% confidence interval for p. The value of α is usually 0.10, 0.05 or 0.01. This means that if the sample selection were repeated a number of times, 100(1-α)% of the resulting intervals would contain the true probability of success.
Being an interval of values, it has a lower limit (denoted $p_L$) and an upper limit (denoted $p_U$), which are the smallest and largest binomial success probabilities consistent with the observed proportion $y/n$ at the chosen confidence level (Collet, 1991; p. 23). These limits satisfy the expressions

$$\sum_{j=y}^{n} \binom{n}{j} p_L^{\,j} (1-p_L)^{n-j} = \alpha/2 \qquad (3.2)$$

and

$$\sum_{j=0}^{y} \binom{n}{j} p_U^{\,j} (1-p_U)^{n-j} = \alpha/2, \qquad (3.3)$$

which equate to α/2 the probability that a binomial random variable with parameters $(n, p_L)$ takes the value y or more, and the probability that a binomial random variable with parameters $(n, p_U)$ takes the value y or fewer. The values of $p_L$ and $p_U$ are readily derived from tables of the binomial distribution for given values of α, y and n, which are usually appended to many statistics textbooks (Collet, 1991; p. 23).
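Rather than consulting tables, the limits defined by equations 3.2 and 3.3 can be found numerically. The sketch below solves the two tail-sum equations by bisection; the values of y, n and α are illustrative, not taken from the text:

```python
# Exact confidence limits p_L, p_U for a binomial success probability,
# found by bisection on the tail sums in equations (3.2) and (3.3).
# A sketch with illustrative inputs; y, n, alpha are hypothetical.
import math

def binom_tail_ge(y, n, p):
    """P(Y >= y) for Y ~ B(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(y, n + 1))

def binom_tail_le(y, n, p):
    """P(Y <= y) for Y ~ B(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(0, y + 1))

def bisect(f, target, lo=0.0, hi=1.0, tol=1e-10):
    """Solve f(p) = target for an f that is monotone increasing in p."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y, n, alpha = 24, 80, 0.05
# P(Y >= y) increases with p, so equation (3.2) is solved directly for p_L.
p_L = bisect(lambda p: binom_tail_ge(y, n, p), alpha / 2)
# P(Y <= y) decreases with p, so solve the complement of (3.3) for p_U.
p_U = bisect(lambda p: 1 - binom_tail_le(y, n, p), 1 - alpha / 2)
print(round(p_L, 4), round(p_U, 4))
```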
Note that because the confidence interval $(p_L, p_U)$ depends on the sample size, n, and on the variance (the spread of the observations about their mean), the larger the sample size, the more reliable the calculated interval, and vice versa. Note also that the calculation of the confidence interval that follows is based on the assumption that the variable is normally distributed in the population; hence, unless the sample size is large enough, the estimate may not be valid if the assumption of normality is not met.
The normal approximation to the binomial distribution, constructed from the percentage points of the standard normal distribution8, provides a practical way of deriving the confidence interval for the 'true' success probability. As illustrated in Figure 3.1, if the random variable Z has a standard normal distribution, the upper (100α/2)% point of the distribution is $z_{\alpha/2}$, such that $P(Z \ge z_{\alpha/2}) = \alpha/2$. This probability is the shaded area to the right of $z_{\alpha/2}$. By symmetry, the lower (100α/2)% point of the standard normal distribution is $-z_{\alpha/2}$, such that $P(Z \le -z_{\alpha/2}) = \alpha/2$; this is the shaded area to the left of $-z_{\alpha/2}$ in Figure 3.1. It follows that $P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$. Statistical tables featuring the percentage points of the standard normal distribution, commonly found at the back of a number of statistics textbooks, provide the value of $z_{\alpha/2}$ for a given α. For example, for α = 0.10, 0.05 and 0.01, the values of $z_{\alpha/2}$ are 1.645, 1.960 and 2.576, respectively (Collet, 1991).
8 The standard normal distribution is a normal distribution with mean 0 (μ = 0) and variance 1 (σ² = 1).
Figure 3.1: The upper and lower α/2 points of the standard normal distribution (Collet, 1991).
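The tabulated percentage points can also be reproduced with an inverse normal CDF; for instance, in Python:

```python
# Upper alpha/2 percentage points z_{alpha/2} of the standard normal
# distribution, computed with the inverse CDF instead of printed tables.
from statistics import NormalDist

for alpha in (0.10, 0.05, 0.01):
    z = NormalDist().inv_cdf(1 - alpha / 2)   # P(Z <= z) = 1 - alpha/2
    print(alpha, round(z, 3))
# matches the values 1.645, 1.960 and 2.576 quoted above (to 3 d.p.)
```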
Now, re-expressing equation 3.1 by dividing its numerator and denominator by n (i.e. multiplying by 1/n) and replacing y, the observed value of Y, with $\hat{p}$, we get

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}, \qquad (3.4)$$

which has an approximate standard normal distribution, so that the area under the curve (Figure 3.1) between $-z_{\alpha/2}$ and $z_{\alpha/2}$ gives
$$P\left\{ -z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2} \right\} \approx 1 - \alpha. \qquad (3.5)$$

Replacing p(1-p) with $\hat{p}(1-\hat{p})$, that is, further approximating the value of the success probability, leads to equation 3.5 becoming
$$P\left\{ -z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \le z_{\alpha/2} \right\} \approx 1 - \alpha. \qquad (3.6)$$
Since the expression $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error, s.e.($\hat{p}$), of $\hat{p}$, equation 3.6 becomes

$$P\left\{ \hat{p} - z_{\alpha/2}\,\mathrm{s.e.}(\hat{p}) \le p \le \hat{p} + z_{\alpha/2}\,\mathrm{s.e.}(\hat{p}) \right\} \approx 1 - \alpha. \qquad (3.7)$$
It therefore follows that

$$\left( \hat{p} - z_{\alpha/2}\,\mathrm{s.e.}(\hat{p}),\ \hat{p} + z_{\alpha/2}\,\mathrm{s.e.}(\hat{p}) \right), \qquad (3.8)$$

thus giving the required lower and upper limits of the confidence interval for the 'true' success probability, p. In interpreting the result, we say that since Z is approximately normally distributed with mean zero and unit variance, i.e. N(0,1), we can be 100(1-α)% confident that the true value of p lies between $\hat{p} - z_{\alpha/2}\,\mathrm{s.e.}(\hat{p})$ and $\hat{p} + z_{\alpha/2}\,\mathrm{s.e.}(\hat{p})$.
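The interval in equation 3.8 is straightforward to compute. The sketch below uses hypothetical values of y, n and α, chosen only to illustrate the calculation:

```python
# 100(1-alpha)% normal-approximation confidence interval (equation 3.8).
# y, n and alpha are illustrative values, not data from the text.
import math
from statistics import NormalDist

y, n, alpha = 24, 80, 0.05
p_hat = y / n
se = math.sqrt(p_hat * (1 - p_hat) / n)          # s.e.(p_hat)
z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}, 1.960 for alpha = 0.05
lower, upper = p_hat - z * se, p_hat + z * se
print(round(lower, 4), round(upper, 4))          # 0.1996 0.4004
```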
The third kind of inference about the estimate is to test the hypothesis that the value of p is equal to some pre-specified value, say $p_0$, and to assess the significance of any difference between the two values. In other words, we need to provide statistical evidence to suggest either that the 'true' (population) success probability differs significantly from $p_0$, or that there is no adequate basis to conclude that a significant difference exists; that is, any observed difference might have occurred by chance alone. A 'conservative' hypothesis, commonly known as the 'null hypothesis' (denoted $H_0$), would be stated as $H_0\!: p = p_0$. A 'complementary' hypothesis, widely termed the 'alternative hypothesis', would be expressed to state that there is a difference between the 'true' success probability and the theoretical probability $p_0$. This is given as $H_1\!: p < p_0$. Rejection of the null hypothesis is suggested if the observed number of successes is too small. The null hypothesis formulated in this way implies a one-tailed test, which means the significance level, α, is not to be divided by 2, and the relevant tail probability sums over the values between 0 and y, as in
$$P(Y \le y) = \sum_{j=0}^{y} \binom{n}{j} p_0^{\,j} (1-p_0)^{n-j}.$$
The relative size of this probability determines the conclusion regarding rejecting or not rejecting the null hypothesis $H_0$. A reasonably large probability indicates a number of successes that is not unusually low relative to the number of trials n, so that there are not enough grounds to reject the null hypothesis that $p = p_0$. In contrast, a relatively small probability indicates an unusually small number of successes, an outcome which favours the alternative hypothesis $H_1\!: p < p_0$ and leads to rejection of $H_0$. In the latter case, the tail probability $P(Y \le y)$ is smaller than α, the significance level of the hypothesis test, and we conclude that the difference between $\hat{p}$ and its hypothesised value, $p_0$, is significant at the 100α% level of significance (Collet, 1991).
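The one-tailed tail probability can be computed directly from the binomial sum above; the values of y, n and $p_0$ below are hypothetical:

```python
# One-tailed exact test of H0: p = p0 against H1: p < p0.
# The p-value is P(Y <= y) under B(n, p0); inputs are illustrative.
import math

def p_value_lower(y, n, p0):
    """P(Y <= y) for Y ~ B(n, p0): the lower-tail p-value."""
    return sum(math.comb(n, j) * p0**j * (1 - p0)**(n - j) for j in range(y + 1))

y, n, p0, alpha = 24, 80, 0.4, 0.05
pv = p_value_lower(y, n, p0)
print(round(pv, 4))
reject = pv < alpha   # reject H0 when the tail probability falls below alpha
```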
Quite often the output of statistical software presents this tail probability, commonly known as the p-value, to warrant rejection or 'acceptance' of the null hypothesis. Collet (1991) describes the p-value as "a summary measure of the weight of evidence against $H_0$". The p-value is the probability of error involved in accepting the observed result as valid or representative of the population (StatSoft, 2007). A 'rule of thumb' for rejecting $H_0$, based on the size of the p-value, is given in Table 3.1 below.
Table 3.1 The interpretation of the p-value and conclusion of significance regarding H0

| p-value range    | Interpretation of level of evidence against H0 | Significance of p-value | Decision         |
|------------------|------------------------------------------------|-------------------------|------------------|
| p > 0.1          | There is no evidence against H0                | Not significant         | Do not reject H0 |
| 0.05 < p ≤ 0.1   | There is slight evidence against H0            | Borderline significant  | Reject H0        |
| 0.01 < p ≤ 0.05  | There is moderate evidence against H0          | Somewhat significant    | Reject H0        |
| 0.001 < p ≤ 0.01 | There is strong evidence against H0            | Highly significant      | Reject H0        |
| p ≤ 0.001        | There is overwhelming evidence against H0      | Very highly significant | Reject H0        |
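The rule of thumb in Table 3.1 can be expressed as a small helper function; this is a sketch of the table's categories, not a standard library routine:

```python
# Map a p-value to the evidence categories of Table 3.1 (a sketch).
def evidence_against_h0(p_value):
    if p_value > 0.1:
        return "no evidence against H0 (not significant)"
    if p_value > 0.05:
        return "slight evidence against H0 (borderline significant)"
    if p_value > 0.01:
        return "moderate evidence against H0 (somewhat significant)"
    if p_value > 0.001:
        return "strong evidence against H0 (highly significant)"
    return "overwhelming evidence against H0 (very highly significant)"

print(evidence_against_h0(0.03))   # moderate evidence against H0 (somewhat significant)
```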
The judgements given above are those of a one-tailed or one-sided hypothesis test, since the alternative hypothesis $H_1$ states that $p < p_0$. If $H_1$ had instead stated that $p \ne p_0$, testing would have been based on a two-sided test, since we would be testing against both $p < p_0$ and $p > p_0$. Collet (1991, p. 27) states that using a one-tailed test is "somewhat controversial" and hence recommends using a two-tailed test.
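For the two-sided alternative $p \ne p_0$, one common convention (among several in use) doubles the smaller of the two tail probabilities, capping the result at 1. The sketch below assumes that convention and uses hypothetical inputs:

```python
# Two-tailed exact test of H0: p = p0 against H1: p != p0.
# Convention assumed here: double the smaller tail probability, capped at 1.
# This is one common choice, not the only definition; inputs are illustrative.
import math

def binom_cdf(y, n, p):
    """P(Y <= y) for Y ~ B(n, p); empty sum (y < 0) gives 0."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(y + 1))

def two_sided_p_value(y, n, p0):
    lower = binom_cdf(y, n, p0)              # P(Y <= y)
    upper = 1 - binom_cdf(y - 1, n, p0)      # P(Y >= y)
    return min(1.0, 2 * min(lower, upper))

print(round(two_sided_p_value(24, 80, 0.4), 4))
```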