
Gaussian Distribution

From the document Trace Element Analysis of Food and Diet (pages 30-36)

The curve given in Figure 2.1 is called the normal, bell or Gaussian curve. The vertical axis shows the relative frequency of occurrence of a measurement xi or of its error. In theory, if the number of measurements is infinite or very large, the mean will be the population mean, µ. In practice, however, the number of measurements is limited, and the mean will be the sample mean, x̄. The important points are the area under the curve and the information that can be obtained about the population mean from the sample mean.

Figure 2.1 Gaussian distribution

This function, which is the fundamental distribution in statistics and in the theory of errors, leads to the following probability density equation:

y = [1/(σ√(2π))] exp[−(xi − µ)²/(2σ²)] (2.6)

where y is the frequency of a given xi value, σ the standard deviation, µ the true value and (xi − µ) the deviation from the true value, or error. If the number of measurements is very large, in practice more than 20, we may assume that the average is equal to the true value µ, provided that there is no systematic error. If the number of measurements is less than 20, s is used instead of σ.

The standard deviation of the measurements illustrates how closely all measurements cluster about the mean. The normal distribution curve gives information about the normal random error. The curve has a maximum at xi = µ and is symmetrical with respect to µ; any change in the value of µ shifts the normal curve along the x-axis, but the shape of the curve is not affected. Finally, a modification of σ will either widen or narrow the peak, but µ will be left unchanged. The equation can be simplified by defining a new term, the z factor,

z = (xi − µ)/σ (2.7)

The quantity z gives the deviation from the mean in units of standard deviation. The equation of the distribution then becomes

y = [1/√(2π)] exp(−z²/2) (2.8)

The ideal curve of Equation (2.8), represented in Figure 2.1, is based upon an infinite number of observations, with positive and negative deviations equally probable.

The measures of variability correspond to certain constant fractions of the total area of the normal curve. When the mean is taken as the centre, ±1σ covers 68.26%, ±2σ covers 95.46% and ±3σ covers 99.74% of the total area. The middle 50% corresponds to ±0.6745σ. The first interpretation of the results is that whenever a sample is chosen from a population, the chances are 68.26 out of 100 that its sample mean is within ±1σ of the population mean.
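These coverage fractions follow directly from the normal distribution function and can be checked with the error function in stdlib Python. The sketch below is illustrative and not part of the original text; note that exact evaluation gives 95.45% and 99.73%, differing from the figures above only in rounding.

```python
import math

def coverage(k):
    """Fraction of a normal population lying within ±k standard deviations."""
    return math.erf(k / math.sqrt(2))

print(round(coverage(1.0) * 100, 2))     # 68.27
print(round(coverage(2.0) * 100, 2))     # 95.45
print(round(coverage(3.0) * 100, 2))     # 99.73
print(round(coverage(0.6745) * 100, 2))  # 50.0 (the middle half)
```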

2.4.1 Log-Normal Distribution

Not all quantities in the world have normal distributions. We find, for example, that concentrations of trace species in food and diet, in the atmosphere or in other media are more often log-normally distributed than normal. Whenever the fluctuations of a quantity are comparable in magnitude to the mean value, there is a good chance that the distribution will be log-normal. In that situation, the normal distribution will predict significant probabilities for negative values, which make no physical sense. By contrast, negative values do not arise in log-normal distributions.

As shown in Figures 2.2a and b, a log-normal distribution simply means that, if one plots the probability vs. the logarithm of the quantity, the resulting distribution


is normal. In these cases, the calculation will involve the normal distribution function. Equation (2.8) will then be of the form

y = [1/(xi σ√(2π))] exp[−(log xi − log xg)²/(2σ²)] (2.9)

In general, if the average and the median are different, it is possible that the distribution fits a log-normal distribution better than a normal one; the geometric mean then agrees better with the median than the arithmetic mean does.

For log-normal distributions, it is appropriate to calculate the geometric mean, xg:

xg = (∏ xi)^(1/N) (2.10)

where ∏ indicates that one takes the product of the N values of xi. Likewise, the geometric standard deviation is given by

σg = 10^k (2.11)

where

k = [(Σ(log xi)² − N(log xg)²)/(N − 1)]^(1/2) (2.12)

Note that σg is multiplicative: about 68% of the points should fall within the range xg/σg to xg·σg. Since, as seen in Figure 2.2b, the range below xg is smaller than the range above xg, one should report the result with different negative and positive uncertainties, e.g. as xg (+σg⁺, −σg⁻), rather than as xg ± σ.
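Equations (2.10)-(2.12) translate directly into code. The following stdlib-Python sketch is illustrative (the function name and sample data are not from the book):

```python
import math

def geometric_stats(xs):
    """Geometric mean xg = (prod xi)^(1/N) and geometric standard deviation
    sigma_g = 10**k, with k computed as in Equation (2.12)."""
    n = len(xs)
    logs = [math.log10(x) for x in xs]
    log_xg = sum(logs) / n            # log10 of the geometric mean
    xg = 10 ** log_xg
    k = math.sqrt((sum(v * v for v in logs) - n * log_xg ** 2) / (n - 1))
    return xg, 10 ** k

xg, sg = geometric_stats([2.0, 4.0, 8.0])   # sample values spaced by a factor of 2
print(round(xg, 6), round(sg, 6))           # 4.0 2.0
```

Because σg is multiplicative, the interval xg/σg to xg·σg for this sample runs from 2 to 8.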

2.4.2 Standard Deviation

The standard deviation, s or σ, is a good measure of deviation from the mean. It differs from the mean deviation (see Section 2.3.5) by squaring the deviations from the mean instead of taking their absolute values. The standard deviation of a population of N values xi with true value µ is

σ = [Σ(xi − µ)²/N]^(1/2) (2.13)


Figure 2.2 (a) Normal distribution (b) Log-normal distribution

If, instead of a whole population or a large sample from the population, a small sample is taken from the whole population (e.g. N < 20), the standard deviation of the sample, denoted s, is expressed as

s = [Σ(xi − x̄)²/(N − 1)]^(1/2) (2.14)

Since in most analytical experiments the number of measurements is less than 20, the calculated degrees of freedom are decreased by 1, i.e. N − 1 is used instead of N in Equation (2.14).
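In Python's stdlib, the two conventions of Equations (2.13) and (2.14) correspond to `statistics.pstdev` (divide by N) and `statistics.stdev` (divide by N − 1); the replicate data below are illustrative:

```python
import statistics

data = [10.1, 10.3, 9.9, 10.2, 10.0]   # five hypothetical replicate results

sigma = statistics.pstdev(data)  # population form, Equation (2.13)
s = statistics.stdev(data)       # sample form, Equation (2.14)

print(s > sigma)   # True: dividing by N - 1 always enlarges the estimate
```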

When two or more (k) sets of measurements are combined into a single lot, it is possible to calculate the standard deviation of the total distribution from the standard deviation values of the individual distributions. The pooled standard deviation, sp, is given by

sp = {[Σ(xi1 − x̄1)² + Σ(xi2 − x̄2)² + … + Σ(xik − x̄k)²]/[(N1 + N2 + … + Nk) − k]}^(1/2) (2.15)
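Equation (2.15) can be transcribed directly; the helper name and data sets below are illustrative:

```python
import math

def pooled_std(*sets):
    """Pooled standard deviation sp, Equation (2.15): sum the squared deviations
    of each set about its own mean, then divide by (N1 + ... + Nk) - k."""
    ss, n_total = 0.0, 0
    for xs in sets:
        mean = sum(xs) / len(xs)
        ss += sum((x - mean) ** 2 for x in xs)
        n_total += len(xs)
    return math.sqrt(ss / (n_total - len(sets)))

sp = pooled_std([10.1, 10.3, 9.9], [10.4, 10.2, 10.6, 10.0])
print(round(sp, 3))   # 0.237
```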

A very useful measure of precision is the coefficient of variation, CV, or percent relative standard deviation, RSD:

RSD = CV = (s/x̄) × 100 (2.16)

The CV is expressed as a percentage and is a ratio, independent of the units of measurement; it is therefore very useful for comparing the variability of sets of data measured under different conditions. A smaller s value (a narrower distribution) or, better still, a smaller RSD value indicates higher precision for a set.
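Because the RSD is dimensionless, the same measurements expressed in different units give the same value; a short illustrative check (the data are hypothetical):

```python
import statistics

def rsd(xs):
    """Percent relative standard deviation, Equation (2.16): (s / mean) * 100."""
    return statistics.stdev(xs) / statistics.mean(xs) * 100

mg_per_kg = [52.0, 50.0, 48.0]              # hypothetical trace-element results
as_percent = [x * 1e-4 for x in mg_per_kg]  # same data rescaled to mass percent

print(round(rsd(mg_per_kg), 1), round(rsd(as_percent), 1))   # 4.0 4.0
```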

2.5 Confidence Limit, Confidence Interval and Confidence Level

In most analyses, the data collected are limited to a small number of measurements, and the calculated mean, x̄, differs from the true mean, µ. The precision can be deduced from a series of replicate analyses by calculating the mean. The next question is then how close the calculated mean is to the true value, which cannot be measured easily. The true mean can be derived from the measured mean within a degree of probability. This limit of probability is called the confidence limit, and the interval defined by this limit is the confidence interval. The confidence limit, therefore, has to be calculated statistically from the measured mean and standard deviation at a given confidence level, as described below.

The normal curve in Figure 2.1 shows the distribution of measurements for a large number of data. The width of the curve is determined by σ, and the true mean is close to the arithmetic mean within an error estimated from Equation (2.1). This sample data can be used to determine a predicted range, the confidence interval, for


the true mean. It can only be stated that, to a degree of certainty, the population or true mean lies somewhere in that range. For example, as stated above, the true mean is in the range of ±1σ with a probability of 68.26%, in the range of ±1.349σ with a probability of 82.26%, etc. Therefore, different portions of the area under the normal curve can be related to a parameter, the z value or z score, which makes it possible to predict the range for the true mean µ at a selected degree of certainty. The probability of the prediction is called the confidence level, and the coefficient is the z score. The values of z for different confidence levels are given in Table 2.1.

The relation between the true mean, µ, and the sample mean, x̄, will be

µ = x̄ ± zσ/√N (2.17)
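Equation (2.17) can be evaluated with `statistics.NormalDist`, whose `inv_cdf` method supplies the z score for a chosen confidence level; the numbers below are illustrative:

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, level=0.95):
    """Confidence interval mu = xbar ± z*sigma/sqrt(N), Equation (2.17).
    Assumes sigma is known from a large body of earlier measurements."""
    z = NormalDist().inv_cdf((1 + level) / 2)   # two-sided z score
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

lo, hi = z_interval(10.15, 0.20, 25)
print(round(lo, 3), round(hi, 3))   # 10.072 10.228
```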

2.6 Student’s t Distribution: Confidence Limit for Small Number of Measurements

When the number of data points decreases below about 20, the normal curve can no longer be accurately used to describe the distribution of the sample mean. In this case, a different family of curves, which become broader as the sample number decreases, is used. These curves are called t curves and show normal-curve characteristics. The shape of any t curve depends on the degrees of freedom (df), which in most cases equal the number of measurements N minus one (df = N − 1). For large degrees of freedom, i.e. N ≥ 20, the t curve becomes the normal curve and z scores can be used.

Table 2.2 shows the t scores for different confidence levels. The predicted range for the population mean, µ, from the standard deviation s and mean x̄, will be

µ = x̄ ± ts/√N (2.18)

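Equation (2.18) in code, with the t score taken from Table 2.2; the replicate data are illustrative:

```python
import math
import statistics

data = [10.1, 10.3, 9.9, 10.2, 10.0]   # N = 5 hypothetical replicates, df = 4
t_95 = 2.78                            # Table 2.2: df = 4, 95% confidence

xbar = statistics.mean(data)
s = statistics.stdev(data)
half = t_95 * s / math.sqrt(len(data))  # Equation (2.18)
print(round(xbar - half, 3), round(xbar + half, 3))   # 9.903 10.297
```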

Table 2.1 Confidence level for z scores

Confidence level z Score

50 0.68

60 0.84

65 0.94

68 1.00

70 1.04

75 1.15

80 1.29

85 1.44

90 1.64

95 1.96

96 2.00

99 2.58

99.7 3.00

99.9 3.29
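The z scores in Table 2.1 can be reproduced with `statistics.NormalDist`; for the levels chosen below the computed values agree with the table to two decimals:

```python
from statistics import NormalDist

def z_score(level_percent):
    """Two-sided z score for a given confidence level, as in Table 2.1."""
    return NormalDist().inv_cdf((1 + level_percent / 100) / 2)

for level in (90, 95, 99, 99.9):
    print(level, round(z_score(level), 2))   # 1.64, 1.96, 2.58, 3.29
```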

2.7 Testing for Statistical Hypothesis

The distributions discussed above can be applied to experimental results in order to obtain a more meaningful mean, compare experimental results, reject outliers, and compare standard deviations and other variables.

2.7.1 Comparison of Experimental Means with True Value or with Each Other: Student’s t Test

An important statistical application is to estimate the agreement of experimental results with a true value, or of a test result for a sample with that for a standard sample.

Table 2.2 t scores for various levels of confidence

Level of confidence (%)

Degrees of freedom 80 90 95 99 99.9

1 3.08 6.31 12.71 63.66 636.6

2 1.89 2.92 4.30 9.93 31.60

3 1.64 2.35 3.18 5.84 12.94

4 1.53 2.13 2.78 4.60 8.61

5 1.48 2.02 2.57 4.03 6.86

6 1.44 1.94 2.45 3.71 5.96

7 1.42 1.90 2.37 3.50 5.41

8 1.40 1.86 2.31 3.36 5.04

9 1.38 1.83 2.26 3.25 4.78

10 1.37 1.81 2.23 3.17 4.59

11 1.36 1.80 2.20 3.11 4.44

12 1.36 1.78 2.18 3.06 4.32

13 1.35 1.77 2.16 3.01 4.22

14 1.35 1.76 2.15 2.98 4.14

15 1.34 1.75 2.13 2.95 4.07

16 1.34 1.75 2.12 2.92 4.02

17 1.33 1.74 2.11 2.90 3.97

18 1.33 1.73 2.10 2.88 3.92

19 1.33 1.73 2.09 2.86 3.88

20 1.33 1.73 2.09 2.85 3.85

21 1.32 1.72 2.08 2.83 3.82

22 1.32 1.72 2.07 2.82 3.79

23 1.32 1.71 2.07 2.81 3.77

24 1.32 1.71 2.06 2.80 3.75

25 1.32 1.71 2.06 2.79 3.73

26 1.32 1.71 2.06 2.78 3.71

27 1.31 1.70 2.05 2.77 3.69

28 1.31 1.70 2.05 2.76 3.67

29 1.31 1.70 2.05 2.75 3.65

30 1.31 1.70 2.04 2.75 3.65

40 1.30 1.68 2.02 2.70 3.55

60 1.30 1.67 2.00 2.66 3.46

120 1.29 1.66 1.98 2.62 3.37

∞ 1.28 1.65 1.96 2.58 3.29

If s is known from earlier experiments, then confidence limits can be calculated for a given confidence level by

µ = xi ± ts (2.19)

Similarly, the confidence limit for a mean value found from N experimental results is

µ = x̄ ± ts/√N (2.20)

Two sets of experimental data obtained for the same sample by different methods or under various experimental conditions can be compared statistically within a confidence interval using the z or t distribution tests, depending on the size of the data sets. If the numbers of replicate measurements are N1 and N2 for the first and second sets, with means x̄1 and x̄2, the true mean for each set will be

µ1 = x̄1 ± ts/√N1 (2.21)

µ2 = x̄2 ± ts/√N2 (2.22)

The null hypothesis can be applied to estimate the difference between the two means. This hypothesis states that there is no significant difference between the two population means, and that any difference between the sample means is a consequence of random errors only. Therefore, when µ1 = µ2 and the pooled standard deviation, sp, is used, the difference between the means can be expressed as

|x̄1 − x̄2| = t·sp·[(1/N1) + (1/N2)]^(1/2) (2.23)

The interpretation of the data is made by comparing the difference of the means with the quantity on the right-hand side of Equation (2.23) at the desired confidence level. The t value is taken at the selected confidence level for N1 + N2 − 2 degrees of freedom. If |x̄1 − x̄2| < t·sp·[(1/N1) + (1/N2)]^(1/2), the difference between the means is not significant. Otherwise, a significant difference is indicated at the given confidence level, which points to the presence of a systematic error.
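The full comparison of Equation (2.23) can be sketched as follows; the data sets, helper names and the t value (df = 4 + 4 − 2 = 6, 95% confidence, from Table 2.2) are illustrative:

```python
import math
import statistics

def pooled_sp(a, b):
    """Pooled standard deviation for two sets (Equation (2.15) with k = 2)."""
    ss = sum((x - statistics.mean(a)) ** 2 for x in a) \
       + sum((x - statistics.mean(b)) ** 2 for x in b)
    return math.sqrt(ss / (len(a) + len(b) - 2))

def means_differ(a, b, t):
    """Equation (2.23): True if |xbar1 - xbar2| exceeds
    t * sp * sqrt(1/N1 + 1/N2), i.e. the null hypothesis is rejected."""
    diff = abs(statistics.mean(a) - statistics.mean(b))
    limit = t * pooled_sp(a, b) * math.sqrt(1 / len(a) + 1 / len(b))
    return diff > limit

method1 = [10.1, 10.3, 9.9, 10.2]
method2 = [10.5, 10.7, 10.4, 10.6]
print(means_differ(method1, method2, t=2.45))   # True: systematic error indicated
```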

2.7.2 Comparison of Two Experimental Standard Deviations:
