Linear Regression Analysis - The F Test - Trace Element Analysis of Food and Diet

The F Test

2.9 Linear Regression Analysis

The t criterion: For the series of nine values (excluding 26), $\bar{x} = 19.3$ and $s = 1.5$. At the 95% confidence level, for $10 - 2 = 8$ degrees of freedom, $t_\alpha = 2.31$. So from Equation (2.22),

$$t_{\mathrm{exp}} = 4.24$$

Since $t_{\mathrm{exp}}$ exceeds $t_\alpha$, the value of 26 should be rejected.

The R criterion: Again for $N = 10$, at the 95% confidence level, $R = 3.54$. The experimental value is 4.46, which exceeds $R$. So the experimental result 26 should again be rejected.

As seen, all three criteria gave the same result, namely, that the value of 26 should be rejected. In borderline cases, the criteria may not all lead to the same conclusion. Then the experimenter should make the decision or, better, retain all the data for further calculations.

The regression line has the form

$$y = mx + b$$

where $m$ is the slope of the line and $b$ the intercept. Note that one should assign $x$ to the more accurately known parameter and assume that all of the error is associated with $y$. In order to find the equation of a line, $m$ and $b$ have to be determined by the least-squares method, i.e. by minimizing the sum of the squares of the deviations of the points from the regression line.

The sum of the squares of the deviations, SSD, for each point from the line is

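$$\mathrm{SSD} = \sum_i \left[ y_i - (m x_i + b) \right]^2 \tag{2.28}$$

Taking the partial derivatives of this expression with respect to $m$ and $b$, and equating them to zero, the following equations can be obtained for $n$ pairs of values:

$$n b + m \sum_i x_i = \sum_i y_i \tag{2.29}$$

$$b \sum_i x_i + m \sum_i x_i^2 = \sum_i x_i y_i \tag{2.30}$$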
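The step from Equation (2.28) to Equations (2.29) and (2.30) can be written out explicitly. The lines below are the standard calculus step, sketched here for reference rather than quoted from the text:

```latex
\begin{align*}
\frac{\partial\,\mathrm{SSD}}{\partial b}
  &= -2 \sum_i \left[ y_i - (m x_i + b) \right] = 0
  &&\Longrightarrow\quad n b + m \sum_i x_i = \sum_i y_i \\
\frac{\partial\,\mathrm{SSD}}{\partial m}
  &= -2 \sum_i x_i \left[ y_i - (m x_i + b) \right] = 0
  &&\Longrightarrow\quad b \sum_i x_i + m \sum_i x_i^2 = \sum_i x_i y_i
\end{align*}
```

The first condition uses $\sum_i b = n b$, which is where the factor $n$ in Equation (2.29) comes from.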

From the simultaneous solution of Equations (2.29) and (2.30), $m$ and $b$ can be found as

$$m = \frac{\sum_i x_i y_i - \left( \sum_i x_i \sum_i y_i \right)/n}{\sum_i x_i^2 - \left( \sum_i x_i \right)^2/n} \tag{2.31}$$

$$b = \frac{\sum_i y_i - m \sum_i x_i}{n} \tag{2.32}$$

One can also determine the uncertainties of the parameters:

$$\sigma_b^2 = \frac{\sigma^2 \sum_i x_i^2}{\Delta}, \qquad \sigma_m^2 = \frac{n \sigma^2}{\Delta} \tag{2.33}$$

where

$$\sigma^2 = \frac{\sum_i \left[ y_i - (m x_i + b) \right]^2}{n - 2} \tag{2.34}$$

$$\Delta = n \sum_i x_i^2 - \left( \sum_i x_i \right)^2 \tag{2.35}$$

After the linear regression line is determined, the degree of relationship between the random variables $(x_i, y_i)$ can be calculated. This value is called the correlation coefficient and is usually denoted by $r$. The formula for the correlation coefficient is

$$r = \frac{m \sigma_x}{\sigma_y} = \frac{m \sqrt{\sum_i x_i^2 - n \bar{x}^2}}{\sqrt{\sum_i y_i^2 - n \bar{y}^2}} \tag{2.36}$$

The correlation coefficient $r$ can have any value between $+1$, a perfect positive correlation, and $-1$, a perfect negative correlation, with zero indicating no correlation.

Fluctuations of $y$ arise from two sources: variations of $x$, with which $y$ is correlated via the regression line, and fluctuations about the regression line, which may arise from experimental error in the measurement of $y$ values or from dependence on other parameters not included in the regression.
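The least-squares expressions of Equations (2.31) to (2.35) can be sketched in a few lines of Python. The function name `linear_fit` and the data points are made up for illustration and are not taken from the text; the code assumes, as the section does, that all of the error is associated with $y$.

```python
import math

def linear_fit(x, y):
    """Least-squares line y = m*x + b with parameter uncertainties.

    Returns (m, b, sigma_m, sigma_b), following Equations (2.31)-(2.35).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")

    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    m = (sxy - sx * sy / n) / (sxx - sx * sx / n)   # Eq. (2.31)
    b = (sy - m * sx) / n                           # Eq. (2.32)

    # Sum of squared deviations from the fitted line, Eq. (2.28)
    ssd = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    var = ssd / (n - 2)                             # Eq. (2.34)
    delta = n * sxx - sx * sx                       # Eq. (2.35)

    sigma_m = math.sqrt(n * var / delta)            # Eq. (2.33)
    sigma_b = math.sqrt(var * sxx / delta)          # Eq. (2.33)
    return m, b, sigma_m, sigma_b

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b, sigma_m, sigma_b = linear_fit(x, y)
print(f"m = {m:.3f} +/- {sigma_m:.3f}")
print(f"b = {b:.3f} +/- {sigma_b:.3f}")
```

The division by $n - 2$ in Equation (2.34) is why the sketch refuses fewer than three points: with two points the line passes through both and no estimate of the scatter is possible.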

One important aspect of writing the expression for $r$ as in the first part of Equation (2.36) is that $r^2$ represents the fraction of the variance of $y$, $\sigma_y^2$, that results from the variance of $x$. The greater the slope of the regression line, and the greater the variance of $x$, $\sigma_x^2$, the greater the value of $r$ will be and, thus, the fraction of $\sigma_y^2$ explained by variation of $x$. The portion of $\sigma_y^2$ accounted for by error or dependence on other parameters is $1 - r^2$.
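The interpretation of $r^2$ as the explained fraction of the variance can be checked numerically. The sketch below uses made-up data and hypothetical variable names; it computes $r$ as $m\sigma_x/\sigma_y$ and compares $1 - r^2$ with the ratio of the residual sum of squares to the total sum of squares of $y$.

```python
import math

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Sums of squares about the means
sxx = sum(xi * xi for xi in x) - n * xbar ** 2
syy = sum(yi * yi for yi in y) - n * ybar ** 2
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar

m = sxy / sxx                 # slope, equivalent to Eq. (2.31)
b = ybar - m * xbar           # intercept, equivalent to Eq. (2.32)

# r = m * sigma_x / sigma_y; the common normalisation of the two
# standard deviations cancels in the ratio
r = m * math.sqrt(sxx) / math.sqrt(syy)

# Scatter about the regression line as a fraction of the total scatter of y
ssd = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
unexplained = ssd / syy

print(f"r^2         = {r * r:.6f}")
print(f"1 - r^2     = {1 - r * r:.6f}")
print(f"unexplained = {unexplained:.6f}")
```

For a straight-line least-squares fit the last two printed values agree to rounding error, which is the statement made in the paragraph above.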

One must be careful about the correlation between two quantities. Correlation coefficients are rarely zero, as a small correlation nearly always arises just by chance. The smaller the value of $n$, the less significant is a given value of $r$. If there are just two points, one obviously obtains a perfect correlation. As in all statistical events, it is impossible to conclude that two parameters are absolutely correlated or not. Instead it is common to quote the probability $P(r, N)$ that the observed correlation arose purely by chance. As shown in Figure 2.3, if there are only 10 pairs of points, $r$ must be at least 0.54 to reduce the probability of a random correlation to a value smaller than 10%. However, if $N = 50$ and $r = 0.54$, the chance that the correlation arose from random fluctuations drops to less than 0.1%. Usually, one should place little confidence in a correlation unless the $P$ value is below 1%. In the literature, one may see such statements as: “the variables are correlated to the 1% confidence level”, meaning that $P \le 0.01$.
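The probability $P(r, N)$ can be estimated numerically. The text does not give a formula for it, so the sketch below rests on an assumption: it uses the standard sampling density of $r$ for an uncorrelated, normally distributed parent population, $p(r) \propto (1 - r^2)^{(\nu - 2)/2}$ with $\nu = N - 2$, and integrates it from $|r|$ to 1 by Simpson's rule. The function name `prob_random_correlation` is hypothetical.

```python
import math

def prob_random_correlation(r, n_points, steps=2000):
    """Two-sided probability that uncorrelated data give a correlation
    coefficient at least as large in magnitude as r (assumed model:
    bivariate normal parent population with zero correlation)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    v = n_points - 2                      # degrees of freedom
    if v < 2:
        raise ValueError("this sketch needs at least four pairs of points")
    if steps % 2:
        steps += 1                        # Simpson's rule needs an even count

    # Normalisation Gamma((v+1)/2) / (sqrt(pi) * Gamma(v/2)), via logarithms
    log_norm = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                - 0.5 * math.log(math.pi))
    norm = math.exp(log_norm)

    def density(t):
        return norm * (1.0 - t * t) ** ((v - 2) / 2)

    lower, upper = abs(r), 1.0
    h = (upper - lower) / steps
    total = density(lower) + density(upper)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * density(lower + k * h)
    one_sided = total * h / 3.0
    return 2.0 * one_sided                # both signs of r count

p_10 = prob_random_correlation(0.54, 10)
p_50 = prob_random_correlation(0.54, 50)
print(f"P(0.54, 10) = {p_10:.4f}")
print(f"P(0.54, 50) = {p_50:.6f}")
```

Under this assumed model, $r = 0.54$ with 10 pairs gives a probability close to 10%, while the same $r$ with 50 pairs gives a probability well below 0.1%, in line with the figures quoted above.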

It is very difficult to establish cause–effect relationships from correlation. The fact that $x$ correlates with $y$ does not necessarily mean that $x$ determines $y$ or vice versa.

Figure 2.3 The linear correlation coefficient r vs. the number of observations N and the corresponding probability P(r,N)

The variations in both of them might be caused by some other variable $z$ that is not included in the analysis. The concentrations of most elements in the total diet correlate significantly with the concentrations of most other elements. The main reason for this is that the concentrations in most types of food are all strongly influenced by a few main staple foods like flour, vegetables and meat.

The correlation coefficients between the concentrations of elements give information about the sources of elements.¹ The correlation coefficients calculated for total diet samples² are given in Table 2.6. There is a strong correlation between sodium and chlorine; this is because the main source of these elements is the added salt, NaCl.

There are also high correlations between Mg and Cr, Cl, Mn, Fe, Se and Co. Wolnik et al.³ and Aras and Kumpulainen⁴ showed that Mg, Mn, Zn, Se, Cr and V are the main minor and trace elements in wheat. The positive correlations of potassium with calcium and zinc indicate that they have a common source, namely vegetables.

2.9.1 Multiple Linear Regression

Often a dependent variable is suspected of depending on several other variables. Multiple linear regression can be used to test the dependence of one variable upon several others. For example, an element in the total diet might originate from different sources, such as water, several staple foods and spices.

Multiple linear regression assumes that one can express the dependent variable as

$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n \tag{2.37}$$

The coefficients $a_0, a_1, \dots, a_n$ are determined by minimizing the sum of the squares of the deviations of the $y_i$ values from the plane defined by Equation (2.37). The determination of these values involves some rather complex matrix methods, which are covered in Chapters 7 and 9 of Ref. 5.⁵
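As a rough illustration of what such a fit involves, the sketch below solves the normal equations for the coefficients $a_0, \dots, a_n$ in plain Python. It is one simple way to do the minimization under stated assumptions, not the method of Ref. 5: the function names and the noise-free data are made up, and for real data a numerical library would normally be preferred.

```python
def solve_linear_system(a, rhs):
    """Solve a*z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Bring the largest remaining entry of this column onto the diagonal
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution
    z = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(aug[row][k] * z[k] for k in range(row + 1, n))
        z[row] = (aug[row][n] - s) / aug[row][row]
    return z

def multiple_regression(xs, y):
    """Fit y = a0 + a1*x1 + ... + an*xn by least squares.

    xs is a list of rows [x1, ..., xn]; returns [a0, a1, ..., an].
    """
    design = [[1.0] + [float(v) for v in row] for row in xs]
    p = len(design[0])
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

coeffs = multiple_regression(xs, y)
print([round(c, 6) for c in coeffs])
```

Because the example data contain no noise, the recovered coefficients match the generating values; with measured concentrations the same code returns the best-fit plane in the least-squares sense.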

Table 2.6 Correlation coefficient between elements in a total diet

      Na    Mg    Cl    K     Ca    Cr    Mn    Fe    Zn    Se    Br    Rb    Sc    Co
Na    1
Mg    0.61  1
Cl    0.99  0.58  1
K     0.21  0.05  0.22  1
Ca    0.21  0.08  0.04  0.63  1
Cr    0.12  0.67  0.11  0.31  0.18  1
Mn    0.4   0.55  0.38  0.36  0.49  0.29  1
Fe    0.38  0.55  0.37  0.48  0.40  0.54  0.01  1
Zn    0.14  0.02  0.16  0.78  0.59  0.14  0.54  0.66  1
Se    0.23  0.16  0.25  0.43  0.24  0.07  0.26  0.33  0.66  1
Br    0.38  0.53  0.42  0.45  0.70  0.52  0.04  0.70  0.46  0.04  1
Rb    0.32  0.31  0.34  0.81  0.62  0.44  0.31  0.90  0.90  0.41  0.72  1
Sc    0.48  0.69  0.48  0.13  0.15  0.59  0.13  0.16  0.16  0.01  0.58  0.42  1
Co    0.47  0.84  0.45  0.16  0.15  0.77  0.50  0.14  0.14  0.06  0.45  0.45  0.67  1


From the document Trace Element Analysis of Food and Diet (pages 40-44)