

this method of linear interpolation between successive points gives x-values of 1.36, 4.50 and 8.65 units respectively. Comparison with the above table shows that these results, especially the last two, would be quite acceptable for many purposes.

5.15 Outliers in regression

150 5: Calibration methods in instrumental analysis: regression and correlation

Figure 5.15 Residual plots in regression diagnosis (each panel plots the residuals $y_i - \hat{y}_i$ against $\hat{y}_i$, with a horizontal line at zero): (a) satisfactory distribution of residuals; (b) the residuals tend to grow as $y_i$ grows, suggesting that a weighted regression plot would be suitable; (c) the residuals show a trend, first becoming more negative, then passing through zero, and then becoming more positive as $y_i$ increases, suggesting that a (different) curve should be plotted; and (d) a satisfactory plot, except that $y_6$ might be an outlier.
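The quantities plotted in a residual diagram can be obtained with a few lines of code. The following is a minimal sketch, assuming an unweighted straight-line fit by least squares; the function name `residuals_vs_fitted` and the calibration values are hypothetical and serve only as an illustration.

```python
import numpy as np

def residuals_vs_fitted(x, y):
    """Fit an unweighted straight line and return (fitted values, residuals)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns the coefficients highest power first: slope, intercept.
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    return fitted, y - fitted

# Hypothetical calibration data (illustration only, not taken from the text).
x = [0, 1, 2, 3, 4, 5, 6]
y = [0.02, 0.21, 0.38, 0.62, 0.79, 1.01, 1.18]

fitted, residuals = residuals_vs_fitted(x, y)

# Text version of a residual plot: fitted value, then signed residual.
for f, r in zip(fitted, residuals):
    print(f"{f:8.4f}  {r:+8.4f}")
```

Residuals that scatter randomly about zero correspond to panel (a). A spread that widens with the fitted value, a systematic run of signs, or a single large residual correspond to panels (b), (c) and (d) respectively.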

flagged.) Several more advanced methods have been developed, of which the best known is the estimation for each point of Cook's squared distance, $CD^2$ (sometimes abbreviated to 'Cook's distance'), first proposed in 1977. This is an example of an influence function, i.e. it measures the effect that rejecting the calibration point in question would have on the regression coefficients. For a straight-line graph it can be calculated from:

$$CD^2 = \sum_{j=1}^{n} \frac{\left(\hat{y}_j - \hat{y}_j^{(i)}\right)^2}{2 s_{y/x}^2} \qquad (5.15.1)$$

In this equation $\hat{y}_j$ is a predicted y-value obtained when all the data points are used, and $\hat{y}_j^{(i)}$ is the corresponding predicted y-value obtained when the ith point is omitted; $s_{y/x}$ is calculated using all the data points. Values of $CD^2$ greater than 1 justify the omission of the suspect point.

In practice the Cook's squared distance method turns out to be better at identifying some types of outlier than others: outliers in the middle of a data set are less readily detected than those at the extremes. However, the alternative non-parametric and robust methods can be very effective in handling outliers in regression: robust regression methods have proved particularly popular in recent years. These topics are covered in the next chapter.
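Equation (5.15.1) can be evaluated directly by refitting the line once with each point left out. The sketch below assumes an unweighted straight-line fit; the function name `cooks_squared_distance` and the data, in which the last point has deliberately been made too high, are hypothetical and are not taken from the text.

```python
import numpy as np

def cooks_squared_distance(x, y):
    """Return CD^2 for each point of an unweighted straight-line fit.

    CD^2_i = sum_j (yhat_j - yhat_j(i))^2 / (2 * s_yx^2), where yhat_j(i)
    is the prediction at x_j from the line fitted without point i, and
    s_yx^2 is the residual variance of the fit to all n points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Fit using all the data points (polyfit returns slope, then intercept).
    slope, intercept = np.polyfit(x, y, 1)
    y_hat = intercept + slope * x
    s2_yx = np.sum((y - y_hat) ** 2) / (n - 2)

    cd2 = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i               # leave out the ith point
        b_i, a_i = np.polyfit(x[keep], y[keep], 1)
        y_hat_i = a_i + b_i * x                # predictions at all n x-values
        cd2[i] = np.sum((y_hat - y_hat_i) ** 2) / (2.0 * s2_yx)
    return cd2

# Hypothetical calibration data; the last y-value is suspiciously high.
x = [0, 2, 4, 6, 8, 10]
y = [0.01, 0.20, 0.40, 0.61, 0.80, 1.30]

cd2 = cooks_squared_distance(x, y)
for xi, d in zip(x, cd2):
    flag = "  <- CD^2 > 1" if d > 1 else ""
    print(f"x = {xi:4.1f}   CD^2 = {d:.3f}{flag}")
```

Refitting n times is inefficient for large data sets, but it mirrors the definition term by term, which makes the calculation easy to check by hand for a short calibration series.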

Bibliography

Analytical Methods Committee, Royal Society of Chemistry, Cambridge. This body publishes a series of Technical Briefs on several aspects of regression and calibration methods, including weighted regression, errors and confidence limits, standard additions, etc. Along with associated software and datasets these short papers can be downloaded from www.rsc.org.

Draper, N.R. and Smith, H., 1998, Applied Regression Analysis, 3rd edn, John Wiley, New York. An established work with comprehensive coverage of many aspects of regression and correlation problems.

Kleinbaum, D.G., Kupper, L.L. and Muller, K.E., 2007, Applied Regression Analysis and other Multivariable Methods, 4th edn, Duxbury Press, Boston, MA. Extensive treatment of regression problems and the applications of ANOVA.

Mark, H. and Workman Jr, J., 2008, Chemometrics in Spectroscopy, Academic Press, London. A major work, with a substantial emphasis on calibration and regression methods. Also covers basic statistics, experimental designs and collaborative studies.

Snedecor, G.W. and Cochran, W.G., 1989, Statistical Methods, 8th edn, Iowa State University Press, Ames, IA. Gives an excellent general account of regression and correlation procedures.

Exercises

1 In a laboratory containing polarographic equipment, six samples of dust were taken at various distances from the polarograph and the mercury content of each sample was determined. The following results were obtained.

Distance from polarograph, m      1.4    3.8    7.5   10.2   11.7   15.0
Mercury concentration, ng g⁻¹     2.4    2.5    1.3    1.3    0.7    1.2

Examine the possibility that the mercury contamination arose from the polarograph.

2 The response of a colorimetric test for glucose was checked with the aid of standard glucose solutions. Determine the correlation coefficient from the following data and comment on the result.

Glucose concentration, mM     0       2       4       6       8       10
Absorbance                    0.002   0.150   0.294   0.434   0.570   0.704

3 The following results were obtained when each of a series of standard silver solutions was analysed by flame atomic-absorption spectrometry.

Concentration, ng ml⁻¹    0       5       10      15      20      25      30
Absorbance                0.003   0.127   0.251   0.390   0.498   0.625   0.763

Determine the slope and intercept of the calibration plot, and their confidence limits.

4 Using the data of exercise 3, estimate the confidence limits for the silver concentrations in (a) a sample giving an absorbance of 0.456 in a single determination, and (b) a sample giving absorbance values of 0.308, 0.314, 0.347 and 0.312 in four separate analyses.

5 Estimate the limit of detection of the silver analysis from the data in exercise 3.

6 The gold content of a concentrated seawater sample was determined by using atomic-absorption spectrometry with the method of standard additions. The results obtained were as follows.

Gold added, ng per ml of concentrated sample    0       10      20      30      40      50      60      70
Absorbance                                      0.257   0.314   0.364   0.413   0.468   0.528   0.574   0.635

Estimate the concentration of the gold in the concentrated seawater, and determine confidence limits for this concentration.

7 The fluorescence of each of a series of acidic solutions of quinine was determined five times. The results are given below.

Concentration, ng ml⁻¹      0     10    20    30    40    50
Fluorescence intensity      4     22    44    60    75    104
(arbitrary units)           3     20    46    63    81    109
                            4     21    45    60    79    107
                            5     22    44    63    78    101
                            4     21    44    63    77    105

Determine the slopes and intercepts of the unweighted and weighted regression lines. Calculate, using both regression lines, the confidence limits for the concentrations of solutions with fluorescence intensities of 15 and 90 units.

8 An ion-selective electrode (ISE) determination of sulphide from sulphate-reducing bacteria was compared with a gravimetric determination. The results obtained were expressed in milligrams of sulphide.

Sample                    1     2     3     4     5     6     7     8     9     10
Sulphide (ISE method)     108   12    152   3     106   11    128   12    160   128
Sulphide (gravimetry)     105   16    113   0     108   11    141   11    182   118

Comment on the suitability of the ISE method for this sulphide determination.

(Al-Hitti, I.K., Moody, G.J. and Thomas, J.D.R., 1983, Analyst, 108: 43)

9 In the determination of lead in aqueous solution by electrochemical atomic-absorption spectrometry with graphite-probe atomisation, the following results were obtained:

Lead concentration, ng ml⁻¹    10     25     50     100    200    300
Absorbance                     0.05   0.17   0.32   0.60   1.07   1.40

Investigate the linear calibration range of this experiment.

(Based on Giri, S.K., Shields, C.K., Littlejohn, D. and Ottaway, J.M., 1983, Analyst, 108: 244)

10 In a study of the complex formed between europium(III) ions and pyridine-2,6-dicarboxylic acid (DPA), the absorbance values of solutions containing different DPA : Eu concentrations were determined, with the following results:

Absorbance    0.008   0.014   0.024   0.034   0.042   0.050   0.055   0.065
DPA : Eu      0.2     0.4     0.6     0.8     1.0     1.2     1.4     1.6

Absorbance    0.068   0.076   0.077   0.073   0.066   0.063   0.058
DPA : Eu      1.8     2.0     2.4     2.8     3.2     3.6     4.0

Use these data to determine the slopes and intercepts of two separate straight lines. Estimate their intersection point and its standard deviation, thus determining the composition of the DPA–europium complex formed.

(Based on Arnaud, N., Vaquer, E. and Georges, J., 1998, Analyst, 123: 261)

11 In an experiment to determine hydrolysable tannins in plants by absorption spectroscopy the following results were obtained:

Absorbance                0.084   0.183   0.326   0.464   0.643
Concentration, mg ml⁻¹    0.123   0.288   0.562   0.921   1.420

Use a suitable statistics or spreadsheet program to calculate a quadratic relationship between absorbance and concentration. Using R² and adjusted R² (R′²) values, comment on whether the data would be better described by a cubic equation.

(Based on Willis, R.B. and Allen, P.R., 1998, Analyst, 123: 435)

12 The following results were obtained in an experiment to determine spermine by high-performance thin layer chromatography of one of its fluorescent derivatives:

Fluorescence intensity    36    69    184   235   269   301   327
Spermine, ng              6     18    30    45    60    75    90

Determine the best polynomial calibration curve through these points.

(Based on Linares, R.M., Ayala, J.H., Afonso, A.M. and Gonzalez, V., 1998, Analyst, 123: 725)