5.13 Introduction to curvilinear regression methods

In many instrumental analysis methods the instrument response is proportional to the analyte concentration over substantial concentration ranges. The simplified calculations that result encourage analysts to take significant experimental precautions to achieve such linearity. Examples of such precautions include the control of the emission line width of a hollow-cathode lamp in atomic absorption spectrometry, and the size and positioning of the sample cell to minimise inner filter artefacts in molecular fluorescence spectrometry. However, many analytical methods (e.g. immunoassays and similar competitive binding assays) produce calibration plots that are intrinsically curved. Particularly common is the situation where the calibration plot is linear (or approximately so) at low analyte concentrations, but becomes curved at higher analyte levels. When curved calibration plots are obtained we still need answers to the questions listed in Section 5.2, but those questions will pose rather more formidable statistical problems than occur in linear calibration experiments.

The first question to be examined is, how do we detect curvature in a calibration plot? That is, how do we distinguish between a plot that is best fitted by a straight line, and one that is best fitted by a gentle curve? Since the degree of curvature may be small, and/or occur over only part of the plot, this is not a straightforward question. Moreover, despite its widespread use for testing the goodness of fit of linear graphs, the product–moment correlation coefficient (r) is of little value in testing for curvature: we have seen (Section 5.3) that lines with obvious curvature may still give very high r values. An analyst would naturally hope that any test for curvature could be applied fairly easily in routine work without extensive calculations. Several such tests are available, based on the use of the y-residuals on the calibration plot.

We have seen (Section 5.5) that a y-residual, yᵢ − ŷᵢ, represents the difference between an experimental value of y and the ŷ-value calculated from the regression equation at the same value of x. If a linear calibration plot is appropriate, and if the random errors in the y-values are normally distributed, the residuals themselves should be normally distributed about the value of zero. If this turns out not to be true in practice, then we must suspect that the fitted regression line is not of the correct type. In the worked example given in Section 5.5 the y-residuals were shown to be +0.58, −0.38, −0.24, −0.50, +0.34, +0.18 and +0.02. These values sum to zero

R′² = 1 − (residual MS/total MS)

R² = SS due to regression/total SS = 1 − (residual SS/total SS)
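As a minimal numerical sketch of these two statistics (the observed and fitted values below are invented purely for demonstration and are not from the text):

```python
# Computing R^2 and the adjusted R'^2 from a fit.
# Data and fitted values are hypothetical, for illustration only.
y     = [2.1, 3.9, 6.2, 7.8, 10.1]   # observed responses (made up)
y_hat = [2.0, 4.0, 6.0, 8.0, 10.0]   # predictions from some fitted model (made up)
n, p = len(y), 2                     # n points, p fitted parameters (slope + intercept)

y_mean = sum(y) / n
total_ss    = sum((yi - y_mean) ** 2 for yi in y)
residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

r_squared = 1 - residual_ss / total_ss
# R'^2 replaces sums of squares by mean squares, penalising extra parameters:
residual_ms = residual_ss / (n - p)
total_ms    = total_ss / (n - 1)
adj_r_squared = 1 - residual_ms / total_ms
```

Because the mean squares divide by the degrees of freedom, R′² is always slightly smaller than R², and falls if an added parameter does not earn its keep.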

(allowing for possible rounding errors, this must always be true), and are approximately symmetrically distributed about 0. Although it is impossible to be certain, especially with such small numbers of data points, that these residuals are normally distributed, there is certainly no contrary evidence in this case, i.e. no evidence to support a non-linear calibration plot. As previously noted, Minitab®, Excel® and other statistics packages provide extensive information, including graphical displays, on the sizes and distribution of residuals.

A second test uses the signs of the residuals given above. As we move along the calibration plot, i.e. as x increases, positive and negative residuals will be expected to occur in random order if the data are well fitted by a straight line. If, in contrast, we attempt to fit a straight line to a series of points that actually lie on a smooth curve, then the signs of the residuals will no longer have a random order, but will occur in sequences of positive and negative values. Examining again the residuals given above, we find that the order of signs is + − − − + + +. To test whether these sequences of + and − residuals indicate the need for a non-linear regression line, we need to know the probability that such an order could occur by chance. Such calculations are described in the next chapter. Unfortunately the small number of data points makes it quite likely that these and other sequences could indeed occur by chance, so any conclusions drawn must be treated with caution. The choice between straight line and curvilinear regression methods is therefore probably best made by using the curve-fitting techniques outlined in the next section.
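The sign sequence and the number of runs can be counted with a few lines of Python (a sketch, not part of the original text; the residuals are those of the Section 5.5 worked example quoted above):

```python
# Count the runs of consecutive like-signed y-residuals. Long runs of one
# sign suggest curvature; the formal significance calculation for the number
# of runs is described in the next chapter.
residuals = [0.58, -0.38, -0.24, -0.50, 0.34, 0.18, 0.02]

signs = ['+' if r > 0 else '-' for r in residuals]
# a new run starts at every change of sign
runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
print(''.join(signs), '->', runs, 'runs')   # +---+++ -> 3 runs
```

Three runs in seven points is suggestive but, as noted above, far from conclusive with so few data.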

In the common situation where a calibration plot is linear at lower concentrations and curved at higher ones, it is important to be able to establish the range over which linearity can be assumed. Approaches to this problem are outlined in the following example.

Example 5.13.1

Investigate the linear calibration range of the following fluorescence experiment.

Fluorescence intensity      0.1    8.0    15.7    24.2    31.5    33.0
Concentration, μg ml⁻¹      0      2      4       6       8       10

Inspection of the data shows that the part of the graph near the origin corresponds rather closely to a straight line with a near-zero intercept and a slope of about 4. The fluorescence of the 10 μg ml⁻¹ standard solution is clearly lower than would be expected on this basis, and there is some possibility that the departure from linearity has also affected the fluorescence of the 8 μg ml⁻¹ standard. We first apply (unweighted) linear regression calculations to all the data.

Application of the methods of Sections 5.3 and 5.4 gives the results a = 1.357, b = 3.479 and r = 0.9878. Again we recall that the high value for r may be deceptive, though it may be used in a comparative sense (see below). The y-residuals are found to be −1.257, −0.314, +0.429, +1.971, +2.314 and −3.143, with the sum of squares of the residuals equal to 20.981. The trend in the values of the residuals suggests that the last value in the table is probably outside the linear range.
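These figures can be reproduced with a short Python script (an illustrative sketch, not the book's own code; the formulas are the standard least-squares expressions of Sections 5.3 and 5.4):

```python
# Unweighted least-squares fit to all six calibration points of Example 5.13.1.
import math

x = [0, 2, 4, 6, 8, 10]                    # concentration, ug/mL
y = [0.1, 8.0, 15.7, 24.2, 31.5, 33.0]     # fluorescence intensity
n = len(x)

x_mean, y_mean = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_yy = sum((yi - y_mean) ** 2 for yi in y)

b = s_xy / s_xx                            # slope       -> about 3.479
a = y_mean - b * x_mean                    # intercept   -> about 1.357
r = s_xy / math.sqrt(s_xx * s_yy)          # correlation -> about 0.9878

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ss_resid = sum(e ** 2 for e in residuals)  # -> about 20.98
```

Note that the residuals necessarily sum to zero (apart from rounding), as remarked earlier.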

We confirm this suspicion by applying the linear regression equations to the first five points only. This gives a = 0.100, b = 3.950 and r = 0.9998. These slope


Figure 5.14 Curvilinear regression: identification of the linear range. The data in Example 5.13.1 are used; the unweighted linear regression lines through all the points (—), through the first five points (– – –) and through the first four points only (····) are shown.

and intercept values are much closer to those expected for the part of the graph closest to the origin, and the r-value is higher than in the first calculation. The residuals of the first five points from this second regression equation are 0, 0, −0.2, +0.4 and −0.2, with a sum of squares of only 0.24. Use of the second regression equation shows that the fluorescence expected from a 10 μg ml⁻¹ standard is 39.6, i.e. the residual is −6.6. Use of a t-test (Chapter 3) would show that this last residual is significantly greater than the average of the other residuals:

alternatively a test could be applied (Section 3.7) to demonstrate that it is an

‘outlier’ amongst the residuals (see also Section 5.15 below). In this example, such calculations are hardly necessary: the enormous residual for the last point, coupled with the very low residuals for the other five points and the greatly reduced sum of squares, confirms that the linear range of the method does not extend as far as 10 μg ml⁻¹. Having established that the last data point can be excluded from the linear range, we can repeat the process to study the point (8, 31.5). We do this by calculating the regression line for only the first four points in the table, with the results a = 0, b = 4.00, r = 0.9998. The correlation coefficient value suggests that this line is about as good a fit of the points as the previous one, in which five points were used. The residuals for this third calculation are +0.1, 0, −0.3 and +0.2, with a sum of squares of 0.14. With this calibration line the y-residual for the 8 μg ml⁻¹ solution is −0.5: this value is larger than the other residuals but probably not by a significant amount. It can thus be concluded that it is reasonably safe to include the point (8, 31.5) within the linear range of the method. In making a marginal decision of this kind, the analytical chemist will take into account the accuracy required in their results, and the reduced value of a method for which the calibration range is very short.

The calculations described above are summarised in Fig. 5.14. It will be seen that the lines calculated for the first four points and the first five points are in practice almost indistinguishable.
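The successive-truncation procedure used in this example can be sketched as a short loop (illustrative Python, not from the original text):

```python
# Refit the straight line to successively shorter data sets, dropping the
# highest standard each time, and watch the residual sum of squares.
x = [0, 2, 4, 6, 8, 10]
y = [0.1, 8.0, 15.7, 24.2, 31.5, 33.0]

def fit(xs, ys):
    """Unweighted least-squares line; returns intercept, slope and residual SS."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(xs, ys)) / \
        sum((xi - xm) ** 2 for xi in xs)
    a = ym - b * xm
    ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
    return a, b, ss

for k in (6, 5, 4):                  # all points, first five, first four
    a, b, ss = fit(x[:k], y[:k])
    print(k, round(a, 3), round(b, 3), round(ss, 3))
# the sharp drop in ss between k = 6 and k = 5 flags the last point as
# lying outside the linear range
```

The drop from ss ≈ 20.98 (six points) to 0.24 (five points), against only a modest further fall to 0.14 (four points), mirrors the conclusion reached above.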

Once a decision has been taken that a set of calibration points cannot be satisfactorily fitted by a straight line, the analyst can play one further card before using the more complex curvilinear regression calculations. It may be possible to transform the data so that a non-linear relationship is changed into a linear one. Such transformations are regularly applied to the results of certain analytical methods. For example, modern software packages for the interpretation of immunoassay data frequently offer a choice of transformations: commonly used methods involve plotting log y and/or log x instead of y and x, or the use of logit functions (logit x = ln[x/(1 − x)]).

Such transformations may also affect the nature of the errors at different points on the calibration plot. Suppose, for example, that in a set of data of the form y = px^q, the sizes of the random errors in y are independent of x. Any transformation of the data into linear form by taking logarithms will obviously produce data in which the errors in log y are not independent of log x. In this case, and in any other instance where the expected form of the equation is known from theoretical considerations or from longstanding experience, it is possible to apply weighted regression equations (Section 5.10) to the transformed data. It may be shown that, if data of the general form y = f(x) are transformed into the linear equation Y = BX + A, the weighting factor, w, used in Eqs (5.10.1)–(5.10.4) is obtained from the relationship:

w = (dY/dy)^−2        (5.13.1)

Unfortunately, there are not many cases in analytical chemistry where the exact mathematical form of a non-linear regression equation is known with certainty (see below), so this approach may not be very valuable.
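As a concrete sketch of this weighting idea (illustrative Python, not the book's worked method; the transformation Y = ln y and the y-values are chosen here purely for demonstration): since the standard deviation of the transformed variable Y is approximately |dY/dy| times that of y, weights proportional to (dY/dy)⁻² compensate for the distortion.

```python
# Weights for a log-transformed fit, assuming equal random errors in y.
# For Y = ln y, dY/dy = 1/y, so the weights are proportional to y^2: the
# low-y points, whose errors are inflated by the transformation, get
# correspondingly little weight. The derivative is taken numerically so the
# same code works for any smooth transformation.
import math

def dY_dy(y, transform, h=1e-6):
    """Central-difference numerical derivative of the transformation at y."""
    return (transform(y + h) - transform(y - h)) / (2 * h)

ys = [2.0, 5.0, 10.0, 20.0]                        # hypothetical y-values
weights = [dY_dy(y, math.log) ** (-2) for y in ys] # ~ y^2 for Y = ln y

# normalise so the weights sum to n, as with the weights of Section 5.10
n = len(ys)
total = sum(weights)
weights = [n * w / total for w in weights]
```

The highest point here receives 100 times the weight of the lowest, reflecting its 10-fold larger y-value.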

In contrast to the situation described in the previous paragraph, experimental data can sometimes be transformed so that they can be treated by unweighted methods.

Data of the form y = bx with y-direction errors strongly dependent on x are sometimes subjected to a log–log transformation: the errors in log y then vary less seriously with log x, so the transformed data can reasonably be studied by unweighted regression equations.
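A brief illustration of this log–log device (Python sketch; the slope b, the x-values and the multiplicative error factors are invented for demonstration):

```python
# Data following y = b*x with errors that grow in proportion to x become
# roughly homoscedastic after a log-log transformation, so an unweighted
# straight-line fit of log y against log x is then reasonable.
import math

b_true = 4.0
xs = [1, 2, 4, 8, 16]
factors = [1.03, 0.97, 1.05, 0.95, 1.02]            # multiplicative 'errors', fixed by hand
ys = [b_true * x * f for x, f in zip(xs, factors)]  # y-errors grow with x

log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

n = len(xs)
xm, ym = sum(log_x) / n, sum(log_y) / n
slope = sum((u - xm) * (v - ym) for u, v in zip(log_x, log_y)) / \
        sum((u - xm) ** 2 for u in log_x)
intercept = ym - slope * xm
# for y = b*x the expected slope is 1 and the expected intercept is log10(b)
```

The fitted slope comes out close to 1 and the intercept close to log₁₀(4), as the model predicts, even though the raw y-errors span a wide range.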