Agricultural Systems, Vol. 64, Issue 1, April 2000

Statistical methods for evaluating a crop nitrogen simulation model, N_ABLE

J. Yang a,*, D.J. Greenwood b, D.L. Rowell a, G.A. Wadsworth a, I.G. Burns b

a Department of Soil Science, The University of Reading, Whiteknights, PO Box 233, Reading RG6 6DW, UK

b Department of Soil and Environment Sciences, Horticulture Research International, Wellesbourne, Warwick, CV35 9EF, UK

Abstract

Modelling nitrogen (N) dynamics is a valuable tool to predict the uptake, mobility and leaching of mineral N in soil profiles. Many crop/soil N simulation models have been developed in the last 20 years for this purpose. However, methods for the operational evaluation of simulation models are not well established. Standard test statistics such as the F- and t-tests are being questioned because they may give different conclusions. Difference measures are being used as alternatives but they have not been thoroughly investigated. This paper reviews statistical methods that may be helpful in comparing simulations with measured variables. They have been used to analyse comparisons of simulations produced by a nitrogen response model, N_ABLE, with measurements made in a field experiment with lettuce. The techniques include: (1) data transformation, (2) regression analysis and (3) analysis of difference. They show that accuracy of prediction for different variables varied in different growth periods. Mean absolute errors (MAE) were within 0.5% and 35 kg ha⁻¹ of the measured values of percent-N and soil mineral N, respectively, over the whole growth period. They also demonstrate systematic errors when crop weights exceeded 2 t ha⁻¹, with 0–0.38 t ha⁻¹ under-estimation of dry weight and 3–16 kg ha⁻¹ under-estimation of N uptake when the harvest date was set at values between 42 and 61 days. Use of regression analysis showed that the data sets all violated normality and equal variance assumptions and that choosing the right transformation before analysis was crucially important. When difference measures were used, it was found that there was strong correlation between the outcome of some, but not others. Overall, the results suggest that the tests can be grouped according to the degree of correlation between individual tests and that only one test needs to be made from each correlated group. We suggest that two sets of four statistics can be used, each statistic explaining a special property of the data. Each set leads to useful conclusions. The first set is mean of error (E), root mean square error, modified forecasting efficiency and paired t-statistic, and the second set is E, MAE, forecasting coefficient and F-ratio of lack of fit over experimental error ($F_{LF(Y=X)}$). Either set can give the same conclusions which could not be quantitatively detected by graphical inspection of the experimental data. © 2000 Elsevier Science Ltd. All rights reserved.

0308-521X/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0308-521X(00)00010-X

www.elsevier.com/locate/agsy

Keywords: Statistical evaluation; Test statistics; Difference measures; Nitrogen simulation model, N_ABLE; Nitrogen uptake; Soil mineral N

1. Introduction

Operational evaluation is the assessment of accuracy and precision of a simulation model and methods vary from inspection and demonstration to analytical test (Knepell and Arangno, 1993). In general, what we most require to know is what proportion of the treatment variation, i.e. that excluding experimental error, can be accounted for by the model. We also need to know about biases the model has over the entire response surface, i.e. in this case yield against N fertiliser rate and time. At a more detailed level it may be helpful for us to know: (1) In what regions of the response surface are the discrepancies between simulated and measured variables greatest and over what regions are they smallest? (2) Does the model generally give the right shape of response surface? Do the simulated values simply differ from the measured values by a fixed amount or are they a fixed proportion of the measured values? (3) Does the model give transformed or untransformed values that are linearly related to the measured values?

The purpose of this paper is to discuss a range of statistical techniques for answering these questions. Two types of statistical methods have commonly been used in model evaluation: test statistics and difference measures. Test statistics (both parametric and non-parametric) have been reviewed by Reckhow et al. (1990) and O'Leary and Connor (1996), and comprehensive discussions on difference measures have been given by Willmott et al. (1985), Loague and Green (1991) and Kabat et al. (1995). It seems that there is no robust statistic which can be used for all models because of the complexity of the data structure of a model, e.g. continuous or discrete, error or error-free and one- or two-dimensional. In this paper, we try to select some available statistical methods both from test statistics and from difference measures for the evaluation of the model, N_ABLE (Greenwood and Draycott, 1989a, b; Greenwood et al. 1996) using four measured data sets from a nitrogen experiment with lettuce.

2. Data transformation

2.1. Experiment and data structure

mineral N on a total of seven dates on Days 12, 19, 29, 42, 50, 57 and 61 in each plot during the growth period. Four state variables were measured regularly during the experiment in each of six N treatments with lettuce (Yang et al., 1999). The state variables are: weight of dry matter excluding fibrous roots (WDM, t ha⁻¹), soil mineral N in the 0–30 cm layer (NS, kg ha⁻¹), N uptake excluding fibrous roots (NU, kg ha⁻¹) and percent N in dry matter (PN, %).

The above data were simulated with the model N_ABLE. The model calculates the daily changes in the distributions of mineral-N and water down the soil profile together with the increments in plant dry weight and N-uptake. The major inputs to the model are level and timing of fertiliser application, the maximum potential yield of dry matter, and the daily rainfall and potential evaporation together with the initial soil characteristics. Detailed values of the input parameters were given by Yang et al. (1999). It was found that output is sensitive to the harvest time parameter ($T_h$), i.e. the date on which the simulation is stopped. So the effect of changing $T_h$ was examined in the simulation by setting $T_h$ to the sample Days 42, 50, 57 or 61, producing four data sets for each of the four variables.

The following notation is used in the data analysis later. A measured variable is represented by Y and a simulated variable by X, while measured data are denoted by $y_{ij}$ (i = 1, 2, ..., n represents the total measurements on each data set (see later example) and j = 1, 2, ..., r represents replicates) and simulated data by $x_i$, because the model cannot simulate random experimental errors for each replicate. Then the difference between simulation and measurement can be defined as:

$$D = Y - X, \qquad d_{ij} = y_{ij} - x_i, \quad i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, r \tag{1}$$
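As a minimal sketch of Eq. (1), the differences can be computed as follows, assuming the measured data are stored as one list of replicates per measurement point. The function name `differences` and the numbers are hypothetical and are not taken from the experiment.

```python
from typing import List


def differences(measured: List[List[float]], simulated: List[float]) -> List[List[float]]:
    """Return d_ij = y_ij - x_i for n measurement points with r replicates each."""
    if len(measured) != len(simulated):
        raise ValueError("need exactly one simulated value per measurement point")
    # The model gives a single x_i per point, so it is subtracted from every replicate.
    return [[y_ij - x_i for y_ij in y_i] for y_i, x_i in zip(measured, simulated)]


# Hypothetical values for illustration only (not data from the paper).
y = [[1.2, 1.0, 1.1], [2.4, 2.9, 2.6]]  # y_ij: n = 2 points, r = 3 replicates
x = [1.0, 2.5]                          # x_i: one simulated value per point
d = differences(y, x)
```

Positive entries of `d` mean the model under-estimates the measurement, negative entries mean it over-estimates.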

yield $W_{max}$ = 4.3 at the final harvest $T_h$ = 61 days. Thus there is an overlap in the measured but not the simulated values for the different times. The magnitude of variables can be influenced by N levels and by the duration of growth. Graphs of simulated NU and PN and measured data against time of growth are shown in Figs. 1 and 2. Graphs of measured WDM and NS were shown in a previous paper (Yang et al., 1999).

2.2. Test of normality and homoskedasticity

Regression methods are frequently used in model testing and validation processes (Sutherland et al., 1986; Reckhow et al., 1990; O'Leary and Connor, 1996) and provided one of the methods used in this paper (see later for details). In the linear regression model:

$$Y_i = \alpha + \beta X_i + \varepsilon_i \tag{2}$$

the t-test can be used to test $H_0$: $\alpha = 0$, $\beta = 1$ since $t_a = (a - \alpha)/S_a \sim t(N-2)$ and $t_b = (b - \beta)/S_b \sim t(N-2)$, where a and b are least squares estimates of $\alpha$ and $\beta$ (Aigner, 1971). However, for the classical least squares estimation in Eq. (2), the significance test of the regression parameters assumes that the individual error terms $\varepsilon_i$ are normally distributed, are of equal variance, are independent of each other, and are not correlated with the independent variable, $X_i$. In our research, transformations are generally required because the range in response variables is large. For example, the largest values of WDM, NU and NS are all 10 or 100 times more than the smallest values. Diagnostic plots for WDM, NU, PN and NS produced evidence that these data sets seriously violated the assumption of equal variance, and slightly violated the assumption of normality (Yang, 1999). Violations of these two assumptions may cause biased estimates of the parameter variances in the regression analysis. Shapiro and Wilk's W-test is considered to be the best multi-functional test of normality (Shapiro and Wilk, 1965; Royston, 1982), whereas White's $\chi^2$-test was thought to be a good statistic to check heteroskedasticity of $\varepsilon_i$ since it was derived from a null hypothesis that not only are the errors homoskedastic, but they are also independent of the regressors (White, 1980).

2.3. Transformation for normality and homoskedasticity

standard linear model shown in Eq. (2), there is a need to stabilise error variances. Two frequently used transformation methods were chosen for use in this paper based on the methods discussed by Pindyck (1981) and Bowker (1993). They are:

1. Model $Y_i = \alpha + \beta X_i + \varepsilon_i$ was transformed by $1/S_i$, and the model becomes:

$$Y_i/S_i = \alpha\,(1/S_i) + \beta\,(X_i/S_i) + \varepsilon_i/S_i \tag{3}$$

where $S_i$ is the sample standard deviation, i.e. $S_i = \sqrt{\sum_j (y_{ij} - \bar{y}_i)^2/(r-1)}$, and $\bar{y}_i$ is the mean value of the ith measured data. Eq. (3) shows that $S_i$ will be measured with experimental errors, meaning that the greater the number of the replicates, the more stable the standard errors.

from diagnostic plots that residual error $\varepsilon_i$ is proportionally related to the independent variable Xi/2. Eqs. (3) and (4) were fitted by the weighted regression method using SAS software (SAS Institute, 1989, 1990a, b).

To select the best transformation, W and $\chi^2$ statistics of random errors in Eqs. (3) and (4) were first calculated for each data set at $T_h$ = 42, 50, 57 and 61 days individually, and then mean values of W and $\chi^2$ for different $T_h$ values were compared. The ideal transformation is one where both the minimum $\chi^2$ and the maximum W occurred simultaneously, but this condition was only roughly met by the NS data set (Table 1). When this condition did not hold, we selected the transformation with the minimum value of $\chi^2$. Following this rule, we concluded that the $1/X_i$ transformation gave the best correction for equal variance in WDM, NU and PN data sets with slight loss of normality, and $1/X_i^{0.5}$ gave the best correction for both normality and equal variance in NS data sets (Table 1).
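The weighted fit behind such transformations can be sketched in Python under one common reading: dividing the model in Eq. (2) by $X_i^p$ and fitting by ordinary least squares is the same as weighted least squares with weights $1/X_i^{2p}$. This is an illustration only, not the SAS procedure used by the authors; the names `weighted_line_fit` and `weights_for_power` and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_line_fit(x: Sequence[float], y: Sequence[float],
                      w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def weights_for_power(x: Sequence[float], power: float) -> List[float]:
    """Assumption: transforming the model by 1/x**power means weights 1/x**(2*power)."""
    return [1.0 / xi ** (2.0 * power) for xi in x]


# Hypothetical data lying exactly on y = 2 + 3x, so any weighting recovers a = 2, b = 3.
x = [0.5, 1.0, 2.0, 4.0]
y = [2.0 + 3.0 * xi for xi in x]
a1, b1 = weighted_line_fit(x, y, weights_for_power(x, 1.0))  # 1/X_i transformation
a2, b2 = weighted_line_fit(x, y, weights_for_power(x, 0.5))  # 1/X_i^0.5 transformation
```

With noisy data the two weightings would give different estimates, which is why the choice of transformation matters.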

2.4. Transformation of normality of data set D (Y − X)

3. Regression analysis

3.1. Testing of the null hypothesis ($H_0$): $\alpha = 0$, $\beta = 1$

Linear regression was therefore carried out using Eq. (2) with the weighted variable $1/X_i$ for the WDM, NU and PN data sets, and with the weighted variable $1/X_i^{0.5}$ for the NS data set, and the t-test was used to test $H_0$: $\alpha = 0$, $\beta = 1$ in the model. In our data, N = nr, and a and b are weighted least squares estimates of $\alpha$ and $\beta$. Calculated t-statistics are:

$$t_a = (a - 0)/s_a, \qquad s_a = \sqrt{\mathrm{MSE}/nr + \bar{x}^2 s_b^2} \tag{5}$$

$$t_b = (b - 1)/s_b, \qquad s_b = \sqrt{\mathrm{MSE}/\sum (x_i - \bar{x})^2} \tag{6}$$

Table 1

Comparison of different transformations of data for testing normality and homoskedasticity using mean values ($T_h$ = 42, 50, 57 and 61) of W and White-$\chi^2$ statistics for each state variable

State variable   Transformation   W               White χ²
WDM              No-transfer      0.9118 (0.00)   21.25 (0.00)
                 1/Si             0.8838 (0.00)   10.06 (0.03)
PN               No-transfer      0.9711 (0.26)   11.49 (0.01)
                 1/Si             0.9756 (0.40)   6.36 (0.17)
NS               No-transfer      0.9571 (0.13)   9.78 (0.01)

where MSE is the mean square of experimental error, calculated as shown later. Significance of the values of coefficients a and b was tested by comparing $t_a$ and $t_b$ with the critical values $t_{0.05}(nr-2)$ or $t_{0.01}(nr-2)$.

3.2. Lack of fit test

In regression analysis with replicates of dependent variables, the lack of fit test was carried out by the F ratio of MSLF over MSE. Values of both mean squares were obtained as follows. The sums of squares of residuals (SSR) were partitioned into two parts, the sums of squares of the randomised error (SSE) and the sums of squares of the lack of fit (SSLF). The degrees of freedom of residuals (DFR) were then partitioned into the degree of freedom of randomised error (DFE) and the degree of freedom of lack of fit (DFLF), where:

$$\begin{aligned}
\mathrm{SSR} &= \mathrm{SSE} + \mathrm{SSLF}, & \mathrm{DFR} &= \mathrm{DFE} + \mathrm{DFLF} \\
\mathrm{SSR} &= \sum_i \sum_j (y_{ij} - a - b x_i)^2, & \mathrm{DFR} &= nr - 2 \\
\mathrm{SSE} &= \sum_i \sum_j (y_{ij} - \bar{y}_i)^2, & \mathrm{DFE} &= n(r - 1) \\
\mathrm{SSLF} &= \mathrm{SSR} - \mathrm{SSE}, & \mathrm{DFLF} &= n - 2
\end{aligned} \tag{7}$$

$$\mathrm{MSLF} = \mathrm{SSLF}/\mathrm{DFLF}, \qquad \mathrm{MSE} = \mathrm{SSE}/\mathrm{DFE} \tag{8}$$

and where $\bar{y}_i$ is the measured mean of the ith measurement. The variance ratio is:

$$F_{LF} = \mathrm{MSLF}/\mathrm{MSE} \tag{9}$$

Statistical inferences from the above analysis of variance of residuals were drawn by comparing $F_{LF}$ with the significance level of $F_{0.05}$(DFLF, DFE) or $F_{0.01}$(DFLF, DFE).
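The partition in Eqs. (7)–(9) can be sketched in Python as below. This is a hedged illustration that assumes an unweighted fit and the same number of replicates r at every measurement point; the function name `lack_of_fit_f` and the data are hypothetical, not the authors' SAS analysis.

```python
from typing import List, Tuple


def lack_of_fit_f(y: List[List[float]], x: List[float]) -> Tuple[float, int, int]:
    """Return (F_LF, DFLF, DFE) for the fitted line Y = a + bX, following Eqs. (7)-(9)."""
    n, r = len(x), len(y[0])
    # Ordinary least squares over all n*r observations (each x_i is used r times).
    x_bar = sum(x) / n
    y_bar = sum(v for row in y for v in row) / (n * r)
    sxx = r * sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yij - y_bar) for xi, row in zip(x, y) for yij in row)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual sum of squares about the fitted line.
    ssr = sum((yij - a - b * xi) ** 2 for xi, row in zip(x, y) for yij in row)
    # Experimental error: scatter of replicates about their own mean.
    sse = sum((yij - sum(row) / r) ** 2 for row in y for yij in row)
    sslf = ssr - sse
    df_lf, df_e = n - 2, n * (r - 1)
    return (sslf / df_lf) / (sse / df_e), df_lf, df_e


# Hypothetical data: n = 3 points, r = 2 replicates, deliberately non-linear means.
x = [1.0, 2.0, 3.0]
y = [[0.9, 1.1], [2.9, 3.1], [1.9, 2.1]]
f_lf, df_lf, df_e = lack_of_fit_f(y, x)
```

For these numbers SSE = 0.06 and SSLF = 3.0, so the ratio is large and the straight line would be rejected as an adequate description.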

3.3. Goodness of fit test

$R^2$ is a commonly used statistic for testing the goodness of fit and is defined as:

$$R^2 = \mathrm{SSU}/\mathrm{SSY} = 1 - (\mathrm{SSR}/\mathrm{SSY}) \tag{10}$$

Following the earlier method, the results of regression with a lack of fit test on simulated and measured data from the four state variables are listed in Table 2. Graphical displays of Y against X and regression lines for NU and PN are shown in Figs. 3 and 4.

4. Analysis of difference

al., 1985). In model testing, however, our interest is in the model's accuracy, i.e. the extent to which simulated values approach the measured data. This can be achieved by examining the difference directly from $d_{ij} = y_{ij} - x_i$. It is equal to the test of the goodness of fit of model Y = X, but not Y = a + bX. In this circumstance, the regression method with a = 0 or b = 1 uses the following equation to calculate $R^2$, e.g. $R^2$ = 1 − (residual sum of squares/uncorrected total sum of squares) (SAS Institute, 1990b), to avoid situations where $R^2$ > 1 or $R^2$ < 0 (Aigner, 1971; Pindyck, 1981).

4.1. Test statistics

Two different statistics were used for this purpose:

4.1.1. Paired t-statistic

In testing simulated data against experimental data, the paired t-statistic was used to test the null hypothesis $\bar{d}\,(x_i - y_i) = 0$ (Reckhow et al., 1990). In our data, it is calculated as follows:

$$\text{paired } t = \bar{d}/s_{\bar{d}} \tag{11}$$

where $\bar{d}$ is the mean value for the difference variable shown in Eq. (1), and $s_{\bar{d}}$ is the standard error of the mean. It is important that the difference variable D (Y − X) should be normally distributed and be independent without considering equal variance in the paired t-test (Snedecor and Cochran, 1976). D data sets used in the paired t-test were transformed as discussed previously.
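Eq. (11) can be sketched in a few lines of Python. The sketch assumes the differences have already been pooled into one flat list and, unlike the analysis in the paper, applies no transformation; the function name `paired_t` and the numbers are hypothetical.

```python
import math
import statistics
from typing import Sequence


def paired_t(d: Sequence[float]) -> float:
    """Paired t = mean(d) / standard error of the mean, as in Eq. (11)."""
    d_bar = statistics.mean(d)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    s_d_bar = statistics.stdev(d) / math.sqrt(len(d))
    return d_bar / s_d_bar


# Hypothetical differences d_ij = y_ij - x_i, flattened over i and j.
d = [1.0, 2.0, 3.0]
t = paired_t(d)
```

The result would be compared with a tabulated t value for len(d) − 1 degrees of freedom.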

Table 2

a *, ** refer to the 0.05 and 0.01 significance levels.
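4.1.2. $F_{LF(Y=X)}$ statistic

This still holds for the model Y = X (Whitmore, 1991), but the sums of squares of residuals $\mathrm{SSR}_{(Y=X)}$ and $F_{LF(Y=X)}$ values are calculated differently from those in Eqs. (7) and (9):

$$\begin{aligned}
\mathrm{SSR}_{(Y=X)} &= \sum_i \sum_j (y_{ij} - x_i)^2, & \mathrm{DFR}_{(Y=X)} &= nr \\
\mathrm{SSLF}_{(Y=X)} &= \mathrm{SSR}_{(Y=X)} - \mathrm{SSE}, & \mathrm{DFLF}_{(Y=X)} &= n
\end{aligned} \tag{12}$$

where SSE is the same as given in Eq. (7). The variance ratio is

$$F_{LF(Y=X)} = \left(\mathrm{SSLF}_{(Y=X)}/\mathrm{DFLF}_{(Y=X)}\right)/\left(\mathrm{SSE}/\mathrm{DFE}\right) \tag{13}$$

Calculated values of paired t and $F_{LF(Y=X)}$ statistics using Eqs. (11) and (13) are recorded in Table 3, and used to test the difference D (Y − X).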
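A short Python sketch of Eqs. (12) and (13) follows. It assumes untransformed data and equal replicate numbers at every measurement point; the function name `lack_of_fit_f_identity` and the data are hypothetical and serve only to show how the degrees of freedom differ from the fitted-line case.

```python
from typing import List, Tuple


def lack_of_fit_f_identity(y: List[List[float]], x: List[float]) -> Tuple[float, int, int]:
    """Return (F_LF(Y=X), DFLF, DFE) for the model Y = X, following Eqs. (12)-(13)."""
    n, r = len(x), len(y[0])
    # Residuals are taken about the simulated values themselves (no fitted a, b).
    ss_r = sum((y_ij - x_i) ** 2 for row, x_i in zip(y, x) for y_ij in row)
    # Experimental error, as in Eq. (7): replicates about their own mean.
    ss_e = sum((y_ij - sum(row) / r) ** 2 for row in y for y_ij in row)
    df_lf, df_e = n, n * (r - 1)  # no parameters estimated, so DFLF = n
    return ((ss_r - ss_e) / df_lf) / (ss_e / df_e), df_lf, df_e


# Hypothetical data: n = 3 points, r = 2 replicates.
x = [1.0, 2.0, 3.0]
y = [[0.9, 1.1], [2.9, 3.1], [1.9, 2.1]]
f_yx, df_lf_yx, df_e_yx = lack_of_fit_f_identity(y, x)
```

Because a and b are not estimated, all n degrees of freedom of the point means remain in the lack of fit term.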

4.2. Difference measures

Several simple difference measures have been developed for this purpose by Loague and Green (1991) and Willmott et al. (1985). Kabat et al. (1995) employed five of them in evaluating soil nitrogen simulation models. These statistics have the following features in common: (1) they measure difference, $d_{ij} = y_{ij} - x_i$, in several ways, including the sum of difference, $\sum (y_{ij} - x_i)$, the sum of absolute difference, $\sum |y_{ij} - x_i|$, and the sum of squares of difference, $\sum (y_{ij} - x_i)^2$; and (2) they define a statistical function of $d_{ij}$ as a measure of difference to examine model performance by characterising systematic under- or over-prediction. Bearing in mind these definitions, here we discuss six statistics as our alternative indicators for further evaluation of the N_ABLE model:

Mean error

$$E = \frac{1}{nr} \sum_i \sum_j (y_{ij} - x_i) \tag{14}$$

Fig. 4. Comparison of measured percent-N in dry matter against simulated values for each of the sampling dates, for simulations with different harvest times; dashed lines are Y = X, solid lines are Y = a + bX;
Mean absolute error

$$\mathrm{MAE} = \frac{1}{nr} \sum_i \sum_j |y_{ij} - x_i| \tag{15}$$

Table 3

Values of difference statistics (test statistics and difference measures) when assessing difference D (Y − X) for each state variable a

Variable  Th (days)  F_LF(Y=X) b  Paired t c  E d      MAE    RMSE   C       EF      EF1     nr
WDM       42         3.91**       1.84        0.01     0.12   0.19   0.1318  0.9575  0.8447  72
          50         4.14**       4.89**      −0.08    0.19   0.30   0.1429  0.9366  0.8238  90
          57         15.86**      13.58**     −0.30    0.33   0.48   0.1917  0.8866  0.7437  108
          61         23.31**      14.55**     −0.38    0.40   0.56   0.1976  0.8640  0.7088  126
NU        42         4.23**       0.03        −3.12    6.34   9.91   0.1517  0.9397  0.8178  72
          50         5.85**       5.85**      −7.00    9.41   14.31  0.1661  0.9137  0.7759  90
          57         16.84**      11.08**     −13.43   14.13  20.84  0.2067  0.8498  0.6984  108
          61         21.07**      12.09**     −15.64   16.86  23.83  0.2171  0.8197  0.6597  126
PN        42         9.28**       2.65**      −0.15    0.48   0.56   0.1003  0.2149  0.0487  72
          50         5.88**       2.51**      −0.11    0.41   0.50   0.0887  0.5749  0.3233  90
          57         4.83**       0.37        0.02     0.34   0.44   0.0787  0.7371  0.5022  108
          61         3.77**       1.13        0.05     0.33   0.44   0.0774  0.7596  0.5539  126
NS        42         3.66**       4.66**      −23.7    34.4   47.88  0.2476  0.7285  0.5690  63 e
          50         3.91**       5.07**      −23.3    34.5   47.39  0.2542  0.7293  0.5659  80
          57         2.78**       3.93**      −16.2    30.1   43.99  0.2313  0.7679  0.6205  97
          61         4.74**       3.85**      −13.1    28.3   40.53  0.2287  0.7877  0.6211  115

a *, ** refer to the 0.05 and 0.01 significance levels.

b For the $F_{LF(Y=X)}$ calculation, data sets have been transformed as in Table 1.

c For the paired t calculation, data sets D have been transformed based on selections made as follows: D data from WDM and NU were transformed by $1/X_i$; D data from PN were transformed by $1/X_i^{0.5}$; and D data from NS were transformed by $1/X_i^{0.25}$.

d No data transformation was made for calculation of E, MAE, RMSE, C, EF and EF1.

e There were missing values in the measured N

Coefficient of error

$$C = \left[\frac{1}{nr} \sum_i \sum_j |y_{ij} - x_i|\right] / \bar{y} = \mathrm{MAE}/\bar{y} \tag{19}$$

E is an indicator of whether the model predictions tend to over- or under-estimate measured data (Addiscott and Whitmore, 1987). MAE and RMSE have been used by Willmott et al. (1985) to evaluate their models. E, MAE and RMSE all take on units of $d_{ij} = y_{ij} - x_i$ and are mainly used as measures of accuracy to compare the output of the same variables, e.g. to compare WDM output of the model with measured data for different simulation dates, or to compare the output of the same variables among different models.

EF is a relative measure of error used by many authors (Loague and Green, 1991) and preferred by Loague and Freeze (1985). It is defined as for $R^2$ in regression analysis, but EF ≠ $R^2$. EF = 1 if $y_{ij} = x_i$ and EF < 1 for any realistic simulation. EF < 0 if the model predicted values are worse than simply using the measured mean of $y_{ij}$, whereas corresponding values of $R^2$ are 0 ≤ $R^2$ ≤ 1. This is because EF is actually a statistic to test the goodness of fit of the model Y = X, and thus has the restrictions a = 0 and b = 1 in the model Y = a + bX. Under these conditions, the partition of SSY into SSU and SSR no longer holds in general: SSY ≠ SSU + SSR (Aigner, 1971). For this reason, we consider that $R^2$ is not a good statistic in testing the goodness of fit of the model Y = X. It is considered that quadratic residual functions, such as EF, are sensitive to outliers (Klepper and Rouse, 1991). To overcome this, we have defined another function EF1 by replacing the sum of squares of difference with the sum of absolute differences [Eq. (18)], where |EF1| ≤ |EF|.

C is another relative average measure of absolute difference which is expressed as a proportion of the mean of the measured variable, Y (Klepper and Rouse, 1991). C, EF and EF1 are all dimensionless indices (C ≥ 0, EF and EF1 ≤ 1), and they have been used to depict the degree to which $d_{ij} = y_{ij} - x_i$ approaches the null set, i.e. $d_{ij} = 0$. They have also been used to compare accuracy of model outputs for different variables. Detailed results of difference measures calculated using Eqs. (11), (13) and (14)–(19) are also listed in Table 3.
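The six difference measures can be sketched together in Python. E and C follow Eqs. (14) and (19), and MAE follows from Eq. (19). The forms used for RMSE, EF and EF1 are assumptions based on the verbal descriptions above (EF built like $R^2$ from squared differences, EF1 from absolute differences), since their defining equations are not reproduced in this text. The function name `difference_measures` and the data are hypothetical.

```python
import math
from typing import Dict, List


def difference_measures(y: List[List[float]], x: List[float]) -> Dict[str, float]:
    """Difference measures for replicated measurements y[i][j] and simulations x[i]."""
    pairs = [(y_ij, x_i) for row, x_i in zip(y, x) for y_ij in row]
    n_obs = len(pairs)                              # nr in the paper's notation
    y_bar = sum(y_ij for y_ij, _ in pairs) / n_obs  # grand mean of the measurements
    d = [y_ij - x_i for y_ij, x_i in pairs]         # d_ij = y_ij - x_i, Eq. (1)
    sum_abs = sum(abs(v) for v in d)
    sum_sq = sum(v * v for v in d)
    mae = sum_abs / n_obs
    return {
        "E": sum(d) / n_obs,                        # Eq. (14)
        "MAE": mae,                                 # implied by Eq. (19)
        "RMSE": math.sqrt(sum_sq / n_obs),          # assumed form
        "C": mae / y_bar,                           # Eq. (19)
        # Assumed form: 1 - squared differences / squared deviations from the mean.
        "EF": 1.0 - sum_sq / sum((y_ij - y_bar) ** 2 for y_ij, _ in pairs),
        # Assumed form: the same ratio built from absolute values.
        "EF1": 1.0 - sum_abs / sum(abs(y_ij - y_bar) for y_ij, _ in pairs),
    }


# Hypothetical data: n = 2 points, r = 2 replicates.
y = [[2.0, 4.0], [6.0, 8.0]]
x = [3.0, 6.0]
stats = difference_measures(y, x)
```

For these numbers E is positive, so under the sign convention of Eq. (1) the hypothetical model under-estimates on average.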

5. General discussion

It can be seen from Table 2 that most of the values of $F_{LF}$ and a and b are statistically significant at the 0.01 level, but the values are small, e.g. $F_{LF}$ < 9. In general, good fits for WDM and NU are obtained for early harvests (Days 42 and 50), and for PN at Days 57 and 61. There is only a small difference between the four lines for NS. The order of goodness of fit can be ranked by $R^2$ as WDM > NU > NS > PN (Table 2), indicating that the poorest linear relationship detected is for PN when $T_h$ is set to 42 days (Fig. 4a).

means of simulated and measured data. This conclusion is valuable because it cannot be deduced from a graphical display. It shows that all the differences between measured and simulated values in these regions of the response surfaces can be attributed to experimental error. All significant values of the paired t and $F_{LF(Y=X)}$ values indicate that further modification of the model is needed. This focuses attention on regions of WDM and NU for $T_h$ = 61 in the first instance because of the higher values of both $F_{LF(Y=X)}$ and paired t-values. $F_{LF(Y=X)}$ values calculated in Table 3 were based on the model Y = X, indicating the extent to which the differences ($y_{ij} - x_i$) differ from experimental error. By comparison, the values of $F_{LF}$ in Table 2 were based on the model Y = a + bX, indicating the extent to which the residual errors ($y_{ij} - a - b x_i$) differed from experimental error. For this reason, the $F_{LF(Y=X)}$ values are the real measures of the accuracy of model simulation.

Most of the E values are negative (Table 3), indicating that the model generally underestimated the measured data except for PN at Days 57 and 61 and WDM at Day 42. The magnitude of underestimation increased with time for the WDM and NU variables, but the reverse is true for PN and NS variables. MAE and RMSE showed that the largest difference occurred for NU when $T_h$ = 61. C values indicate that the largest relative error is <25% for all variables, meeting the tolerance limit of 26% given by Klepper and Rouse (1991). The order of goodness of match among the four variables is WDM > NU > NS > PN ranked by values of EF and EF1 (Table 3), giving a similar result to that drawn from $R^2$ in Table 2.

It has been found that the difference statistics in Table 3 are strongly correlated with the outcome of some tests but not others. The tests can be grouped according to the degree of correlation (Table 4). Thus, RMSE, MAE and C were strongly correlated with one another, but not with any of the others. Likewise, paired t and $F_{LF(Y=X)}$ were strongly correlated, but not with the other tests, as were EF and EF1. A single linkage dendrogram for the correlation matrix ($r_{ij}$ > 0) in Table 4, for eight difference statistics is shown in Fig. 5.
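The kind of correlation that underlies this grouping can be illustrated with a small Python sketch. It uses only the four WDM rows of Table 3, so it will not reproduce the coefficients of Table 4, which cover all state variables; the helper name `pearson_r` is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# WDM rows of Table 3 for Th = 42, 50, 57 and 61 days.
e_wdm = [0.01, -0.08, -0.30, -0.38]
mae_wdm = [0.12, 0.19, 0.33, 0.40]
rmse_wdm = [0.19, 0.30, 0.48, 0.56]

r_mae_rmse = pearson_r(mae_wdm, rmse_wdm)  # two measures of error magnitude
r_e_mae = pearson_r(e_wdm, mae_wdm)        # signed error against magnitude
```

On this subset MAE and RMSE move together almost perfectly, while E is negatively correlated with MAE because the under-estimation grows with harvest date.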

Table 4 shows that values of E have a strong negative correlation with C, MAE and RMSE. In this study, the negative correlations result from negative E values which are caused by X > Y in Eq. (1). In contrast, the other seven statistics are all positive irrespective of whether the difference D = (Y − X) is negative or positive. The correlations between E and C, MAE and RMSE are not universally applicable and for this reason the negative correlation has not been used in the cluster analysis in Fig. 5.

Table 4

Correlation coefficient matrix $r_{ij}$ for eight difference statistics listed in Table 3 (rows and columns: $F_{LF(Y=X)}$, paired t, E, MAE, RMSE, C, EF, EF1)

We conclude from Fig. 5 that only one statistic from each group needs to be used to validate the model. Group 1 includes RMSE, MAE and C (grouped by $r_{ij}$ ≥ 0.82) and provides measures of the magnitude of the difference. Group 2 includes EF and EF1 (grouped by $r_{ij}$ ≥ 0.99), and gives measures of the goodness of match or the co-ordinate of the difference. Group 3 includes $F_{LF(Y=X)}$ and paired t-statistics (grouped by $r_{ij}$ ≥ 0.89) and gives a measure of the probability of significant difference. Group 4 includes only E and provides a measure of direction of difference (i.e. under- or over-estimation). The maximum correlation coefficient between Groups 1 and 2 is very small (i.e. $r_{ij}$ = 0.51), and between Groups 3 and 4 is only 0.07 (Table 4).

6. Conclusions

Data from the nitrogen experiment were shown to violate normality and equal variance assumptions and transformation is necessary to ensure standard statistical analyses are carried out correctly. Shapiro and Wilk's W-test and White's $\chi^2$ test have been found to be useful methods for testing the normality and heteroskedasticity of $\varepsilon_i$. Two types of transformations were employed in this study to ensure both normality and equal variance, and $1/X_i$ was required for the WDM, NU and PN data, and $1/X_i^{0.5}$
Reckhow et al. (1990) suggested that a prediction $b_0$ could be used for testing $\beta = b_0$ but not for testing $\beta = 1$, where $b_0$ can be decided from a real subject (i.e. $b_0$ = 1, | | > 0 is an acceptable error). In addition, if the t-test shows that parameters $\alpha$ and $\beta$ do not deviate from $H_0$ when regression analysis is carried out, $R^2$ can be used to check the goodness of fit. Thus, if both of these tests are carried out, we can then draw a final conclusion, whereas a decision based on any one of them can be misleading.

Difference measures shown in Eqs. (14)–(19) have advantages over test statistics in that they are easy to interpret and do not need data transformation, but the methods are not fully developed. Many authors believe that there is no robust statistic which can be used to draw conclusions in model evaluations and therefore several methods need to be used together to give a double check. In this study, we found that eight difference statistics (Table 3) were strongly correlated with each other allowing one to be chosen from each correlated group so as to save time without losing accuracy. We suggest that the same conclusion can be reached by using either RMSE, EF1, paired t and E or MAE, EF, $F_{LF(Y=X)}$ and E as the difference measures because each of them is present in Groups 1, 2, 3 and 4 (Fig. 5). In addition, graphical display should be used as a visual aid for comparison between simulation and measurement and for interpretation of the statistics used in the testing and evaluation processes.

Acknowledgements

Financial support through an ORS award from UK is gratefully acknowledged. We also thank Andrew Mead from Horticulture Research International, UK, for his statistical comments on the first draft of this paper.

References

Addiscott, T.M., Whitmore, A.P., 1987. Computer simulation of changes of soil mineral nitrogen and crop nitrogen during autumn, winter and spring. Journal of Agricultural Science, Cambridge 109, 141–157.

Aigner, D.J., 1971. Basic Econometrics. Prentice-Hall, Englewood Cliffs, NJ.

Bowker, D.W., 1993. Dynamic models of homogeneous systems. In: Fry, J.C. (Ed.), Biological Data Analysis: A Practical Approach. Oxford University Press, Oxford, pp. 313–343.

Greenwood, D.J., Draycott, A., 1989a. Experimental validation of an N-response model for widely different crops. Fertiliser Research 18, 153–174.

Greenwood, D.J., Draycott, A., 1989b. Quantitative relationships for growth and N content of different vegetable crops grown with and without ample fertiliser-N on the same soil. Fertiliser Research 18, 175–188.

Greenwood, D.J., Rahn, C.R., Draycott, A., Vaidyanathan, L.V., Paterson, C., 1996. Modelling and measurement of the effects of fertilizer-N and crop residue incorporation on N-dynamics in vegetable cropping. Soil Use and Management 12, 13–24.

Kabat, P., Marshall, B., van den Broek, B.J., 1995. Comparison of simulation results and evaluation of parameterization schemes. In: Kabat, P., Marshall, B., Van den Broek, B.J., Vos, J., Van Keulen, H. (Eds.), Modelling and Parameterization of the Soil-plant-atmosphere System. A Comparison of Potato Growth Models. Wageningen Pers, Wageningen, pp. 439–501.

Knepell, P.L., Arangno, D.C., 1993. Simulation Validation: A Confidence Assessment Methodology. IEEE Computer Society Press, Los Alamitos, CA.

Loague, K.M., Freeze, R.A., 1985. A comparison of rainfall-runoff modelling techniques on small upland catchments. Water Resources Research 21, 329–348.

Loague, K., Green, R.E., 1991. Statistical and graphical methods for evaluating solute transport models: overview and application. Journal of Contaminant Hydrology 7, 51–73.

O'Leary, G.J., Connor, D.J., 1996. A simulation model of the wheat crop in response to water and nitrogen supply: II. Model validation. Agricultural Systems 52, 31–55.

Pindyck, R.S., 1981. Econometric Models and Economic Forecasts. McGraw-Hill Inc, USA.

Reckhow, K.H., Clements, J.T., Dodd, R.C., 1990. Statistical evaluation of mechanistic water-quality models. Journal of Environmental Engineering 116, 250–268.

Royston, J.P., 1982. An extension of Shapiro and Wilk's W test for normality to large samples. Applied Statistics 31, 115–124.

SAS Institute Inc, 1989. SAS/STAT User Guide, Version 6, 4th Edition, Volume 2. SAS Institute, Cary, NC.

SAS Institute Inc, 1990a. SAS Procedures Guide. Version 6, 3rd Edition. Cary, NC, USA.

SAS Institute Inc, 1990b. SAS/ETS Software, Applications Guide 2, Version 6, First Edition: Econometric Modelling, Simulation, and Forecasting. Cary, NC, USA.

Shapiro, S.S., Wilk, M.B., 1965. An analysis of variance test for normality (complete samples). Biometrika 52, 591–611.

Snedecor, G.W., Cochran, W.G., 1976. Statistical Methods, 6th Edition. The Iowa State University Press, Ames, IA, USA.

Sokal, R.R., Rohlf, F.J., 1987. Introduction to Biostatistics, 2nd Edition. W.H. Freeman, New York.

Sutherland, R.A., Wright, C.C., Verstraeten, L.M.J., Greenwood, D.J., 1986. The deficiency of the 'economic optimum' application for evaluating models which predict crop yield response to nitrogen fertilizer. Fertilizer Research 10, 251–262.

White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–829.

Whitmore, A.P., 1991. A method for assessing the goodness of computer simulation of soil processes. Journal of Soil Science 42, 289–299.

Willmott, C.J., Ackleson, S.G., Davis, R.E., Feddema, J.J., Klink, K.M., Legates, D.R., O'Donnell, J., Rowe, C.M., 1985. Statistics for the evaluation and comparison of models. Journal of Geophysical Research 90, 8995–9005.

Yang, J., 1999. Testing and evaluation of a nitrogen simulation model, N_ABLE, using independent field data. PhD thesis, The University of Reading, Reading, UK.
