
Agricultural Systems, Vol. 65, Issue 1, July 2000


Comparing genetic coefficient estimation methods using the CERES-Maize model

E. Román-Paoliᵃ,*, S.M. Welchᵇ, R.L. Vanderlipᵇ

ᵃ Agricultural Experiment Station, University of Puerto Rico, Lajas, PR 00667, Puerto Rico

ᵇ Department of Agronomy, Kansas State University, Manhattan, KS 66502, USA

Received 13 July 1999; received in revised form 25 May 2000; accepted 3 June 2000

Abstract

Many crop simulation models use genetic coefficients to characterize varieties or hybrids. Two methods now used with CERES-Maize to obtain genetic coefficients are: (1) direct experimental measurement; and (2) estimation using the Genetic Coefficient Calculator (GENCALC), an iterative computerized procedure. The objective of this research was to compare an adaptation of the Uniform Covering by Probabilistic Region (UCPR) method with these two approaches. UCPR delineates a joint confidence region for the parameters corresponding to a goodness-of-fit threshold level. The study focuses on two genetic coefficients, duration of the juvenile phase (P1) and photoperiod sensitivity (P2), for five maize hybrids. Field experiments were conducted at Rossville, KS, during 1995 in which genetic coefficients of four of the hybrids were determined. Silking date data for the same hybrids were obtained from the Kansas Corn Performance Tests for use in estimating coefficients with UCPR and GENCALC. UCPR was better than GENCALC at minimizing squared error, but at the cost of much longer run times. Both estimation procedures underestimated P1 relative to the field data. This may have resulted from the model's propensity to overestimate leaf number. An independent set of silking date data for B73×Mo17 from the Kansas Corn Performance Tests was used for comparing methods. Simulated silking dates using P1 and P2 values obtained by UCPR and GENCALC accounted for only 26 and 47%, respectively, of the variability in actual dates. Both underestimated longer durations to silking. Use of published values for P1 and P2 accounted for 45% of variability but underestimated all data (bias −9.5 days). © 2000 Published by Elsevier Science Ltd.

Keywords: Parameter; Search; Confidence region; Corn; Goodness-of-fit

0308-521X/00/$ - see front matter © 2000 Published by Elsevier Science Ltd. PII: S0308-521X(00)00024-X

www.elsevier.com/locate/agsy

* Corresponding author.


1. Introduction

With suitable inputs, crop simulation models allow us to extrapolate across different conditions and places (Thornton et al., 1991). Parameters in a simulation model should have a physical or biological meaning. Those parameters can be measured in independent experiments or estimated from observed data. The process of measuring parameters in a real system, however, may be complex or impractical, which may result in some level of uncertainty concerning the accuracy of the estimated values.

Crop growth simulation models often are complex and nonlinear. Therefore, the use of traditional statistical methods to estimate parameters may not be appropriate (Klepper and Rouse, 1991). A new method, Uniform Covering by Probabilistic Region (UCPR), has been proposed to estimate parameters for nonlinear models from observed data (Klepper and Hendrix, 1994). The advantage of this technique over the iterative computerized procedure Genetic Coefficient Calculator (GENCALC; Hunt et al., 1993) is that it provides both parameter estimates and a joint confidence region for the parameters. The confidence region may have an arbitrary shape; it need not be ellipsoidal, as is common with standard nonlinear regression methods (e.g. Kmenta, 1971). GENCALC and field estimation provide only point estimates of each coefficient with associated individual confidence limits.

Many crop growth models use the concept of genetic coefficients to characterize varieties or hybrids (Ritchie et al., 1989). Genetic coefficients are sets of parameters that describe the genotype × environment interaction (IBSNAT, 1993). They summarize quantitatively how a particular cultivar responds to environmental factors. Depending on the coefficient, estimation involves field or growth chamber studies, many samples, and/or exposure to different photoperiods. CERES-Maize (Jones and Kiniry, 1986) is a growth and development model of maize (Zea mays L.) which simulates phenology, growth, and yield using soil, climatic, and management inputs. New maize hybrids are released each year, thereby increasing the need for new CERES-Maize genetic coefficients. These can be measured by direct experiment (Jones and Kiniry, 1986; Ogoshi et al., 1991) or estimated after the fact by GENCALC from outcome data using the simulation model (Hunt et al., 1993).


2. Materials and methods

Genetic coefficients for four corn hybrids (GH-2404, GH-2573, NC+ 4616, ICI-8599) were estimated using UCPR and GENCALC. A more detailed study of the two methods was also conducted using the hybrid B73×Mo17. Field data for comparison were obtained from the plantings described below. Silking date data for the computer methods were extracted from records of the Kansas Corn Performance Tests (Roozeboom, 1993). The number of performance test observations varied by hybrid (Table 1). Performance test data represent a range of photoperiod and management conditions. B73×Mo17 was included because (1) a large amount of data is available, and (2) P1 and P2 estimates have been published (Jones and Kiniry, 1986), against which our results could be compared.

2.1. Field data

Hybrids GH-2404, GH-2573, NC+ 4616, and ICI-8599 were planted on 10 May 1995 at Rossville, KS. The experimental design was a randomized complete block with four replications. The soil was a Eudora silt loam (coarse silty, mixed, mesic, fluventic Haplustoll). Plant population was 3.4 plants m⁻²; row width was 0.76 m. The crop was managed to provide optimal growing conditions, and irrigation was applied to ensure adequate water. Plant samples were examined periodically under a stereoscope for tassel initiation (Ritchie et al., 1992). Date of tassel initiation was also observed for GH-2404 and GH-2573 in 1996 experiments planted on 13 June at Rossville and 4 June at Powhattan, KS. Temperature data were obtained from automatic weather stations at each location. Growing degree days (GDD8) were calculated using the method of Gilmore and Rogers (1958).
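Heat-unit accumulation underlies both the P1 estimates and the later arithmetic. A minimal sketch of the daily calculation follows, assuming the simple daily-mean form with an 8 °C base and an optional cap on the daily extremes. The exact temperature thresholds of the Gilmore and Rogers (1958) method are not restated in this paper, so the `cap` parameter and both function names are illustrative assumptions.

```python
def daily_gdd8(t_max, t_min, base=8.0, cap=None):
    """Growing degree days (base 8 °C) for one day, simplified daily-mean form.

    Assumption: GDD = max(0, (Tmax + Tmin) / 2 - base), with both daily
    extremes optionally clipped to `cap` first. This is not necessarily
    the exact Gilmore and Rogers (1958) procedure.
    """
    if cap is not None:
        t_max, t_min = min(t_max, cap), min(t_min, cap)
    return max(0.0, (t_max + t_min) / 2.0 - base)


def cumulative_gdd8(temps, **kwargs):
    """Sum daily GDD8 over a sequence of (t_max, t_min) pairs in °C."""
    return sum(daily_gdd8(tx, tn, **kwargs) for tx, tn in temps)
```

A day with a maximum of 30 °C and a minimum of 18 °C contributes `daily_gdd8(30, 18)` = 16.0 GDD8, similar to the approximately 16 GDD8 per day reported for these plantings in Section 3.1.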

The plantings just described did not provide the range of photoperiods necessary to obtain independent estimates of P1 and P2. We therefore estimated P1 under the assumption that P2 = 0, by summing GDD8 from emergence to 4 days before tassel initiation (Jones and Kiniry, 1986). Climate records were then used to find all other combinations of P1 and P2 consistent with the observed tassel initiation.
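Under the P2 = 0 assumption, the P1 estimate is a windowed sum over the weather record. The sketch below assumes daily GDD8 values are already available as a list whose first element belongs to a known start date. It includes the emergence day and stops before the date that lies 4 days ahead of tassel initiation; whether the paper's sums include that end day is not stated, so the boundary is an assumption, as is the hypothetical name `p1_assuming_no_photoperiod`.

```python
from datetime import date, timedelta


def p1_assuming_no_photoperiod(daily_gdd8, record_start, emergence, tassel_init, lag_days=4):
    """Estimate P1 (GDD8) under the assumption P2 = 0.

    daily_gdd8  : list of daily GDD8 values; element 0 belongs to record_start
    emergence   : date of emergence (first day included in the sum)
    tassel_init : observed date of tassel initiation
    lag_days    : the sum stops this many days before tassel initiation
                  (4 in the text); the stop day itself is excluded here,
                  which is an assumption.
    """
    first = (emergence - record_start).days
    stop = (tassel_init - timedelta(days=lag_days) - record_start).days
    if not 0 <= first <= stop <= len(daily_gdd8):
        raise ValueError("dates fall outside the weather record or are out of order")
    return sum(daily_gdd8[first:stop])


if __name__ == "__main__":
    # Hypothetical record: 100 days of 11 GDD8 per day starting 1 May 1995.
    record = [11.0] * 100
    print(p1_assuming_no_photoperiod(record, date(1995, 5, 1),
                                     date(1995, 5, 20), date(1995, 6, 20)))  # prints 297.0
```

The window here covers 27 days (20 May through 15 June), so the hypothetical constant record sums to 297 GDD8; with real data the daily values would come from the weather-station records.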

2.2. UCPR

Plots of multivariate confidence regions universally delimit them with closed curves, often ellipses. Ellipses result from simplifying mathematical assumptions that date from a time when only very limited computing power was available (Rao, 1965). It has been shown, however, that regions derived in this fashion can dramatically misrepresent the probable location of nonlinear model parameters (Donaldson and Schnabel, 1987). Nevertheless, these methods are still widely incorporated in statistical software. Nonelliptical curves can be produced to bound confidence regions but, as dimensionality increases, the computing power required becomes prohibitive even by modern standards.


Table 1

Corn performance test data used for coefficient estimation

Year | Location | GH-2404 | ICI-8599 | NC+4616 | GH-2573ᵃ | B73×Mo17

1979 Colby x

a Additional data from Rossville and Powhattan in 1996 were used.


display them. Instead of computing the border, they search for the interior. In particular, they plot a cloud of points whose extent closely approximates that of the confidence region. A uniformity principle is used to ensure that no subregions are missed. Let Gmin symbolize the minimum of a sum-of-squares goodness-of-fit criterion and let Gc be the sum of squares along the boundary of a confidence region. Given Gmin, Gc can be calculated by:

$$G_c = G_{\min}\left(1 + \frac{n}{K-n}\,F\right)$$

where K is the sample size, n is the number of parameters, and F is the corresponding Fisher F-value (Draper and Smith, 1966). Readers should note that Klepper and Hendrix (1994) omitted the “1” in this equation, which gives impossible results. Our interpretation is that this represents a typographical error.
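The threshold can be computed directly. Consistent with the definitions of K, n, and F and with the note about the omitted “1”, this sketch assumes the standard form Gc = Gmin × [1 + n/(K − n) × F]. The F-value is supplied by the caller, since its degrees of freedom and confidence level are a user choice; the function name is hypothetical.

```python
def critical_sum_of_squares(g_min, k, n, f_value):
    """Boundary sum of squares Gc of the joint confidence region.

    g_min   : minimum sum of squares found so far
    k       : number of observations (sample size K)
    n       : number of parameters
    f_value : Fisher F-value for the chosen confidence level (caller's choice)
    """
    if k <= n:
        raise ValueError("need more observations than parameters")
    return g_min * (1.0 + n / (k - n) * f_value)
```

For the 400-point run in Section 3.2, with Gmin = 607, K = 41, n = 2, and an assumed F of 3.24 (approximately the 95% point of F with 2 and 39 degrees of freedom), `critical_sum_of_squares(607, 41, 2, 3.24)` gives about 708, close to the reported 707. Dropping the “1” would give about 101, below Gmin itself, so no parameter combination could ever qualify; this is the impossible result referred to above.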

Klepper and Hendrix (1994) used a novel iterative random procedure to seek a combination of parameter values with a minimal sum of squares (Gmin). Along the way, combinations were retained if they represented a new minimum or if their goodness-of-fit was less than the current Gc. These latter combinations delineate the interior of the confidence region. If a new minimum is found, Gc is recalculated and all points are rechecked to see if they are still appropriate to include in a revised estimate of the region. To characterize a two-parameter confidence region, a scatter of 200 parameter combinations is recommended.

We wrote a C-language computer program implementing the UCPR algorithm. The program can be summarized as follows:

1. the user enters the desired number N of (P1, P2) pairs in the confidence region and the F-value, and sets the limits on P1 and P2;

2. the computer randomly generates candidate (P1, P2) pairs;

3. the CERES-Maize model is run for each observation (K) and each (P1, P2) pair (N);

4. each model run produces an output file containing predicted and measured silking dates;

5. the sum of squares for each candidate pair is calculated from K model runs (one per observation) by summing the squared differences between CERES-Maize predicted and measured silking dates;

6. the best (P1, P2) pair produces the first estimate of Gmin and a corresponding Gc;

7. a new candidate pair is generated at random and its sum of squares computed:

   a. if the new point is better than the worst previous point, the new point replaces it; otherwise the new point is dropped;

   b. if the new point's sum of squares is < Gc, it is accepted into the confidence region; and

8. the program repeats step 7; it stops when it has found a set of N pairs with sums of squares less than Gc.
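The numbered steps can be condensed into a short program. The sketch below is written in Python rather than the authors' C and is a simplified reading of the procedure, not their program. The hypothetical helper `ucpr_sketch` draws candidates uniformly within the user's limits, keeps N points, lets a better candidate replace the worst retained point (step 7a), recomputes Gmin and Gc after every draw, and stops once all N points lie below Gc (step 8). The `sum_sq` argument stands in for the K CERES-Maize runs of steps 3 to 5, and `toy_ss` is a made-up quadratic objective used only for illustration; Klepper and Hendrix's own candidate generator is more elaborate than uniform sampling.

```python
import random


def ucpr_sketch(sum_sq, bounds, n_points, k, f_value, max_draws=200_000, seed=1):
    """Simplified UCPR-style search (hypothetical helper, not the authors' code).

    sum_sq   : maps a parameter tuple to its sum of squares over K observations
    bounds   : list of (low, high) limits, one pair per parameter
    n_points : number N of points wanted inside the confidence region
    k        : number of observations; f_value : Fisher F-value
    """
    rng = random.Random(seed)
    n = len(bounds)
    if k <= n:
        raise ValueError("need more observations than parameters")

    def draw():
        # Steps 2 and 7: uniform random candidate within the user's limits.
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    # Steps 2-5: initial set of N candidates with their sums of squares.
    points = [(sum_sq(p), p) for p in (draw() for _ in range(n_points))]

    for _ in range(max_draws):
        # Step 6: current minimum and the matching threshold Gc.
        g_min = min(s for s, _ in points)
        g_c = g_min * (1.0 + n / (k - n) * f_value)
        worst = max(range(n_points), key=lambda i: points[i][0])
        # Step 8: stop once even the worst retained point lies below Gc.
        if points[worst][0] < g_c:
            return points, g_min, g_c
        # Step 7a: a better candidate replaces the worst retained point.
        candidate = draw()
        s = sum_sq(candidate)
        if s < points[worst][0]:
            points[worst] = (s, candidate)
    raise RuntimeError("region not filled; increase max_draws")


def toy_ss(p):
    """Hypothetical stand-in objective: minimum 10 at P1 = 250, P2 = 0.5."""
    return 10.0 + (p[0] - 250.0) ** 2 / 100.0 + 50.0 * (p[1] - 0.5) ** 2
```

For example, `ucpr_sketch(toy_ss, [(150.0, 350.0), (0.0, 2.0)], n_points=50, k=9, f_value=4.26)` returns 50 points scattered over the interior of the threshold region, together with the final Gmin and Gc; the F-value there is an arbitrary illustrative choice. Keeping a fixed-size set and replacing its worst member is what lets the cloud contract toward the region as Gmin improves.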

More details of the method are given in Klepper and Hendrix (1994).

2.3. GENCALC

GENCALC is software designed to estimate genetic coefficients for models in the Decision Support System for Agrotechnology Transfer (DSSAT), which includes CERES-Maize (Hunt et al., 1993). As a parameter estimation method, GENCALC needs an initial value for each parameter to begin calculations. It then fits a genetic coefficient to each of the observed values provided. The algorithm searches the output file and, based on the difference between predicted and actual values, decides whether to increase or decrease the coefficient being considered. The first round of calculations is set in the GCRULES file, which can be changed by the user. The next round is determined internally by extrapolation: GENCALC estimates what change in a coefficient would eliminate the error between predicted and measured values for each observation (Hunt, 1997). When GENCALC finds a good fit to each observation, it averages the coefficients and calculates the coefficient of variation (CV). Starting from these new candidate parameters, the user repeats the process. The search finishes when the user accepts the parameters on the basis of a low CV. The same observations were used for GENCALC as for UCPR. Several runs with different initial values of P1 and P2 were made, and those with the lowest CVs were chosen as the best estimates.
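The per-observation logic described above can be illustrated with a short sketch. It is not GENCALC's source code: the helper names are hypothetical, the fixed first step plays the role of the GCRULES settings, and the later rounds use a secant (linear) extrapolation as one plausible reading of the extrapolation described by Hunt (1997). Each observation gets its own fitted coefficient; the fits are then averaged and summarized by their CV, the quantity the user inspects before accepting or restarting.

```python
from statistics import mean, stdev


def fit_to_one_observation(model, observed, start, step, tol=1e-6, max_rounds=20):
    """Adjust one coefficient until model(coef) matches a single observation.

    Round 1 uses a fixed trial step (the role played by GCRULES); later
    rounds extrapolate linearly to the value that would remove the error.
    Returns the last estimate if max_rounds is reached.
    """
    x_prev, x = start, start + step
    e_prev, e = model(x_prev) - observed, model(x) - observed
    for _ in range(max_rounds):
        if abs(e) < tol:
            return x
        if e == e_prev:
            raise RuntimeError("model output does not respond to this coefficient")
        # Secant step: extrapolate to where the error would be zero.
        x_prev, x = x, x - e * (x - x_prev) / (e - e_prev)
        e_prev, e = e, model(x) - observed
    return x


def gencalc_like(models, observations, start, step):
    """Fit each observation separately, then return (mean, CV in %)."""
    fits = [fit_to_one_observation(m, y, start, step)
            for m, y in zip(models, observations)]
    avg = mean(fits)
    return avg, 100.0 * stdev(fits) / avg
```

With three hypothetical linear models, `models = [lambda p1, k=k: 150.0 + p1 / 16.0 + k for k in range(3)]`, and observations generated from exact-fit P1 values of 240, 250, and 260, `gencalc_like(models, observations, start=200.0, step=10.0)` returns a mean of 250 and a CV of 4%. Because the toy models are linear, one extrapolation reaches each exact fit; a nonlinear, stepped model generally needs more rounds.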

2.4. Comparison of methods

Because maize was grown under a narrow range of photoperiods, P1 and P2 could not be estimated independently of each other. However, the set of all (P1, P2) combinations consistent with the field observations was determined by using values of P2 from 0 to 2 to calculate possible values of P1. This set was then compared graphically to the point estimates and confidence limits obtained from the computer methods.
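The set of consistent (P1, P2) combinations can be sketched under the CERES-Maize photoperiod rule as we understand it, which is an assumption here rather than something restated in this paper: floral induction lasts 4 days when daylength is at most 12.5 h and 4 + P2 × (daylength − 12.5) days when it is longer. Every extra induction day must then come out of the juvenile phase, so for a fixed tassel-initiation date P1 falls linearly as P2 rises. The helper name `consistent_p1` is hypothetical, and a constant daily GDD8 rate replaces the day-by-day climate records the authors used.

```python
def consistent_p1(p1_at_p2_zero, p2_values, daylength, gdd_per_day, threshold=12.5):
    """P1 values (GDD8) that reproduce the same tassel-initiation date for each P2.

    Assumes a constant daily GDD8 rate: the extra induction days
    P2 * (daylength - threshold) are converted to GDD8 and subtracted
    from the P1 estimated under P2 = 0.
    """
    excess_hours = max(0.0, daylength - threshold)
    return [p1_at_p2_zero - p2 * excess_hours * gdd_per_day for p2 in p2_values]
```

With the P2 = 0 estimate of 277 GDD8 for GH-2404, a daylength of 15 h, and 16 GDD8 per day (the approximate values given in Section 3.1), `consistent_p1(277.0, [0.0, 0.5, 1.0, 1.5, 2.0], 15.0, 16.0)` returns [277.0, 257.0, 237.0, 217.0, 197.0], a straight line in the (P1, P2) plane.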

Estimates of P1 and P2 for B73×Mo17 obtained from UCPR and GENCALC, together with published values (Jones and Kiniry, 1986), were used to simulate silking dates for 22 experiments from the 1996 and 1997 Kansas Corn Performance Tests not used in estimating the coefficients (Table 1). These included a broad range of environmental conditions. Predicted silking dates were plotted against measured silking dates and compared by regression analysis.
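The regression comparison can be reproduced with ordinary least squares. This sketch, with the hypothetical name `compare_predictions`, regresses simulated on observed silking dates (expressed as day-of-year numbers) and reports the share of variance accounted for (r²), the mean bias, and the RMSE. The formal test for equality of slopes and intercepts across methods is not reproduced here.

```python
from math import sqrt
from statistics import mean


def compare_predictions(observed, simulated):
    """Regress simulated on observed dates and summarize the agreement.

    Returns slope, intercept, r2 (squared correlation, i.e. the share of
    variance accounted for), bias (mean simulated minus mean observed)
    and RMSE, all in days except r2.
    """
    if len(observed) != len(simulated) or len(observed) < 3:
        raise ValueError("need two matching series of at least 3 dates")
    mx, my = mean(observed), mean(simulated)
    sxx = sum((x - mx) ** 2 for x in observed)
    syy = sum((y - my) ** 2 for y in simulated)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, simulated))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    bias = my - mx
    rmse = sqrt(mean([(y - x) ** 2 for x, y in zip(observed, simulated)]))
    return {"slope": slope, "intercept": intercept, "r2": r2, "bias": bias, "rmse": rmse}
```

For a toy series in which observed dates 0, 10, and 20 are simulated as 5, 10, and 15, the slope is 0.5 and the intercept 5.0: early dates are overpredicted and late ones underpredicted, the same pattern described in Section 3.4.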

3. Results and discussion

3.1. Field data


averaged, yielding estimates of 277 and 305 GDD8 for GH-2404 and GH-2573, respectively. Tassel initiation occurred at a photoperiod of nearly 15 h, 2.5 h longer than the 12.5-h threshold. According to climatic data, daily heat unit accumulation was approximately 16 GDD8 per day. Thus, the effect of P2 could vary from 0 GDD8 for a hybrid lacking photoperiod sensitivity (P2 = 0) to 5 days, or approximately 80 GDD8, for one with an atypically high value of P2 = 2.0 days h⁻¹.
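Written out, the arithmetic behind the last sentence is as follows (Δt denotes the extra induction time and DL the daylength; both symbols are introduced here only for this illustration):

```latex
\Delta t = P2\,(DL - 12.5\ \mathrm{h}) = 2.0\ \mathrm{d\,h^{-1}} \times 2.5\ \mathrm{h} = 5\ \mathrm{d},
\qquad
\Delta\mathrm{GDD8} \approx 5\ \mathrm{d} \times 16\ \mathrm{GDD8\,d^{-1}} = 80\ \mathrm{GDD8}
```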

3.2. UCPR

Among the four hybrids with the smaller data sets, the best fits were obtained for GH-2404 and ICI-8599, and the worst for GH-2573 (Table 3). ICI-8599 was the only hybrid with a UCPR estimate of P2 greater than 1, indicating that it was highly sensitive to photoperiod. Early-maturity hybrids (GH-2404 and ICI-8599) had lower P1 values than later hybrids.

Table 2

Dates of emergence, tassel initiation, and silking, with associated daylength and P1 values

Hybrid | Emergence date | Tassel initiation date | Daylength at tassel initiation (h) | Silking date | P1ᵃ (GDD8)
Rossville 1995
GH-2404 | 20 May | 20 June | 14.93 | 18 July | 307
ICI-8599 | 19 May | 20 June | 14.93 | 18 July | 307
NC+4616 | 19 May | 23 June | 14.92 | 23 July | 355
GH-2573 | 19 May | 23 June | 14.92 | 24 July | 355
Rossville 1996
GH-2404 | 17 June | 4 July | 14.85 | 31 July | 245
GH-2573 | 17 June | 5 July | 14.83 | 7 August | 264
Powhattan 1996
GH-2404 | 9 June | 29 June | 14.95 | 28 July | 267
GH-2573 | 9 June | 30 June | 14.93 | 1 August | 284

ᵃ Assuming P2 = 0 (see text).

Table 3

Values for P1 and P2 for five maize hybrids estimated by UCPR and GENCALC using silking date as the objective function

Hybrid | No. of observations | UCPR Gminᵃ | UCPR Gcᵇ | UCPR P1 | UCPR P2 | UCPR RMSEᶜ | GENCALC P1 (CV, %) | GENCALC P2 (CV, %)
B73×Mo17ᵈ | 41 | 694 | 807 | 290 | 0.13 | 4.4 | 269 (4) | 1.01 (48)
GH-2404 | 9 | 120 | 266 | 238 | 0.44 | 3.6 | 231 (3) | 1.33 (44)
ICI-8599 | 9 | 118 | 261 | 221 | 1.58 | 3.6 | 251 (3) | 0.80 (68)
NC+ 4616 | 12 | 192 | 341 | 259 | 0.37 | 4.0 | 251 (6) | 1.01 (47)
GH-2573 | 8 | 188 | 467 | 291 | 0.25 | 4.8 | 260 (10) | 0.92 (61)

ᵃ Minimum sum of squares. ᵇ Goodness-of-fit threshold. ᶜ Root mean square error.

The results for B73×Mo17 depended on the number of points requested. When UCPR was run using N = 200 points, the algorithm produced the estimates shown in Table 3 and the confidence region delineated by the scatter plot in Fig. 1A. However, the behavior was markedly different when N was increased to 400 points. The estimates changed to P1 = 307 and P2 = 0, accompanied by a drop in the sum of squared residuals from 694 to 607. The algorithm was not able to find 400 points within the small region bounded by the corresponding Gc of 707, even after an extended run time. The points it did find yielded the scatter diagram in Fig. 1B.

To investigate this, the sums of squares were computed over a grid of 2500 (P1, P2) pairs in a narrow region containing the new estimates. A graph of this response surface (not shown) showed a sharp drop in the sum of squares at P1 = 307 when P2 reached 0; on the greatly expanded P2 scale, this drop amounts to a discontinuity in model behavior. Additional plots (not shown) revealed the cause. Because the model has a daily time step, a finite change in the phenological parameters is needed to produce any change in predicted silking dates and, therefore, in the sum-of-squares response surface. The lack of model sensitivity to P2 exacerbates this effect, yielding a stepped surface with parallel discontinuities nearly (but not exactly) orthogonal to the P2 axis. The interaction of this effect with the roughly parabolic shape of the surface in the P1 direction produces the minimum at (307, 0). From a biological standpoint, this minimum can only be classed as an artifact of model formulation.
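A grid evaluation like the one described can be sketched as below. A 50 × 50 grid gives the 2500 points mentioned above; the function names, ranges, and toy data are hypothetical. The toy objective is not CERES-Maize: it floors its predicted silking day to a whole day to mimic a daily time step, which turns a smooth surface into a stepped one with only a handful of distinct levels.

```python
import math


def map_surface(sum_sq, p1_range, p2_range, n1=50, n2=50):
    """Evaluate sum_sq on an n1-by-n2 grid of (P1, P2) values.

    Returns a dict {(p1, p2): sum_of_squares} and the grid point with the
    smallest value (the first one encountered if several tie).
    """
    (lo1, hi1), (lo2, hi2) = p1_range, p2_range
    surface = {}
    for i in range(n1):
        p1 = lo1 + (hi1 - lo1) * i / (n1 - 1)
        for j in range(n2):
            p2 = lo2 + (hi2 - lo2) * j / (n2 - 1)
            surface[(p1, p2)] = sum_sq((p1, p2))
    best = min(surface, key=surface.get)
    return surface, best


def toy_stepped_ss(params, data=((63, 2.4), (64, 2.5), (66, 2.2))):
    """Hypothetical stand-in model: whole-day predictions make the sum of
    squares change in steps. Each data item is (observed day, hours of
    daylength above 12.5 h)."""
    p1, p2 = params
    total = 0
    for observed_day, excess_hours in data:
        predicted_day = math.floor(45 + p1 / 16 + p2 * excess_hours)  # daily time step
        total += (predicted_day - observed_day) ** 2
    return total
```

With `map_surface(toy_stepped_ss, (280.0, 320.0), (0.0, 0.5))`, the 2500 grid values collapse onto fewer than 20 distinct sums of squares, and the reported best point is only the first of many tied cells. Plotting the dictionary as a heat map exposes the flat terraces and abrupt steps that a single point estimate would hide.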

3.3. GENCALC

Table 3 shows the results of the runs with CVs low enough to be deemed acceptable. Values for P1 varied from 231 to 269, all with CVs less than or equal to 10%. The estimated values for P2 were relatively high (0.80–1.33) in comparison with the result (P2 = 0.52) given by Jones and Kiniry (1986). P2 estimates also had high CVs (47–68%). This undoubtedly reflects the low sensitivity of CERES-Maize to P2 (Román-Paoli, 1997).

3.4. Comparison of methods

Fig. 2 directly compares the three estimation methods as applied to four hybrids. The scatter plots delineate the confidence regions computed by UCPR. The GENCALC CVs have been converted to 95% confidence limits and displayed as error bars. The solid lines show all (P1, P2) pairs consistent with the field measurements. In particular, the P1 intercepts are the values in Table 2 (averaged in the cases of GH-2404 and GH-2573).


GENCALC's and strongly suggests that the methods are not converging to the same point estimates. This cannot be confirmed because of the limited number of silking date observations used in estimating the coefficients; the UCPR confidence limits are broad enough to include the GENCALC estimates. However, the B73×Mo17 data resulted in confidence limits tight enough to confirm the suggestion. The GENCALC estimates for that hybrid (Table 3) fell well outside the regions delineated in Fig. 1.

Fig. 1. Confidence region for P1 and P2 at the 95% probability level for the hybrid B73×Mo17 using observed silking date as the objective function: (A) 200 points and estimated Gmin by UCPR; (B) true


Fig. 2 also shows that the locus of (P1, P2) pairs consistent with the field observations often marginally overlaps or lies above the confidence limits of both computer-based methods. A possible cause was suggested by previous results indicating that CERES-Maize simulations of these same experiments overestimated final leaf number (Román-Paoli, 1997). Without compensation, the presence of extra leaves would incorrectly delay silking. Given model insensitivity to P2, the only way for either optimization method to force a fit to observed silking dates was to artificially reduce P1.

The regression of simulated B73×Mo17 silking dates on observed values (Fig. 3) was tested for equality of slopes and intercepts, which showed no significant differences among methods. In each case, the intercepts were significantly greater than 0, and the slopes were less than 1. The tendencies were to overpredict early silking dates and underpredict later ones. The published values for P1 and P2 (Jones and Kiniry, 1986) underestimated silking date (bias of −9.5 days). This is not surprising,


because the published P1 value is less than either the UCPR or GENCALC values and falls below the con®dence regions outlined in Fig. 1. Root mean squared error (RMSE) of all methods and the biases of UCPR and GENCALC were comparable (Fig. 3).

These results suggest that use of local data to determine the P1 and P2 coefficients may improve predictability. The equality of slopes of the lines indicates that all three methods determine similar coefficients, which result in a model less sensitive to environmental conditions than real maize plants.

The algorithm used in UCPR to minimize squared error is a member of a class of methods known as random search procedures. These algorithms are extremely powerful. Indeed, some members of this class have been proven to be globally convergent (Pronzato et al., 1984; Lundy and Mees, 1986); i.e. in the limit of long search time, the probability of finding a best possible estimate approaches 1. GENCALC, in contrast, is much weaker. With the B73×Mo17 data, for example, Gmin from GENCALC was 962, well above the values of 694 and 607 from the two UCPR runs. As noted above, these differences were statistically significant.
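The convergence idea can be illustrated with the simplest member of this class, pure random search with uniform draws. If a target neighborhood of the optimum occupies a fraction p of the search box, the chance that at least one of m independent draws lands inside it is 1 − (1 − p)^m, which approaches 1 as m grows. This illustration is ours and covers pure random search only, not the specific algorithms analyzed by Pronzato et al. (1984) or Lundy and Mees (1986); the function names are hypothetical.

```python
import math


def hit_probability(p, draws):
    """Probability that at least one of `draws` independent uniform samples
    lands in a target subregion occupying a fraction p of the search box."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1.0 - (1.0 - p) ** draws


def draws_needed(p, target):
    """Smallest number of uniform draws whose hit probability reaches `target`."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

For p = 0.01, `draws_needed(0.01, 0.99)` returns 459. Since every draw in a CERES-Maize-based search costs K model runs, the rapid growth in required draws as p shrinks is one plausible contributor to long run times.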

GENCALC can produce only individual confidence limits for parameter estimates. However, UCPR can outline the actual joint confidence region. This region may have a spatial pattern quite different from what the individual confidence limits suggest (Klepper and Hendrix, 1994). Of course, GENCALC can fit any number of parameters, whereas the UCPR confidence region scatter plots become difficult to interpret with more than three parameters.


The UCPR approach has two other disadvantages. First, the run times can be much longer than those for GENCALC, as much as several days. Of course, the utility of a reliable set of parameters would seem to outweigh this one-time cost. Second, the procedure is so powerful that, if pressed, it can respond to even minute model artifacts should they happen to depress the sums of squares. This problem can be greatly reduced by mapping the response surface about the estimates to detect and/or diagnose problems, as was done here.

Parameter estimation for complex crop models will remain an art rather than a science for some time to come. We believe that the ability of the UCPR method to produce realistic joint confidence regions along with better point estimates makes it superior to GENCALC for problems involving three parameters or fewer. Better methods of displaying its scatter diagram outputs may be able to extend this utility to higher dimensional problems. No matter which algorithm is used, however, we recommend response surface mapping about the point estimates to characterize the nature of the optimum that has been found.

Of course, no computerized method can overcome flaws in the model, which only field tests can reveal. Such tests should not focus just on ultimate model predictions such as yield or key phenological dates (e.g. silking, maturity). Spot checking of individual parameters can be quite important. Furthermore, as illustrated here, parameters that are confounded observationally can still be used to test models. The joint confidence regions of UCPR are particularly useful in this regard. Therefore, powerful, carefully evaluated estimation procedures combined with ongoing field tests seem to provide the recipe for long-term progress with models.

Acknowledgment

Contribution No. 99-521-J. Research partially supported by the Kansas Corn Commission.

References

Donaldson, J.R., Schnabel, R.B., 1987. Computational experience with confidence regions and confidence intervals for nonlinear least squares. Technometrics 29, 67–82.

Draper, N.R., Smith, H., 1966. Applied Regression Analysis. John Wiley, New York.

Gilmore, E., Rogers, J.S., 1958. Heat units as a method of measuring maturity in corn. Agron. J. 50, 611–615.

Hanks, J., Ritchie, J.T. (Eds.), 1991. Modeling Plant and Soil Systems (Agronomy-no. 31). ASA, CSSA, SSSA, Madison, WI.

Hunt, L.A., 1997. How GENCALC generates candidate points? Electronic communication.

Hunt, L.A., Pararajasingham, S., Jones, J.W., Hoogenboom, G., Imamura, D.T., Ogoshi, R.M., 1993. GENCALC: software to facilitate the use of crop models for analyzing field experiments. Agron. J. 85, 1090–1094.


Jones, C.A., Kiniry, J.R. (Eds.), 1986. CERES-Maize, a Simulation Model of Maize Growth and Development. Texas A&M University Press, College Station, TX.

Kiniry, J.R., Ritchie, J.T., Musser, R.L., 1983a. Dynamic nature of the photoperiodic response in maize. Agron. J. 75, 700±703.

Kiniry, J.R., Ritchie, J.T., Musser, R.L., Flint, E.P., Iwig, W.L., 1983b. The photoperiod sensitive interval in maize. Agron. J. 75, 687–690.

Klepper, O., Rouse, D.I., 1991. A procedure to reduce parameter uncertainty for complex models by comparison with real system output illustrated in a potato growth model. Agricultural Systems 36, 375–395.

Klepper, O., Hendrix, E.M.T., 1994. A comparison of algorithms for global characterization of confidence region for nonlinear models. Environ. Toxicol. Chem. 13, 1887–1899.

Kmenta, J., 1971. Elements of Econometrics. The Macmillan Company, New York.

Lundy, M., Mees, A., 1986. Convergence of an annealing algorithm. Math. Programming 34, 111–124.

Ogoshi, R.M., Hunt, L.A., Jones, J.W., Tsuji, G.Y., 1991. Determination and application of field genetic coefficients for CERES-Maize crop model. Agron. Abs. P21.

Pronzato, L., Venot, W.E., Lebruchec, J.F., 1984. A general purpose global optimizer: implementation and applications. Math. Comput. Simul. 26, 412–422.

Rao, C.R., 1965. Statistical Inference and its Applications. John Wiley, New York.

Ritchie, J.T., Godwin, D.C., Singh, U., 1989. The CERES models of crop growth and yield. Agron. Abs. P21.

Ritchie, S.W., Hanway, J.J., Benson, G.O., 1992. How a Corn Plant Develops (Special Report No. 48). Cooperative Extension Service, Iowa State University, Ames, IA.

Román-Paoli, E., 1997. Maize performance in Kansas: a CERES-Maize simulation. PhD thesis, Kansas State University, Manhattan, KS (Diss. Abstr. AAC 9804375).

Roozeboom, K.L., 1993. Kansas Performance Tests with Corn Hybrids (Report of Progress 696). Kansas Agricultural Experimental Station, Manhattan, KS.

Thornton, P.K., Dent, J.B., Bacsi, Z., 1991. A framework for crop growth simulation model applications. Agricultural Systems 37, 327–340.
