APPLICATION OF THE PROPORTIONAL HAZARDS MODEL IN THE PROGNOSTIC ANALYSIS OF COLON AND GASTROINTESTINAL CANCER DATA

B. D. Bunday*

University of Bradford, Bradford, U.K.

and

V. A. Kiri

University of Surrey, Guildford, U.K.

ABSTRACT

Some likely prognostic factors of the survival rates of certain colon and gastrointestinal cancer patients are analyzed in two separate studies, each involving Cox's Proportional Hazards Model (PHM). Each study uses the measured values of two serum protein-types, Carcinoembryonic Antigen (CEA) and Gamma-Glutamyl Transpeptidase (GGT), along with the respective ages of the patients, as the covariates under scrutiny. Our results show age to be insignificant in both studies; only GGT (for colon cancer) and CEA (for gastrointestinal cancer) appear to influence the survival times of the respective patients. We illustrate the adequacy of the final forms of the model and highlight some of the reasons for preferring it over the parametric models for this kind of analysis.

* Address for correspondence:

Department of Mathematics, University of Bradford

Bradford, West Yorkshire BD7 1DP, United Kingdom

e-mail: [email protected]

January 1994 The Arabian Journal for Science and Engineering, Volume 19, Number 1.


1. INTRODUCTION

It is widely acknowledged that the pre-treatment levels of a number of the serum proteins in cancer patients could be used as biochemical markers in screening, diagnosing, and monitoring the progress of certain types of this disease. Unfortunately, there is still no conclusive evidence in support of exact specification levels of such proteins for the fulfilment of these roles.

However, it is possible to use a combination of such significant factors to detect some types of cancer pre-operatively in symptomatic populations, as is the case with gastrointestinal cancer, De Mello et al. [1]. Indeed, considered together with other relevant symptom-factors, detection is also possible in asymptomatic populations, Marshall et al. [2] and Muller et al. [3].

The choice of the Proportional Hazards Model (PHM) for our analysis is a deliberate attempt to test its adequacy in this specific area of cancer study, since its use does not involve most of the assumptions necessary for the application of a parametric model. However, for purposes of comparison, a parametric analysis using a Weibull form for the survivor function has also been carried out.

The two sets of data for the studies were collected from St. James's Hospital, Leeds. They involved ninety-eight colon-cancer and seventy gastrointestinal-cancer patients, each supplying a blood sample from which Carcinoembryonic Antigen (CEA) and Gamma-Glutamyl Transpeptidase (GGT) concentrations in μg/ml were measured immunochemically. Their respective ages were also recorded. The survival times of all the colon-cancer patients were observed, but those of four of the gastrointestinal-cancer patients were not (that is, they were censored).

Our objective was first to establish the appropriateness of our choice of model in each case, and hence to extract those covariates that appear to have a significant influence on survival times. Having checked the validity of the model, we calculated estimates for the coefficients of the covariates and their respective standard errors. As an additional measure for assessing the appropriateness of the fitted model, each set of results was compared with the corresponding results obtained from a Weibull (parametric) model fitted using the RSS GLIM 3.7 program.

All the graphical work was carried out using the SIMPLEPLOT graphical routines (available to the University of Bradford Computing Service). Other forms of analysis were carried out with a computer program developed by us, which involves the Davidon-Fletcher-Powell optimization method [4].

2. PARAMETRIC ANALYSIS USING A WEIBULL MODEL

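The survivor function for the standard Weibull model can be written in the form:

$$S(t) = \exp[-(\gamma t)^k]. \qquad (1)$$

In order to take account of the three covariates, which we denote by $z_1$, $z_2$, and $z_3$, we fitted a survivor function of the form

$$S_1(t) = \exp[-(\gamma(z)\,t)^k], \qquad (2)$$

where

$$\gamma(z) = \gamma \exp(\alpha \cdot z) = \gamma \exp(\alpha_1 z_1 + \alpha_2 z_2 + \alpha_3 z_3),$$

to the data.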

This type of model was applied to the colon cancer data and the gastrointestinal cancer data in turn, and in both cases the models fitted the data well. The results concerning the estimates for the regressor coefficients are shown in Table 1 and indicate that GGT is the significant covariate for colon cancer, whereas CEA is the significant covariate for gastrointestinal cancer.
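As a rough illustration of what fitting model (2) involves, the sketch below evaluates a right-censored log-likelihood for the survivor function $S_1(t) = \exp[-(\gamma(z)t)^k]$ with $\gamma(z) = \gamma\exp(\alpha \cdot z)$. It is not the GLIM procedure used for the results reported here. The function name, the argument layout, and the convention that an event flag of 1 marks an observed failure are assumptions made for illustration only.

```python
import math


def weibull_log_likelihood(gamma, k, alpha, times, events, covariates):
    """Log-likelihood of the covariate-adjusted Weibull model (hypothetical helper).

    times      : positive survival or censoring times
    events     : 1 if the failure was observed, 0 if the time is censored
    covariates : one list of covariate values per patient
    """
    total = 0.0
    for t, failed, z in zip(times, events, covariates):
        # gamma(z) = gamma * exp(alpha . z)
        rate = gamma * math.exp(sum(a * x for a, x in zip(alpha, z)))
        if failed:
            # Observed failures contribute the log hazard,
            # log[k * gamma(z)^k * t^(k-1)].
            total += math.log(k) + k * math.log(rate) + (k - 1) * math.log(t)
        # Every patient contributes log S_1(t) = -(gamma(z) * t)^k.
        total -= (rate * t) ** k
    return total
```

Maximizing this function over $\gamma$, $k$, and $\alpha$ (for example by minimizing its negative with a quasi-Newton routine) would give estimates of the kind reported in Table 1, under the stated assumptions.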

Table 1. Weibull Model.

| Regressor Variable | Colon Cancer: Coefficient | Colon Cancer: Standard Error | Gastrointestinal Cancer: Coefficient | Gastrointestinal Cancer: Standard Error |
|---|---|---|---|---|
| Constant (γ) | 7.75 | 0.55 | 6.93 | 0.69 |
| CEA | 0.000013 | 0.000043 | 0.0087 | 0.0015 |
| GGT | 0.0026 | 0.00049 | 0.00029 | 0.00036 |
| AGE | -0.00073 | 0.0087 | 0.0030 | 0.011 |
| Shape Parameter (k) | 1.32 | | 1.13 | |

3. THE PROPORTIONAL HAZARDS MODEL

Suppose $z$ is a row vector of $n$ measured covariates and $\beta$ is a column vector of $n$ regressor coefficients, while $T$ is the related failure time. The proportional hazards model (PHM) is based on the view that the data can be explained via a hazard function

$$\lambda(t; z) = \lambda_0(t)\,\psi(z; \beta), \qquad (3)$$

where $\lambda_0(t)$ is the base hazard function for a "standard" individual. We have assumed the usual form

$$\psi(z; \beta) = \exp(z \cdot \beta). \qquad (4)$$

A partial likelihood for $\beta$ can be obtained by taking the product over the failure times, $t_i$, of the conditional probability that individual $i$ fails at time $t_i$, given the set $R(t_i)$ of individuals then at risk. When censoring or tied failure times occur, simple measures to handle the difficulties have been given by Kalbfleisch and Prentice [5] and Cox and Oakes [6]. To estimate the $\beta$ parameters we minimize the negative of the resulting log-likelihood.
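The objective described above can be sketched in a few lines. Under the exponential form (4), the conditional probability for a failure at $t_i$ is $\exp(z_i \cdot \beta) / \sum_{j \in R(t_i)} \exp(z_j \cdot \beta)$, and the function below returns the negative logarithm of the product of these terms. This is a minimal sketch, not the authors' program: the function name and data layout are hypothetical, tied failure times are simply given the same risk set (one of the simple treatments of ties), and the optimization step itself is left out.

```python
import math


def neg_log_partial_likelihood(beta, times, events, covariates):
    """Negative log partial likelihood for the PHM (hypothetical helper).

    beta       : list of regressor coefficients
    times      : survival or censoring times
    events     : 1 if the failure was observed, 0 if censored
    covariates : one list of covariate values per individual
    """
    # Linear predictor z_i . beta for each individual.
    scores = [sum(b * x for b, x in zip(beta, z)) for z in covariates]
    total = 0.0
    for i, (t_i, failed) in enumerate(zip(times, events)):
        if not failed:
            # Censored individuals enter only through the risk sets.
            continue
        # Risk set R(t_i): everyone still under observation at t_i.
        risk_sum = sum(
            math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        total += scores[i] - math.log(risk_sum)
    return -total
```

Any general-purpose minimizer can then be applied to this function of `beta`; the paper reports using a Davidon-Fletcher-Powell routine for that step.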

For most parametric models, problems of censoring, ties in failure times and time-dependent covariates can adversely affect their application. The proportional hazards model has the advantage that it is possible to test for the effects of the covariates without specifying a precise (parametric) form for the base hazard $\lambda_0(t)$. In that sense, given the inhomogeneous nature of the data, and the need to investigate the effect of the covariates on survival, without having a full understanding of the failure process, the proportional hazards model appeared to offer a reasonable approach.

4. MODEL ADEQUACY

O'Quigley [7] took a fairly cynical view of the precise form chosen for $\psi(z_i; \beta)$. A number of techniques to verify this form have been suggested by Cox and Snell [8] and Aitkin and Clayton [9]. We have chosen the method illustrated in Kay [10], which involves the so-called residuals ($e_i$) and their corresponding cumulative hazard function of residuals $H(e)$, defined by the relationship:

$$e_i = -\ln S(t_i; z_i).$$

The plot of the $H(e_i)$'s (unadjusted for covariates) against the $e_i$'s is expected to exhibit a straight line relationship of unit slope, if the model fits the data. In other words, the $e_i$'s should behave as a random sample of censored unit exponential variates. These residuals can be evaluated as follows:

$$e_i = \exp(z_i \cdot \beta)\, H(t_i, 0),$$

where $H(t, 0)$ is the cumulative base-hazard function.
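The residual check described above can be sketched as follows. The first function evaluates $e_i = \exp(z_i \cdot \beta) H(t_i, 0)$ for a supplied cumulative base-hazard function, and the second estimates the cumulative hazard of the residuals so that the pairs $(e_i, H(e_i))$ can be plotted against a line of unit slope. The estimator used for $H(e)$ here (a Nelson-Aalen type sum, assuming no tied residuals) is an assumption, as are all function and argument names; the paper does not state which estimator was used.

```python
import math


def residuals_from_model(times, covariates, beta, cum_base_hazard):
    """e_i = exp(z_i . beta) * H(t_i, 0); cum_base_hazard is a callable H(t, 0)."""
    return [
        math.exp(sum(b * x for b, x in zip(beta, z))) * cum_base_hazard(t)
        for t, z in zip(times, covariates)
    ]


def cumulative_hazard_of_residuals(residuals, events):
    """Nelson-Aalen type estimate of H(e) at each uncensored residual.

    Returns (e_i, H(e_i)) pairs in increasing order of e_i. A residual
    inherits the censoring status (events[i] == 0) of its survival time.
    """
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    at_risk = len(residuals)
    cumulative = 0.0
    points = []
    for i in order:
        if events[i]:
            cumulative += 1.0 / at_risk
            points.append((residuals[i], cumulative))
        at_risk -= 1  # this individual leaves the risk set either way
    return points
```

If the fitted model is adequate, the returned points should scatter around the line $H(e) = e$.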

To check the prior assumption that the independent covariates affect the hazard in a multiplicative way, we provide a plot of log(-log(survival function)), that is $\log\{-\log \hat{S}(t, \bar{z})\}$, against $t$, or $\log t$, in which $\bar{z}$ is the mean covariate-vector and $\hat{S}(t, \bar{z})$ is the estimate of the survivor function at time $t$. This process entails the specification of strata based upon those variables suspected of possessing effects that violate this property (in turn). We therefore need to evaluate the respective survivor estimates at each time in each stratum and check the plot to see whether the differences between strata exhibit constancy. This obviously involves the estimation of both the regression parameters ($\beta$) and the base-hazard function.

The problem of estimation of the base-hazard function $\lambda_0(t)$ has been considered by several authors: Cox [11], Breslow [12], Kalbfleisch [13], and Oakes [14].
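One way to obtain the quantities needed for such a plot is sketched below, assuming a Breslow-type step estimate of the cumulative base hazard, in which each failure time adds (number of failures) / (sum of $\exp(z_j \cdot \beta)$ over the risk set). Which of the cited estimators was actually used is not stated, so this choice, together with the function names and data layout, is an assumption. The second function converts the steps into $(\log t, \log\{-\log \hat{S}(t, \bar{z})\})$ pairs using $\hat{S}(t, \bar{z}) = \exp\{-H(t, 0)\exp(\bar{z} \cdot \beta)\}$.

```python
import math


def breslow_cumulative_hazard(times, events, covariates, beta):
    """Step estimate of the cumulative base hazard at each distinct failure time."""
    weights = [math.exp(sum(b * x for b, x in zip(beta, z))) for z in covariates]
    steps = []
    cumulative = 0.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        failures = sum(1 for t_j, d_j in zip(times, events) if d_j and t_j == t)
        # Sum of exp(z_j . beta) over individuals still at risk at time t.
        risk = sum(w for w, t_j in zip(weights, times) if t_j >= t)
        cumulative += failures / risk
        steps.append((t, cumulative))
    return steps


def log_minus_log_curve(steps, beta, z_bar):
    """Points (log t, log(-log S(t, z_bar))) for one stratum; times must be positive."""
    multiplier = math.exp(sum(b * x for b, x in zip(beta, z_bar)))
    # -log S(t, z_bar) equals H(t, 0) * exp(z_bar . beta).
    return [(math.log(t), math.log(h * multiplier)) for t, h in steps]
```

Calling both functions once per stratum and overlaying the curves gives the comparison described above: roughly constant vertical separation between strata is consistent with proportional hazards.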

5. APPLICATION OF CHI-SQUARE TECHNIQUES

The use of the PHM in studies involving several covariates requires extensive exploratory analysis.

The aim is to check for dependence of failure time on each covariate, when considered in turn.

One such step is to use a log-rank test as discussed by Kalbfleisch and Prentice [5]. For covariates with continuous values, this may involve stratifying such a covariate to obtain an S-sample problem ($S \geq 2$), as illustrated for example by Bunday and Kiri [15]. The test avoids parameter estimation.

A more robust approach is to apply the likelihood ratio test, either in the backward step-down procedure of Greenberg et al. [16] or the forward step-wise procedure described in Kay [10]. This would result in a model which contains only those covariates that make significant contributions to the log-likelihood.

We have applied both the log-rank test and then as a second step the likelihood ratio test in our selection procedure.

6. RESULTS FROM ANALYSIS

For our log-rank tests we designed a two-sample problem for each covariate: we standardized the covariate and assigned each patient the value 0 or 1, depending on whether the standardized value lies to the left or right of the mean for that covariate. Under the hypothesis that the covariate has no effect, the log-rank statistic is then approximately a value from the $\chi^2_{(1)}$ distribution, and large values support the view that the covariate influences survival time. The results of the test are shown in Table 2.
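The two-sample construction can be illustrated with a short sketch. The first function splits a covariate at its mean (standardizing first does not change which side of the mean a value falls on), and the second computes the usual two-sample log-rank statistic from observed and expected failures in group 1, using the hypergeometric variance at each failure time. The function names and data layout are hypothetical, and the code assumes at least one failure time with more than one individual at risk and both groups represented.

```python
def dichotomize_at_mean(values):
    """Return 1 for values above the covariate mean, 0 otherwise."""
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]


def log_rank_statistic(times, events, groups):
    """Two-sample log-rank statistic, referred to chi-square with 1 d.f."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        # Numbers at risk just before t, overall and in group 1.
        n = sum(1 for t_j in times if t_j >= t)
        n1 = sum(1 for t_j, g in zip(times, groups) if t_j >= t and g == 1)
        # Failures at t, overall and in group 1.
        d = sum(1 for t_j, d_j in zip(times, events) if d_j and t_j == t)
        d1 = sum(
            1
            for t_j, d_j, g in zip(times, events, groups)
            if d_j and t_j == t and g == 1
        )
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance
```

A value above 3.84 would be significant at the 5% level for a chi-square variable with one degree of freedom.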

The results of the likelihood ratio tests (using the backward step-down procedure) are shown in Table 3. We can see from these that, while GGT is the only factor which passes all these tests for the colon data, that role is taken by CEA for the gastrointestinal data. The actual parameter estimates, along with their standard errors and test statistics, are shown in Tables 4 and 5 for the two data sets. These estimates confirm that GGT is the important variable for colon cancer and CEA is the important variable for gastrointestinal cancer. Only these variables were considered in the final models. The asymptotic chi-square statistics given in Tables 4 and 5 were calculated for each $j$th covariate as the square of

$$\beta_j / (\text{the estimated standard error of } \beta_j).$$

To check the proportionality of the hazard-function condition, we dichotomized each covariate in turn using the standardized covariates. Plots of $\log\{-\log S(t, z)\}$ against $\log t$ were then drawn for each stratum, having fitted the model with the other covariates. Figures 1 and 2 show plots for the CEA covariate, and the constancy of separation of the two curves is reasonably apparent. In fact only the age covariate provided evidence to suggest violation of the proportionality assumption in each study. Fortunately it is not a significant variable in either case and, as such, the violation does not require rejection of the PHM.

Table 2. Log-Rank Test [$\chi^2_{(1)}$ Variable].

| Regressor Variable | Colon Cancer | Gastrointestinal Cancer |
|---|---|---|
| CEA | 14.25 | 23.39 |
| GGT | 35.97 | 3.33 |
| AGE | 0.0005 | 0.18 |

Table 3. Eliminating Variables Having Insignificant Effect on the Survival of Cancer Patients Using the Backward Step-Down Procedure.

| Regressor Variable(s) Used | Colon Cancer: Maximum Log-Likelihood | Colon Cancer: Likelihood Ratio Test Statistic | Gastrointestinal Cancer: Maximum Log-Likelihood | Gastrointestinal Cancer: Likelihood Ratio Test Statistic |
|---|---|---|---|---|
| CEA, GGT, AGE | -342.66 | | -208.61 | |
| CEA, GGT | -342.66 | 0.00* | -208.65 | 0.04* |
| CEA, AGE | -350.60 | 7.94 | -208.91 | 0.25 |
| GGT, AGE | -342.73 | 0.07 | -218.39 | 9.73 |
| CEA | -350.67 | 8.01 | -208.94 | 0.28* |
| GGT | -342.73 | 0.07* | -218.54 | 9.61 |
| AGE | -354.63 | 11.90 | -218.74 | 9.80 |
| NONE | -352.67 | 11.93 | -218.86 | 9.92 |

* Stages in the elimination process.

Table 4. Parameter Estimates, Standard Errors, and Chi-Square Statistics for the Initial Model (Colon Cancer).

| Regressor Variable | Coefficient | Standard Error | Chi-Square Statistic |
|---|---|---|---|
| CEA | 0.00001628 | 0.000043768 | 0.1384136 |
| GGT | 0.00258861 | 0.000534792 | 23.4295184 |
| AGE | 0.00027310 | 0.009097781 | 0.0009011 |

Table 5. Parameter Estimates, Standard Errors, and Chi-Square Statistics for the Initial Model (Gastrointestinal Cancer).

| Regressor Variable | Coefficient | Standard Error | Chi-Square Statistic |
|---|---|---|---|
| CEA | 0.00962674 | 0.001887196 | 26.0210116 |
| GGT | 0.00029878 | 0.000358532 | 0.6944598 |
| AGE | 0.00316980 | 0.010961430 | 0.0836235 |
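The chi-square statistics reported in Tables 4 and 5 can be reproduced from the tabulated coefficients and standard errors, since each is the square of the coefficient divided by its standard error. The sketch below does this for the two significant covariates. The helper names are hypothetical, and the p-value function is an addition for illustration (it uses the identity that the upper tail of a chi-square variable with one degree of freedom at $x$ equals $\operatorname{erfc}(\sqrt{x/2})$); no p-values are reported in the paper.

```python
import math


def wald_chi_square(coefficient, standard_error):
    """Square of (coefficient / standard error), as used for Tables 4 and 5."""
    return (coefficient / standard_error) ** 2


def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Coefficients and standard errors copied from Tables 4 and 5.
tabulated = {
    "GGT (colon)": (0.00258861, 0.000534792),
    "CEA (gastrointestinal)": (0.00962674, 0.001887196),
}

for label, (coef, se) in tabulated.items():
    stat = wald_chi_square(coef, se)
    print(label, round(stat, 4), chi_square_1df_p_value(stat))
```

Both statistics lie far above 3.84, the 5% point of the chi-square distribution with one degree of freedom, which agrees with the conclusions drawn from the likelihood ratio tests.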

Figures 3 and 4 show the residual plot mentioned in Section 4. The straight line plot at an angle of 45° is apparent and confirms the adequacy of our final model. Table 6 gives the estimates of the one significant coefficient (along with its standard error) in our final models for the two data sets.

Thus it appears that GGT is important in its effect on the hazard function for colon cancer patients, with CEA playing the same role for gastrointestinal cancer. In both cases an increasing level increases the hazard and so decreases the chance of survival. The remarks concerning CEA are certainly in agreement with other and more detailed studies (National Institute of Health Consensus Development Council [17]).

Table 6. Final Version of the PHM Model.

| Regressor Variable | Colon Cancer: Coefficient | Colon Cancer: Standard Error | Gastrointestinal Cancer: Coefficient | Gastrointestinal Cancer: Standard Error |
|---|---|---|---|---|
| GGT | 0.0027 | 0.0004 | | |
| CEA | | | 0.0097 | 0.0019 |

Final Form for the Weibull Model.

| Regressor Variable | Colon Cancer: Coefficient | Colon Cancer: Standard Error | Gastrointestinal Cancer: Coefficient | Gastrointestinal Cancer: Standard Error |
|---|---|---|---|---|
| GGT | 0.0027 | 0.0004 | | |
| CEA | | | 0.0089 | 0.0016 |

Figure 1. Checking the Inclusion of CEA. z(i) are the standardized cov-values. Colon. [Plot against Log(Survival-time), with separate curves for z(i) < 0 and z(i) > 0.]

Figure 2. Checking the Inclusion of CEA. z(i) are the standardized cov-values. Gastrointestine. [Plot against Log(Survival-time), with separate curves for z(i) < 0 and z(i) > 0.]

7. DISCUSSION

When each set of data was analyzed assuming a Weibull hazard function, we obtained results very similar to those obtained using the proportional hazards model. Of course with constant explanatory variables this is what we would expect.

Model (2) gives

$$S_1(t) = \exp[-(\gamma(z)\,t)^k] = \exp[-(\exp(\alpha \cdot z)\,\gamma t)^k] = \{\exp[-(\gamma t)^k]\}^{[\exp(\alpha \cdot z)]^k}$$

and so has the form of a proportional hazards model with base survivor function $\exp[-(\gamma t)^k]$ and regressor coefficients $\beta = k\alpha$.

Figure 3. Checking for Goodness-of-Fit of the Final Model. Colon. [Horizontal axis: Residuals.]

Indeed, as has been pointed out by Bunday [18] and many others, the Weibull model can be viewed as a proportional hazards model or as an accelerated life model. However, the advantage of the proportional hazards model is that the analysis can proceed without a detailed specification of the form of the underlying hazard. Such a specification is virtually impossible for a process as complex as failure due to cancer, and so it was felt that, in general, parametric models will find limited application, particularly if the data are inhomogeneous and censored. At the same time the proportional hazards model provides a simple interpretation of the idea that the effect of a covariate is to multiply the hazard by a constant factor. Thus if interest centres on the qualitative effect on failure time of the explanatory variables (as it does here), the choice of the proportional hazards model seems to be indicated.
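The equivalence between the Weibull model (2) and a proportional hazards form can be checked numerically. The sketch below evaluates the survivor function both ways, once as $\exp[-(\gamma(z)t)^k]$ and once as the base survivor function $\exp[-(\gamma t)^k]$ raised to the power $\exp(k\,\alpha \cdot z)$. The function names are hypothetical and the parameter values in the loop are arbitrary illustrative numbers, not estimates from the paper.

```python
import math


def weibull_survivor(t, z, gamma, k, alpha):
    """S_1(t) = exp[-(gamma(z) t)^k] with gamma(z) = gamma * exp(alpha . z)."""
    rate = gamma * math.exp(sum(a * x for a, x in zip(alpha, z)))
    return math.exp(-((rate * t) ** k))


def ph_form_survivor(t, z, gamma, k, alpha):
    """The same survivor function written as S_0(t) ** exp(beta . z), beta = k * alpha."""
    base_survivor = math.exp(-((gamma * t) ** k))
    beta = [k * a for a in alpha]
    return base_survivor ** math.exp(sum(b * x for b, x in zip(beta, z)))


# Arbitrary illustrative values (assumptions, not fitted estimates).
gamma, k, alpha, z = 0.001, 1.3, [0.002], [150.0]
for t in (100.0, 500.0, 1000.0):
    print(t, weibull_survivor(t, z, gamma, k, alpha), ph_form_survivor(t, z, gamma, k, alpha))
```

The two columns agree to rounding error for every $t$, which is the sense in which the Weibull model is a special case of the proportional hazards model with a fully specified base hazard.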

Of course the proportional hazards model also allows us to obtain numerical estimates of the base survivor function and these are illustrated in Figures 5 and 6.

ACKNOWLEDGEMENT

We wish to thank Professor E. H. Cooper of the Unit of Cancer Research, University of Leeds, for the provision of the data used in our analysis.

Figure 4. Checking for Goodness-of-Fit of the Final Model. Gastrointestine. [Horizontal axis: Residuals.]

Figure 5. Survival-Plot for the Final Model. Colon. [Horizontal axis: Survival time.]

Figure 6. Survival-Plot for the Final Model. Gastrointestine.

REFERENCES

[1] J. De Mello, L. Struthers, R. Turner, E. H. Cooper, G. R. Giles, and the Yorkshire Regional Gastrointestinal Cancer Research Group, "Multivariate Analysis as Aids to Diagnosis and Assessment of Prognosis in Gastrointestinal Cancer", British J. Cancer, 48 (1983), p. 341.

[2] R. J. Marshall and E. M. Chisholm, "Hypothesis Testing in the Polychotomous Logistic Model with an Application to Detecting Gastrointestinal Cancer", Statistics in Medicine, 4 (1985), p. 337.

[3] T. Muller, R. J. Marshall, E. H. Cooper, D. A. Watson, D. Walker, and A. Mearns, "The Role of Serum Tumour Markers to Aid the Selection of Lung Cancer Patients for Surgery and the Assessment of Prognosis", Eur. J. Cancer Clin. Oncol., 21(12) (1985), p. 1461.

[4] B. D. Bunday and V. A. Kiri, "Maximum Likelihood Estimation - Practical Merits of Variable Metric Optimisation Methods", The Statistician, 36 (1987), p. 349.

[5] J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data. New York: Wiley, 1980.

[6] D. R. Cox and D. Oakes, Analysis of Survival Data. London: Chapman and Hall, 1984.

[7] J. O'Quigley, "Regression Models and Survival Prediction", The Statistician, 31(1) (1982), p. 107.

[8] D. R. Cox and E. J. Snell, "A General Definition of Residuals", J. Roy. Stat. Soc., B, 30 (1968), p. 248.

[9] M. Aitkin and D. Clayton, "The Fitting of Exponential, Weibull, and Extreme-Value Distributions to Complex Censored Survival Data Using GLIM", Applied Statistics, 29 (1980), p. 156.

[10] R. Kay, "Proportional Hazards Regression Models and the Analysis of Censored Survival Data", Applied Statistics, 26(3) (1977), p. 227.

[11] D. R. Cox, "Regression Models and Life Tables", J. Roy. Stat. Soc., B, 34 (1972), p. 187.

[12] N. E. Breslow, "Covariance Analysis of Censored Survival Data", Biometrics, 30 (1974), p. 89.

[13] J. D. Kalbfleisch, "Some Efficiency Calculations for Survival Distributions", Biometrika, 61 (1974), p. 31.

[14] D. Oakes, "The Asymptotic Information in Censored Survival Data", Biometrika, 64 (1977), p. 441.

[15] B. D. Bunday and V. A. Kiri, "Analysis of Censored Recidivism Data Using a Proportional Hazards-Type Model", The Statistician, 41 (1992), p. 85.

[16] R. A. Greenberg, S. Bayard, and D. Byar, "Selecting Concomitant Variables Using a Likelihood Ratio Step-Down Procedure and a Method of Testing Goodness of Fit in an Exponential Survival Model", Biometrics, 30 (1974), p. 601.

[17] National Institute of Health Development Conference, "Carcinoembryonic Antigens: Its Role as a Marker in the Management of Cancer", Annals of Internal Medicine, 94 (1981), p. 407.

[18] B. D. Bunday, Statistical Methods in Reliability Theory and Practice. London: Ellis Horwood, 1991.

Paper Received 10 June 1990; Revised 18 November 1992.
