Survival analysis of cervical cancer using stratified Cox regression
S. W. Purnami, K. D. Inayati, N. W. Wulan Sari, V. Chosuvivatwong, and H. Sriplung
Citation: AIP Conference Proceedings 1723, 030018 (2016); doi: 10.1063/1.4945076 View online: http://dx.doi.org/10.1063/1.4945076
View Table of Contents: http://aip.scitation.org/toc/apc/1723/1 Published by the American Institute of Physics
S. W. Purnami1,a), K. D. Inayati1,b), N. W. Wulan Sari1), V. Chosuvivatwong, and H. Sriplung2)
1Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
2Faculty of Medicine, Prince of Songkla University
Abstract. Cervical cancer is one of the leading causes of cancer death among women worldwide, including in Indonesia. Most cervical cancer patients come to the hospital already at an advanced stage. As a result, treatment becomes more difficult and the risk of death can even increase. One parameter that can be used to assess the success of treatment is the probability of survival. This study examines the survival of cervical cancer patients at Dr. Soetomo Hospital using stratified Cox regression based on six factors: age, stage, treatment initiation, companion disease, complication, and anemia. The stratified Cox model is used because one independent variable, stage, does not satisfy the proportional hazards assumption. The results of the stratified Cox model show that complication is a significant factor influencing the survival probability of cervical cancer patients. The estimated hazard ratio is 7.35, meaning that a cervical cancer patient with complications has a risk of dying 7.35 times that of a patient without complications. The adjusted survival curves show that stage IV has the lowest probability of survival.
INTRODUCTION
Cervical cancer is one of the most common cancers in women and is ranked among the top causes of death for women worldwide, with more than 270,000 women dying of it each year [1]. In Indonesia, cervical cancer still ranks first among gynecological tumors [2], and as many as 20-25 women die of it every day [3].
The high number of cervical cancer cases is attributed to women's lack of awareness of preventive measures, such as detecting pre-cancerous and cancerous lesions at an early stage (screening) [4]. The screening rate in Indonesia is only about 5%, whereas the coverage that is effective in reducing the incidence of and mortality from cervical cancer is 85%.
Dr. Soetomo Hospital, the main referral hospital for eastern Indonesia, sees about ten newly diagnosed cervical cancer patients every day, most of whom arrive already at an advanced stage [5]. If cervical cancer is found at an advanced stage, treatment becomes more difficult and expensive, the outcome is unsatisfactory, and treatment may even hasten death [6]. One parameter that can be used to assess the success of cervical cancer treatment is the patient's probability of survival, which can be measured at one year [7]. The one-year survival rate of cervical cancer patients is 87% [8].
Previous research has identified several factors that can affect the survival of cervical cancer patients, such as age [7] [9], stage [10], and anemia and completeness of treatment [11].
Survival analysis is a collection of statistical procedures for analyzing data in which the outcome variable of interest is the time until an event occurs [12]. One method often used in survival analysis is the Cox proportional hazards model [12]. This method requires all predictor variables to meet the proportional hazards (PH) assumption, that is, the hazard ratio must be independent of time. In practice, not all predictor variables meet the PH assumption; in this study, the stage variable does not.
One method for handling variables that do not meet the assumption of the Cox PH model is stratification [12]. The "stratified Cox model" is a modification of the Cox proportional hazards (PH) model that allows for control by stratification of a predictor that does not satisfy the PH assumption. Predictors that are assumed to satisfy the PH assumption are included in the model, whereas the predictor being stratified is not included. Stratified Cox models are widely applied in health research, for example in a study of lung cancer survival [13]. For the same disease, a stratified Cox model was used to determine the effect of the CIMAvax vaccine on patient prognosis [14]. Lintang [15] studied the incidence of recurrent type I stroke with stratified Cox models. By contrast, research on cervical cancer still often uses Cox proportional hazards models, such as the study by Sirait et al. [11] on the survival of cervical cancer patients at Dr. Cipto Mangunkusumo Hospital and the study by Putri [9] at Dr. Soetomo Hospital.
Based on the above problems, this study uses a stratified Cox model to analyze the survival of cervical cancer patients.
KAPLAN MEIER SURVIVAL CURVES AND THE LOG RANK TEST
The survival probability at a given time can be estimated using the Kaplan-Meier (KM) method [12]. The general KM formula is as follows:

$$\hat{S}(t_{(j)}) = \hat{S}(t_{(j-1)}) \times \Pr\left(T > t_{(j)} \mid T \ge t_{(j)}\right) \tag{1}$$

where

$$\hat{S}(t_{(j-1)}) = \prod_{i=1}^{j-1} \Pr\left(T > t_{(i)} \mid T \ge t_{(i)}\right) \tag{2}$$

The most popular method for testing whether the KM curves of two or more groups are statistically equivalent is the log-rank test [12]. The hypotheses of the log-rank test for two or more groups are as follows:
H0: there is no difference in the survival curves of the different groups
H1: at least one of the survival curves differs between groups
Test statistic:

$$\chi^2 \approx \sum_{i=1}^{G} \frac{(O_i - E_i)^2}{E_i} \tag{3}$$

where

$$O_i - E_i = \sum_{j} \left(m_{ij} - e_{ij}\right) \quad \text{and} \quad e_{ij} = \left(\frac{n_{ij}}{\sum_{i=1}^{G} n_{ij}}\right)\left(\sum_{i=1}^{G} m_{ij}\right)$$

$m_{ij}$ = number of subjects who failed in the i-th group at time $t_{(j)}$
$n_{ij}$ = number of subjects at risk of failing in the i-th group immediately before time $t_{(j)}$
$e_{ij}$ = expected value in the i-th group at time $t_{(j)}$
$G$ = number of groups
Reject H0 if $\chi^2 > \chi^2_{\alpha,\,G-1}$.
STRATIFIED COX REGRESSION
The stratified Cox model is a modification of the Cox proportional hazards model that handles independent variables that do not meet the proportional hazards assumption. There are two versions: a "no-interaction" version of the stratified Cox model and an alternative version that allows interaction.
Interaction Test in Stratified Cox Model
To test whether there is interaction in the stratified Cox model, the likelihood ratio test is used:

$$LR = -2\ln L_R - \left(-2\ln L_F\right) \sim \chi^2_{p(k^*-1)} \tag{4}$$

where
R = model without interaction
F = model with interaction
p = number of independent variables in the model
k* = number of categories in the stratification variable
H0 (the model without interaction is adequate) is rejected if $LR > \chi^2_{\alpha;\,p(k^*-1)}$ or if the p-value < α.
Stratified Cox Model without Interaction
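The stratified Cox model without interaction is the common form of the stratified Cox model and is formulated as follows:

$$h_g(t, \mathbf{X}) = h_{0g}(t)\exp\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right) \tag{5}$$

where g = 1, 2, …, k* indexes the strata and $h_{0g}(t)$ is the baseline hazard function of stratum g.
The hazard ratio in the stratified Cox model without interaction is the hazard for an individual with covariate values $\mathbf{X}^*$ divided by the hazard for an individual in the same stratum with covariate values $\mathbf{X}$, so that the baseline hazard cancels:

$$HR = \frac{h_g(t, \mathbf{X}^*)}{h_g(t, \mathbf{X})} = \frac{h_{0g}(t)\, e^{\sum_{i=1}^{p} \beta_i x_i^*}}{h_{0g}(t)\, e^{\sum_{i=1}^{p} \beta_i x_i}} = e^{\sum_{i=1}^{p} \beta_i \left(x_i^* - x_i\right)} \tag{6}$$

where
$\mathbf{X}^* = (x_1^*, x_2^*, \dots, x_p^*)$ is the set of independent variables for one individual, and
$\mathbf{X} = (x_1, x_2, \dots, x_p)$ is the set of independent variables for an individual with different categories.
Stratified Cox Model with Interaction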
The stratified Cox model with interaction between the stratification variable Z and the x variables in the model is as follows:

$$h_g(t, \mathbf{X}) = h_{0g}(t)\exp\left(\beta_{1g} x_1 + \beta_{2g} x_2 + \dots + \beta_{pg} x_p\right) \tag{7}$$

where g = 1, 2, …, k* and $h_{0g}(t)$ is the baseline hazard function.
METHODOLOGY
The data used in this research are secondary data obtained from the medical records of cervical cancer patients hospitalized at Dr. Soetomo Hospital. Data collection took about 5 months. Survival time and the factors thought to affect the survival of cervical cancer patients were recorded (Table 1). The data obtained cover 817 patients, of whom 777 survived and 40 died.
The dependent variable is the survival time (T) of cervical cancer patients, defined as the time from cervical cancer diagnosis until death. The study period was January to December 2014. The independent variables (X) are factors thought to affect the survival of cervical cancer patients.
Table 1. Research Variables
| Variable | Name | Description | Measurement Scale |
| --- | --- | --- | --- |
| Response variables | | | |
| T | Survival time | Time to event, from the diagnosis of cervical cancer until death (in months) | Ratio |
| Y | Patient status | 1: not censored (death); 2: censored | Nominal |
| Predictor variables | | | |
| X1 | Age | - | Ratio |
| X2 | Cancer stage | 0: Stage 0; 1: Stage I (IA and IB); 2: Stage II (IIA and IIB); 3: Stage III (IIIA and IIIB); 4: Stage IV (IVA and IVB) | Ordinal |
| X3 | Treatment initiation | 1: Chemotherapy; 2: Transfusion of PRC; 3: Surgical operation; 4: Transfusion of PRC and chemotherapy | Nominal |
| X4 | Companion disease | 1: No; 2: Yes | Nominal |
| X5 | Complication | 1: No; 2: Yes | Nominal |
| X6 | Anemia | 1: No; 2: Yes | Nominal |
The detailed steps of the analysis are presented in the flowchart in Figure 1.
[Figure 1 flowchart: cervical cancer data; Kaplan-Meier survival curve; log-rank test; test of the PH assumption; if yes, Cox proportional hazards model; if no, identify the variables that do not meet the PH assumption and fit the stratified Cox model; create adjusted survival curves.]
Figure 1. Steps of the analysis
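The log-rank step of this workflow can be illustrated with a minimal Python sketch of the statistic in Equation (3). The function name `log_rank_statistic` and the toy data are hypothetical and are not taken from the hospital records; the sketch also assumes that every group has a nonzero expected count.

```python
def log_rank_statistic(times, events, groups):
    """Approximate log-rank chi-square: sum over groups of (O_i - E_i)^2 / E_i."""
    labels = sorted(set(groups))
    observed = {g: 0.0 for g in labels}
    expected = {g: 0.0 for g in labels}
    # Distinct times at which at least one failure occurred
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = {g: 0 for g in labels}  # n_ij: at risk in group i just before t
        m = {g: 0 for g in labels}  # m_ij: failures in group i at t
        for u, e, g in zip(times, events, groups):
            if u >= t:
                n[g] += 1
            if u == t and e == 1:
                m[g] += 1
        n_total = sum(n.values())
        m_total = sum(m.values())
        for g in labels:
            observed[g] += m[g]
            expected[g] += n[g] / n_total * m_total  # e_ij
    chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in labels)
    return chi2, len(labels) - 1  # statistic and degrees of freedom (G - 1)


# Hypothetical toy data: two groups of three subjects each
toy_times = [2, 4, 6, 3, 5, 8]
toy_events = [1, 1, 1, 1, 0, 1]  # 1 = death, 0 = censored
toy_groups = ["A", "A", "A", "B", "B", "B"]
chi2, df = log_rank_statistic(toy_times, toy_events, toy_groups)
print(f"chi-square = {chi2:.4f} with {df} degree(s) of freedom")
```

The resulting statistic would then be compared with the chi-square critical value for G - 1 degrees of freedom at the chosen significance level.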
RESULTS
In this research, we collected data for only one year (2014) because of time constraints. The collected data cover 746 patients, of whom 710 survived until the end of the study and 36 died. Ideally, research on cervical cancer survival is conducted over 5 years. In this section, we first describe the survival time characteristics of cervical cancer patients using the Kaplan-Meier survival curve (Figure 2). The survival curve tends to decrease from about day 150.
Figure 2. Kaplan Meier Survival Curve of Cervical Cancer Patients
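The product-limit calculation behind a curve of this kind follows Equations (1) and (2). The sketch below is a minimal illustration in Python; the function name `kaplan_meier` and the toy follow-up times are hypothetical and are not the study data.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times  : follow-up time of each subject
    events : 1 if the subject died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # subjects with T >= t
        survival *= 1.0 - deaths[t] / at_risk      # multiply by Pr(T > t | T >= t)
        curve.append((t, survival))
    return curve


# Hypothetical toy data (times in days)
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"t = {t:>2}  S(t) = {s:.3f}")
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is why the estimate only changes at death times.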
The next step is modeling the survival time of cervical cancer patients. We begin by testing the proportional hazards assumption for the independent variables using a goodness of fit test. The results are presented in Table 2.
Table 2. Goodness of Fit Test Results
| Variable | Correlation | P(PH) |
| --- | --- | --- |
| Age | 0.03029 | 0.8528 |
| Stage | 0.84874 | <0.0001 |
| Treatment initiation | 0.11633 | 0.4747 |
| Companion disease | 0.28011 | 0.0800 |
| Complication | -0.33171 | 0.0365 |
| Anemia | 0.16622 | 0.5376 |
Table 2 shows that, at α = 0.01, all p-values are greater than 0.01 except that of the stage variable. Hence, for age, treatment initiation, companion disease, complication, and anemia we fail to reject H0, which means that these variables meet the proportional hazards assumption. For the cancer stage variable, H0 is rejected, which indicates that it does not meet the proportional hazards assumption. In other words, the cancer stage variable is time-dependent: its effect on the hazard changes as time increases.
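The text does not spell out how the correlations in Table 2 are computed. One common form of the goodness of fit approach described in [12] correlates each covariate's Schoenfeld residuals with the rank order of the failure times and tests whether that correlation is zero. The sketch below assumes that variant; the function names and the residual values are hypothetical, and the p-value calculation is omitted.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ph_correlation(failure_times, schoenfeld_residuals):
    """Correlate one covariate's residuals with the rank of the failure times."""
    order = sorted(range(len(failure_times)), key=lambda i: failure_times[i])
    ranks = [0] * len(failure_times)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank  # ties get arbitrary consecutive ranks in this sketch
    return pearson(ranks, schoenfeld_residuals)


# Hypothetical residuals for two covariates at five failure times
toy_failure_times = [10, 25, 40, 60, 90]
flat_residuals = [0.2, -0.3, 0.1, -0.2, 0.2]       # no trend over time
trending_residuals = [-0.8, -0.4, 0.0, 0.3, 0.9]   # clear trend over time
r_flat = ph_correlation(toy_failure_times, flat_residuals)
r_trend = ph_correlation(toy_failure_times, trending_residuals)
print(f"flat: r = {r_flat:.3f}, trending: r = {r_trend:.3f}")
```

Under this reading, a correlation near zero is consistent with the PH assumption, while a large correlation such as the one reported for stage suggests a violation.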
Based on the goodness of fit test results, one variable does not meet the proportional hazards assumption, so this research uses a stratified Cox model for the analysis.
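In the no-interaction stratified Cox model, the coefficients are shared across strata, while each event is compared only with the subjects still at risk in its own stratum. A minimal Python sketch of the corresponding partial log-likelihood is given here, assuming no tied event times; the function name, the single binary covariate, and the toy data are hypothetical.

```python
import math


def stratified_partial_loglik(beta, times, events, covariates, strata):
    """Partial log-likelihood of a no-interaction stratified Cox model."""

    def linear_predictor(x):
        return sum(b * v for b, v in zip(beta, x))

    total = 0.0
    for i, (t_i, e_i, s_i) in enumerate(zip(times, events, strata)):
        if e_i != 1:
            continue  # censored subjects contribute only through risk sets
        # Risk set: subjects in the SAME stratum still under observation at t_i
        risk_set = [j for j in range(len(times))
                    if strata[j] == s_i and times[j] >= t_i]
        denominator = sum(math.exp(linear_predictor(covariates[j])) for j in risk_set)
        total += linear_predictor(covariates[i]) - math.log(denominator)
    return total


# Hypothetical toy data: two strata, one binary covariate
toy_times = [3, 5, 7, 2, 4, 6]
toy_events = [1, 1, 0, 1, 0, 1]
toy_covariates = [[1], [0], [1], [0], [1], [1]]
toy_strata = [0, 0, 0, 1, 1, 1]

ll_zero = stratified_partial_loglik([0.0], toy_times, toy_events, toy_covariates, toy_strata)
ll_half = stratified_partial_loglik([0.5], toy_times, toy_events, toy_covariates, toy_strata)
print(f"log-likelihood at beta = 0: {ll_zero:.4f}")
print(f"log-likelihood at beta = 0.5: {ll_half:.4f}")
```

A fitting routine would maximize this function over the coefficients; because the baseline hazard of each stratum cancels within its own risk sets, no coefficient is estimated for the stratification variable itself.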
The following table shows the parameter estimates of the stratified Cox model for the survival time of cervical cancer patients, with stage as the stratification variable and the five independent variables that satisfy the proportional hazards assumption.
Table 3. Simultaneous and Partial Test Results
| Variable | Estimate | P-value |
| --- | --- | --- |
| Age | 0.01543 | 0.4724 |
| Treatment (2) | 1.47138 | 0.0159 |
| Treatment (3) | 1.26417 | 0.0403 |
| Treatment (4) | 2.22623 | 0.0012 |
| Companion disease | -0.13944 | 0.7160 |
Based on the parameter estimates in Table 3, the stratified Cox model is obtained as follows:

$$\begin{aligned} h_g(t) = h_{0g}(t)\exp(&0.01543\,\text{age} + 1.47138\,\text{type of treatment (2)} + 1.26417\,\text{type of treatment (3)} \\ &+ 2.22623\,\text{type of treatment (4)} - 0.13944\,\text{companion disease} \\ &+ 2.05915\,\text{complication} - 1.22921\,\text{anemia}) \end{aligned}$$

where g = 1, 2, 3, 4, and 5.
After obtaining the stratified model, the next step is a simultaneous test to assess the suitability of the model and partial tests to determine the significance of each variable. Based on Table 3, the p-value of the likelihood ratio test statistic is smaller than α, so H0 is rejected. It can be concluded that at least one variable is significant in the stratified Cox model. Table 3 also shows that the partial-test p-values for all variables except complication are greater than α = 0.01, so H0 is not rejected for them. The p-value of the complication variable is <0.0001, which is smaller than α = 0.01, so H0 is rejected. In other words, complication significantly affects the survival of cervical cancer patients.
Table 4. Hazard Ratio of Stratified Cox Model
| Variable | Hazard Ratio |
| --- | --- |
| Age | 1.016 |
| Treatment initiation (2) | 4.355 |
| Treatment initiation (3) | 3.540 |
| Treatment initiation (4) | 9.265 |
| Companion disease | 0.870 |
| Complication | 7.350 |
| Anemia | 0.293 |
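By Equation (6), the hazard ratio for a one-unit difference in a single variable, with all other variables equal, is the exponential of its coefficient. The short sketch below applies this to the estimates listed in Table 3; the dictionary names are hypothetical and the decimal commas of the table are written as points.

```python
import math

# Estimates as listed in Table 3
table3_estimates = {
    "Age": 0.01543,
    "Treatment (2)": 1.47138,
    "Treatment (3)": 1.26417,
    "Treatment (4)": 2.22623,
    "Companion disease": -0.13944,
}

# Hazard ratio = exp(coefficient) for a one-unit difference
hazard_ratios = {name: math.exp(beta) for name, beta in table3_estimates.items()}
for name, hr in hazard_ratios.items():
    print(f"{name:<18} HR = {hr:.3f}")
```

A hazard ratio above 1 corresponds to a positive coefficient and a higher hazard, while a ratio below 1 corresponds to a negative coefficient and a lower hazard.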
Based on Table 4, the hazard ratio of the complication variable is 7.350. This means that a cervical cancer patient with complications has a risk of dying 7.350 times that of a patient without complications.
This is in accordance with the medical review [6], which states that complications such as bleeding and kidney failure accelerate the deterioration of the patient's condition. Treatment initiation (4) has the highest hazard ratio among the treatments. This may be because patients who received treatment initiation (4) (combination surgery and the PRC) are patients at an advanced stage, so their risk of death is greater than with the other treatments. Furthermore, Figure 3 presents the adjusted survival curves by the stratification variable, stage.
Figure 3. Adjusted survival curve based on stratified variables
Based on Figure 3, the survival curve for stage II lies above the others, followed by the survival curves for stage III, stage 0, and stage IV. This can be interpreted as meaning that cervical cancer patients at stage II have the highest survival probability, while those at stage IV have the lowest.
[Figure 3 plot: survivor function estimate (vertical axis, 0.50 to 1.00) against time T (horizontal axis, 0 to 160), with one curve per stage in the legend (0, 2, 3, 4).]
CONCLUSION AND DISCUSSION
Kaplan-Meier survival curves and the log-rank test showed differences in the survival curves for the variables stage, type of treatment, and complication. From the stratified Cox regression, we conclude that the factor affecting the one-year survival of cervical cancer patients is complication. Based on the adjusted survival curves, patients at stage IV have the lowest survival probability. For future research, we recommend following patients for five years after diagnosis in order to obtain better results and recommendations.
ACKNOWLEDGMENTS
The research was supported by International Collaboration Research and Publication under DIKTI grant.
REFERENCES
[1] WHO, Human Papillomavirus (HPV) and Cervical Cancer. s.l.: World Health Organization (WHO) (2013).
[2] S. Prawiroharjo, Ilmu Kandungan. Jakarta: EGC (2010).
[3] Yayasan Kanker Indonesia, Press Release Training of Trainers Pap Tes dan IVA Serviks (2013). [Online] Available at: https://www.facebook.com/kankerindonesia/posts/506094629486926
[4] Dwipoyono, Kebijakan Pengendalian Penyakit Kanker Serviks di Indonesia. Indonesian Journal of Cancer, 3(3), Juli-September (2009).
[5] Jawa Pos National Network, Sehari 10 Pasien Baru Kanker Serviks (2014). [Online] Available at: http://www.jpnn.com/read/2014/10/11/262921/Sehari,-10-Pasien-Baru-Kanker-Serviks-
[6] S. Dalimartha, Deteksi Dini Kanker dan Simplisia Anti Kanker. Jakarta: Penebar Swadaya (2004).
[7] D. Gayatri, Hubungan stadium dengan ketahanan hidup 5 tahun pasien kanker serviks di RSUPN Cipto Mangunkusumo dan RSK Dharmais. Jakarta, Depok: FKMUI (2002).
[8] American Cancer Society, Cancer Facts and Figures. Atlanta: American Cancer Society (2014).
[9] R. M. Putri, Pemodelan Regresi Cox Terhadap Faktor Yang Mempengaruhi Ketahanan Hidup Penderita Kanker Serviks. Jurusan Statistika-ITS (2008).
[10] R. Wijayanti, Perbandingan Analisis Regresi Cox dan Analisis Survival Bayesian Pada Ketahanan Hidup Kanker Serviks di RSUD Dr. Soetomo Surabaya. Jurusan Statistika-ITS (2014).
[11] A. M. Sirait, I. Ariawan, and F. Aziz, Ketahanan Hidup Penderita Kanker Serviks di Rumah Sakit Cipto Mangun Kusumo Jakarta. Majalah Obstet Ginekol, 21(3), 183-190 (1997).
[12] D. G. Kleinbaum and M. Klein, Survival Analysis, A Self-Learning Text. New York: Springer (2005).
[13] S. Ata and M. Tekin, Cox Regression Model with Nonproportional Hazard Applied to Lung Cancer Survival Data. Journal of Mathematics and Statistics, (2), 157-167 (2007).
[14] C. V. Gonzales, J. F. Dupuy, and M. F. Lopez, Stratified Cox Regression Analysis of Survival under CIMAvax Vaccine. Journal of Cancer Therapy, (4), 8-14 (2013).
[15] M. A. Lintang, Penerapan Regresi Stratified Cox dengan Metode Conditional 1 Pada Data Kejadian Berulang Tidak Identik. Jurusan Matematika-Universitas Brawijaya (2013).