Identifying Survival Predictive Factors in Patients with Breast Cancer: A 16-Year Cohort Study Using Cox Maximum Penalized Likelihood Method


Rezvaneh Alvandi 1, Aliakbar Rasekhi 1,*, Mehdi Ariana 2

1 Department of Biostatistics, Tarbiat Modares University, Tehran, Iran

2 Cancer Research Center, Shahid Beheshti University of Medical Sciences, Breast Surgery Department, Tehran, Iran

How to Cite: Alvandi R, Rasekhi A, Ariana M. Identifying Survival Predictive Factors in Patients with Breast Cancer: A 16-Year Cohort Study Using Cox Maximum Penalized Likelihood Method. Iran Red Crescent Med J. 2019;21(4):e85398. doi: 10.5812/ircmj.85398.


Iranian Red Crescent Medical Journal: 21 (4); e85398
Published Online: April 28, 2019
Article Type: Research Article
Received: October 15, 2018
Revised: March 9, 2019
Accepted: March 12, 2019




Background: Cancer is the second leading cause of death globally, and it was responsible for almost 9.6 million deaths in 2018. Breast cancer (BC) is the most common cancer among women with almost two million new cases worldwide in 2018. Thus, it is necessary to study new methods to estimate the survival predictive factors in BC patients.

Objectives: This cohort study aimed to fit a Cox model to BC data using partial likelihood (PL) and new maximum penalized likelihood (MPL) methods in order to determine the predictive factors of survival time and compare the accuracy of these two methods.

Methods: This prospective cohort study used the data of 356 women with BC registered at the Cancer Research Center of Shahid Beheshti University of Medical Sciences in Tehran, Iran. The patients were diagnosed between 1999 and 2015. The Cox model was fitted by the new MPL method and the PL method, using variables such as the stage of cancer, tumor grade, and estrogen receptor, among others, in univariate and multiple analyses.

Results: The mean age ± standard deviation (SD) of patients at diagnosis was about 48 ± 11.27 years, ranging from 24 to 84 years. Using the new MPL method, in addition to lymphovascular invasion and recurrence variables, estrogen receptor (P = 0.045) also had a statistically significant relationship with survival. The standard errors of most variables were smaller when using the MPL method than the PL method. The overall one-year, two-year, five-year, and 10-year survival rates based on the baseline hazard estimate were 96%, 92%, 70%, and 51%, respectively.

Conclusions: In the analysis of BC data, the new MPL method can help identify the factors that affect patient survival more accurately than the usual methods do. This method decreases the standard error of most variables and can therefore be applied to identify predictive factors more accurately than previous methods.


Keywords: Breast; Cohort; Cox Model; Estrogen; Invasion; Maximum Penalized Likelihood; Neoplasm; Receptors; Recurrence; Survival

Copyright © 2019, Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License, which permits copying and redistributing the material only for noncommercial uses, provided the original work is properly cited.

1. Background

As the second leading cause of death globally, cancer was responsible for almost 9.6 million deaths in 2018 (1). According to the reports of the World Health Organization (WHO), about 1 in 6 deaths worldwide is due to cancer, and approximately 70% of cancer deaths occur in low- and middle-income countries (2). Breast cancer (BC), with almost two million new cases in 2018, is the most common cancer in women and the second most common cancer in the world after lung cancer, and it led to the death of 626,679 patients (1). However, the cure rate of BC is high if it is diagnosed early and treated according to the best available therapies (3). In the Islamic Republic of Iran, BC constitutes more than 24% of all cancers (4), with an incidence rate of 24.8 to 34 per 100,000 women and a mortality rate of less than 10.2 per 100,000 women in 2018 (1). This cancer most often affects Iranian women between the ages of 35 and 44 years (5, 6), which is more than 10 years younger than the age of BC onset among women in Western countries (4). Based on the results of previous studies (7), BC in Iranian women is often diagnosed at later stages; hence, the patients respond poorly to further treatment and have lower survival rates.

Survival analysis investigates prognostic factors of survival in patients using methods such as the Kaplan-Meier method and the Cox proportional hazards model (8). The Cox model is the most common statistical model for analyzing survival data (9). The term “proportional hazards” means that the hazard for one patient is proportional to the hazard for any other patient and that this ratio does not change over time (10). In this model, the regression coefficients are estimated by maximizing the partial likelihood (PL) function, and the baseline hazard is an unspecified positive function of time that determines the shape of the survival function. The baseline hazard is often not estimated, and the positivity constraint on it is not taken into account, which leads to a loss of efficiency.

Due to its importance, researchers are still looking for an optimal estimate of the baseline hazard function. In recent years, a new method has been proposed by Ma et al. (11) to simultaneously estimate the baseline hazard function and regression coefficients, which is called the maximum penalized likelihood (MPL) method. Although MPL methods already existed (12-14), they have deficiencies such as not ensuring the positivity constraint on the baseline hazard. In the method of Ma et al. (11), a new iterative optimization algorithm combines Newton’s method with a multiplicative iterative algorithm to estimate the baseline hazard function, thus respecting the positivity constraint on it. Moreover, this algorithm provides accurate variance approximations for both regression coefficients and baseline hazard (11, 15, 16).

2. Objectives

Because of the unprecedented growth of BC, especially in developing countries, it is necessary to use new efficient methods for analyzing predictive factors of this disease. Therefore, this study was conducted to determine the survival rate and the risk of death in BC patients and to estimate the survival predictive factors using the new MPL method. Moreover, a comparison is made between this new method and the usual PL method.

3. Methods

3.1. Data Source

This prospective cohort study used the data of 2300 patients with BC registered at the Cancer Research Center of Shahid Beheshti University of Medical Sciences in Tehran, Iran. Established in 2006, this governmental, referral, and comprehensive cancer control center has countrywide admission. It has a linked clinical section, Comprehensive Cancer Control Center, which focuses on early detection, diagnosis, and treatment of different cancers.

The survival outcome of 234 patients was not available in the dataset; thus, they were excluded from the study. Of the 2066 remaining patients, 1710 were excluded because of incomplete medical records, male gender, or death due to a cause other than breast cancer. Therefore, the final analysis was done on 356 patients diagnosed with BC between 1999 and 2015. This sample size was sufficient based on Kleinbaum and Klein (10), considering the recurrence variable and using Equation 1:

Equation 1.

where α = 0.05 and β = 0.20 are the probabilities of type I and type II errors, Δ = 1.6 is the effect size, and PEV1 = 0.75 and PEV0 = 0.10 are the probabilities of death in the recurrence and non-recurrence groups, respectively. Thus, the power was 0.80, and the required sample size was 347. Two observers examined the medical records of patients, with a kappa coefficient of 0.87. Figure 1 represents the flow diagram of data extraction.

Figure 1. The flow diagram of data extraction
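The reported sample size can be sanity-checked with a short calculation. The sketch below assumes an events-based two-group formula with equal group sizes, in which the required number of deaths is ((z_{1-α/2} + z_{1-β})(Δ + 1)/(Δ - 1))² and the total sample size is that number divided by the average probability of death across the two groups. This form and the function name `required_sample_size` are assumptions made for illustration and may differ from the exact expression used by the authors.

```python
import math
from statistics import NormalDist


def required_sample_size(alpha, beta, delta, p_ev1, p_ev0):
    """Total sample size for two equally sized groups (assumed formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    # Required number of events (deaths) to detect effect size delta.
    n_events = ((z_alpha + z_beta) * (delta + 1) / (delta - 1)) ** 2
    # Average probability of observing an event across the two groups.
    p_event = (p_ev1 + p_ev0) / 2
    return math.ceil(n_events / p_event)


n = required_sample_size(alpha=0.05, beta=0.20, delta=1.6, p_ev1=0.75, p_ev0=0.10)
print(n)
```

Under these assumptions the calculation returns 347, which agrees with the sample size stated in the text.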

3.2. Variables and Statistical Methods

In this study, the survival time of patients (months) and predictive variables were taken into account (Table 1). First, Kaplan-Meier curves were drawn for all predictive variables to examine the survival of patients at different levels of each variable. The log-rank test was used for univariate analysis, comparing survival across the levels of each predictive variable to find factors with a significant effect on patients’ survival time. The factors that were significant in univariate analysis were then entered into a multiple analysis using the Cox proportional hazards model. The Cox model is written as Equation 2

Equation 2.

$$h(t \mid X_i) = h_0(t)\,\exp\left(X_i^{\top}\beta\right)$$

where Xi is the vector of predictive variables for the ith patient, β is the vector of coefficients, and h0(t) is the baseline hazard that is an unknown and non-negative function of survival time (t) with no assumptions about its shape (17).
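The proportional hazards property of this model can be illustrated numerically. The following sketch evaluates the Cox hazard for two hypothetical patients and shows that their hazard ratio is the same at every time point, whatever the baseline hazard. The baseline function, the coefficients, and the covariate vectors are invented for illustration and are not estimates from this study.

```python
import math


def hazard(t, x, beta, baseline_hazard):
    """Cox hazard h(t | x) = h0(t) * exp(x . beta)."""
    linear_predictor = sum(xj * bj for xj, bj in zip(x, beta))
    return baseline_hazard(t) * math.exp(linear_predictor)


def baseline(t):
    """Hypothetical baseline hazard; any non-negative function of time works."""
    return 0.01 + 0.002 * t


beta = [0.9, 0.5]     # illustrative coefficients only
patient_a = [1, 0]    # e.g. first risk factor present
patient_b = [0, 0]    # reference patient

# The ratio of the two hazards is evaluated at several follow-up times (months).
ratios = [
    hazard(t, patient_a, beta, baseline) / hazard(t, patient_b, beta, baseline)
    for t in (12, 24, 60, 120)
]
print(ratios)  # each entry equals exp(0.9) up to floating-point error
```

Because the baseline hazard cancels in the ratio, the hazard ratio depends only on the coefficients and the covariate difference.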

Table 1. Clinical, Pathological, and Biological Characteristics of Patients and Univariate Analysis
Variable | Frequency (%) | Mean Survival, mo (Death Percentage) | P Value
Marital status | | | 0.169
  Single | 23 (6.5) | 31.08 (4.3) |
  Married | 333 (93.5) | 36.52 (19.5) |
[unlabeled variable] | | |
  Yes | 121 (34) | 36.78 (19) |
  No | 235 (66) | 35.85 (18.3) |
Tumor size, cm | | | < 0.001^a
  < 2 | 101 (28.4) | 41.61 (6.9) |
  2 - 5 | 192 (53.9) | 32.81 (17.7) |
  > 5 | 63 (17.7) | 37.68 (39.7) |
Lymphovascular invasion | | | < 0.001^a
  Positive | 201 (56.5) | 42.49 (27.4) |
  Negative | 155 (43.5) | 31.29 (7.1) |
Cancer stage | | | < 0.001^a
  1 | 69 (19.4) | 43.41 (4.3) |
  2 | 173 (48.6) | 35.21 (11) |
  3 | 100 (28.1) | 34.23 (35) |
  4 | 14 (3.9) | 26.19 (64.3) |
Tumor grade | | | 0.002^a
  1 | 50 (14) | 40.05 (8) |
  2 | 179 (50.3) | 37.81 (16.8) |
  3 | 127 (35.7) | 32.33 (25.2) |
ER | | | < 0.001^a
  Positive | 263 (73.9) | 35.86 (13.7) |
  Negative | 93 (26.1) | 37.03 (32.3) |
PR | | |
  Positive | 238 (66.9) | 37.40 (14.7) |
  Negative | 118 (33.1) | 33.69 (26.3) |
HER2 | | |
  Positive | 194 (54.5) | 31.58 (18.6) |
  Negative | 162 (45.5) | 41.66 (18.5) |
Age at diagnosis, y | | | 0.371
  < 41 | 95 (26.7) | 37.48 (24.2) |
  41 - 49 | 88 (24.7) | 32.91 (13.6) |
  49 - 56 | 95 (26.7) | 41.13 (16.8) |
  > 56 | 78 (21.9) | 32.21 (19.2) |
Family history | | | 0.612
  Yes | 107 (30.1) | 35.02 (15.9) |
  No | 249 (69.9) | 36.66 (19.7) |
Recurrence | | | < 0.001^a
  Yes | 85 (23.9) | 46.77 (74.1) |
  No | 271 (76.1) | 32.84 (1.1) |

Abbreviations: ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2.

^a Significant at the 5% level.

Regression coefficients of the proportional hazard model are usually estimated by maximizing the Cox partial likelihood (PL) function (18) where the baseline hazard is not required when estimating the regression coefficients. MPL is another new method that considers the simultaneous estimation of baseline hazard and coefficients in the Cox model. In this method, optimization is achieved using a new iterative algorithm, which combines Newton’s method with the multiplicative iterative algorithm (19) and satisfies the non-negativity requirement on the baseline hazard estimate (11).
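To make concrete why the baseline hazard is not needed in the PL approach, the sketch below computes the Cox partial log-likelihood for a small, invented right-censored dataset. It assumes there are no tied event times; the function name and the toy data are hypothetical and are not part of the survivalMPL package or of this study's data.

```python
import math


def cox_partial_log_likelihood(beta, times, events, covariates):
    """Cox partial log-likelihood, assuming no tied event times."""
    scores = [sum(x * b for x, b in zip(row, beta)) for row in covariates]
    log_lik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients enter only through the risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(
            math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        log_lik += scores[i] - math.log(risk_sum)
    return log_lik


# Hypothetical toy data: follow-up in months, event flag (1 = death), one covariate.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
covariates = [[1], [0], [1], [0]]

ll_at_zero = cox_partial_log_likelihood([0.0], times, events, covariates)
ll_at_half = cox_partial_log_likelihood([0.5], times, events, covariates)
print(ll_at_zero, ll_at_half)
```

Note that `h0(t)` never appears in the function: each death contributes only the ratio of that patient's risk score to the sum over the risk set. The MPL method, by contrast, works with the full likelihood, in which the baseline hazard is estimated together with the coefficients.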

The Cox model relies on the proportional hazards assumption. If this assumption does not hold, the analysis can give misleading results without any warning, in the sense that the size of the effects or even their direction may be inaccurate. For this reason, the assumption was checked using the scaled Schoenfeld residuals (20).

The correlation between variables was also investigated in order to avoid multicollinearity. Multicollinearity is a condition in which some of the independent variables are highly correlated, which inflates the standard errors of the estimated coefficients. One way to reduce these standard errors is to drop one or several of the correlated independent variables from the model (21). The significance level was set at 0.05, and the data were analyzed using the survivalMPL package in R version 3.4.3.
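A pairwise correlation check of this kind can be sketched as follows. The example uses the Pearson coefficient on invented ordinal codes for two variables; the data, the variable names, and the 0.7 flagging threshold are assumptions chosen for illustration, since the text does not state which coefficient or cut-off was used.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ordinal codes for eight patients (not study data).
stage = [1, 2, 2, 3, 3, 4, 2, 1]
size_class = [1, 2, 1, 2, 3, 3, 2, 1]

r = pearson_correlation(stage, size_class)
highly_correlated = abs(r) > 0.7  # illustrative threshold, an assumption
print(round(r, 3), highly_correlated)
```

When a pair is flagged in this way, one of the two variables can be left out of the multiple model.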

4. Results

Of the 356 studied patients, 66 (18.5%) died by the end of the study and 290 (81.5%) were censored. The age of patients ranged from 24 to 84 years, with a mean ± standard deviation (SD) of 48.42 ± 11.27 years and a median of 48 years.

Figure 2 shows the Kaplan-Meier survival curve for the variable “stage of cancer” (due to the lack of space, the Kaplan-Meier curve is presented only for this variable). Based on the curve, patients with stage I disease had higher survival than patients in other stages. Table 1 shows the clinical, pathological, and biological characteristics of the patients. The last column of Table 1 presents the results of the log-rank test. According to the P values obtained from this test, the Kaplan-Meier survival curves differed significantly across the levels of the following variables: stage of cancer, tumor size, tumor grade, lymphovascular invasion, cancer recurrence, estrogen receptor, and progesterone receptor. Therefore, only these variables were used in the multiple analysis.

Figure 2. The Kaplan-Meier survival curve for cancer stage
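For readers less familiar with the Kaplan-Meier estimator behind such curves, the following minimal sketch computes the product-limit survival estimate for a small, invented set of right-censored follow-up times. The data are hypothetical and the function is an illustration of the standard estimator, not the code used in this study.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability)] for right-censored data."""
    survival = 1.0
    curve = []
    # Only times at which at least one death occurred change the estimate.
    for t in sorted({u for u, d in zip(times, events) if d}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times in months; event flag 1 = death, 0 = censored.
times = [6, 10, 10, 15, 24, 30]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Drawing one such curve per level of a variable (for example, per cancer stage) gives the kind of comparison that the log-rank test then formalizes.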

Multicollinearity between the variables and the proportional hazards assumption were checked first. The highest correlation (0.78) was found between the stage of cancer and tumor size; thus, the stage of cancer was not included in the multiple analysis. In assessing the proportional hazards assumption, the P values obtained from the correlation check between the Schoenfeld residuals and the ranked failure times were not significant for any of the variables, indicating that there was not enough evidence to reject the assumption. Table 2 shows the results of the Cox model fit using the PL and MPL methods.

Table 2. Multivariate Cox Regression Analysis Using PL and MPL Methods
Variable | Hazard Ratio (PL) | Hazard Ratio (MPL) | Std. Error (PL) | Std. Error (MPL) | Adjusted P Value (PL) | Adjusted P Value (MPL)
Tumor size, cm
  < 2 (reference)
  2 - 5 | 1.30 | 1.38 | 0.436 | 0.454 | 0.54 | 0.47
  > 5 | 1.45 | 1.45 | 0.463 | 0.464 | 0.42 | 0.41
Lymphovascular invasion
  Negative (reference)
  Positive | 2.37 | 2.46 | 0.351 | 0.336 | 0.01^a | < 0.007^a
Tumor grade
  1 (reference)
ER
  Positive (reference)
PR
  Positive (reference)
Recurrence
  No (reference)
  Yes | 47.87 | 33.21 | 0.727 | 0.621 | < 0.001^a | < 0.001^a

Abbreviations: ER, estrogen receptor; PR, progesterone receptor; MPL, maximum penalized likelihood; PL, partial likelihood.

^a Significant at the 5% level.

Based on the results of Table 2, for most variables, the standard errors were lower in the new MPL method than in the PL method, and the estimation of the parameters and the hazard ratio were different from those of the PL method. For example, the standard error and hazard ratio were 0.727 and 47.87, respectively, for the recurrence variable in the PL method while these values were 0.621 and 33.21 in the new MPL method.

In the multiple analysis using the PL method, statistically significant relationships were found between survival and lymphovascular invasion (P = 0.01) and recurrence (P < 0.001). Based on the new MPL method, in addition to lymphovascular invasion (P = 0.007) and recurrence (P < 0.001), estrogen receptor (P = 0.045) was also a significant variable. Based on these results, the risk of death was 1.72 times higher in estrogen receptor-negative patients than in estrogen receptor-positive patients (HR = 1.72).
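The link between a hazard ratio, its standard error, and the P value can be sketched with a Wald test. The example below assumes that the reported standard errors are on the log hazard ratio scale and that the P values are two-sided Wald tests; both are assumptions, and the function name `wald_summary` is hypothetical. The inputs are the rounded lymphovascular invasion values from Table 2.

```python
import math
from statistics import NormalDist


def wald_summary(hazard_ratio, std_error, level=0.95):
    """Wald z statistic, two-sided P value, and CI for a hazard ratio."""
    beta = math.log(hazard_ratio)           # coefficient on the log scale
    z = beta / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (
        math.exp(beta - z_crit * std_error),
        math.exp(beta + z_crit * std_error),
    )
    return z, p_value, ci


# Lymphovascular invasion (positive vs. negative), values taken from Table 2.
z_pl, p_pl, ci_pl = wald_summary(2.37, 0.351)     # PL method
z_mpl, p_mpl, ci_mpl = wald_summary(2.46, 0.336)  # MPL method
print(round(p_pl, 3), round(p_mpl, 3))
print(ci_pl, ci_mpl)
```

With these rounded inputs the computed P values are close to the reported 0.01 (PL) and 0.007 (MPL), which illustrates how a smaller standard error translates into a smaller P value for a similar hazard ratio.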

Figure 3 shows the plot of the overall survival for patients with BC with a 95% confidence interval based on the new MPL method and the estimated baseline hazard.

Figure 3. The plot of overall survival for patients with breast cancer

Table 3 shows the patients’ survival rates at 12, 24, 60, 84, and 120 months using the nonparametric Kaplan-Meier method and the MPL method, which estimates the baseline hazard function. According to Table 3, the new MPL method estimated the survival probability after 24, 60, 84, and 120 months with smaller standard errors than the nonparametric Kaplan-Meier method.

Table 3. Survival Estimate with MPL and Kaplan-Meier Methods
Survival Time in Months | Survival Estimate (MPL) | Std. Error (MPL) | Survival Estimate (Kaplan-Meier) | Std. Error (Kaplan-Meier)

5. Discussion

Cancer is the second leading cause of death globally and was responsible for almost 9.6 million deaths in 2018 (1). BC is a malignant tumor mostly observed in women. BC affects Iranian women at least one decade earlier than women in developed countries (22). Reports show that the average age at BC occurrence is 61 years among US white females (23), while our study, like other studies from Iran (24, 25), showed that the mean age at BC diagnosis is about 48 years. To estimate survival in patients with breast cancer, researchers mainly employ the Cox proportional hazards model. This model contains a non-negative component, the baseline hazard, which is either not estimated or poorly treated. In the present study, we aimed to determine survival prognostic factors in women with BC using the new MPL method in the Cox model and compared it with the usual PL method to identify the better method for finding factors that truly affect the survival of BC patients. MPL is a new method in which the regression coefficients and baseline hazard are estimated simultaneously using the new Newton-MI algorithm. This algorithm ensures that the estimated hazard function is non-negative. Based on our results, the standard errors of most variables given by the new MPL method were smaller than those obtained by the usual PL method. These results are in line with the results of Ma et al. (11), Xu et al. (16), and Arena et al. (26). The penalty term in the MPL method helps estimate the baseline hazard more accurately and correct the bias of estimates (11); therefore, the final results and the hazard ratios will be more accurate.

In our study, in addition to lymphovascular invasion and recurrence variables, the estrogen receptor variable also played a significant role. The estrogen receptor is a predictor of survival in women with BC (27, 28), as confirmed by the new MPL method in our study. By contrast, this variable was not significant in the analysis using the usual PL method. Arena et al. used the MPL method to obtain more reliable variance parameter estimates (26).

Based on our findings, the new MPL method showed an association between lymphovascular invasion and survival: the risk of death was 2.46 times higher in patients with positive lymphovascular invasion than in patients with negative lymphovascular invasion (HR = 2.46). This outcome was in line with the report of Akbari et al. (27). One of the indices used to evaluate the quality of care is the five-year survival rate. The five-year survival rate of BC patients in developed countries, regardless of the stage of cancer, is 73% (29). In this study, the MPL method estimated an overall five-year survival rate of 70% with a lower standard error than the Kaplan-Meier method; this result is consistent with the results of Movahedi et al. (30), who reported a 71% survival rate.

In this study, the new MPL method calculated slightly smaller standard errors for most variables than the usual PL method and the estrogen receptor variable was a significant predictor of survival, as well. Although using the new MPL method does not guarantee lower standard errors for all estimates, since it estimates both baseline hazard function and coefficients simultaneously, the standard errors of most variables decrease. A simulation study by Xu et al. demonstrated that the MPL method works well and usually offers smaller standard errors and bias (15). Therefore, it can be applied to identify predictive factors more accurately than previous methods do.




References

1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394-424. doi: 10.3322/caac.21492. [PubMed: 30207593].
2. Ferlay J, Ervik M, Lam F, Colombet M, Mery L, Piñeros M, Znaor A, et al. Global cancer observatory: Cancer today. Lyon, France: International Agency for Research on Cancer; 2018.
3. Hajian S, Vakilian K, Najabadi KM, Hosseini J, Mirzaei HR. Effects of education based on the health belief model on screening behavior in high risk women for breast cancer, Tehran, Iran. Asian Pac J Cancer Prev. 2011;12(1):49-54. [PubMed: 21517230].
4. Radmard AR. Five common cancers in Iran. Arch Iran Med. 2010;13(2):143-6.
5. Heidari Z, Mahmoudzadeh-Sagheb HR, Sakhavar N. Breast cancer screening knowledge and practice among women in southeast of Iran. Acta Med Iran. 2008;46(4):321-8.
6. Taleghani F, Yekta ZP, Nasrabadi AN. Coping with breast cancer in newly diagnosed Iranian women. J Adv Nurs. 2006;54(3):265-72. discussion 272-3. doi: 10.1111/j.1365-2648.2006.03808_1.x. [PubMed: 16629910].
7. Alizadeh Otaghvar H, Hosseini M, Tizmaghz A, Shabestanipour G, Noori H. A review on metastatic breast cancer in Iran. Asian Pac J Trop Biomed. 2015;5(6):429-33. doi: 10.1016/j.apjtb.2015.02.001.
8. Moran JL, Bersten AD, Solomon PJ, Edibam C, Hunt T; Australian, et al. Modelling survival in acute severe illness: Cox versus accelerated failure time models. J Eval Clin Pract. 2008;14(1):83-93. doi: 10.1111/j.1365-2753.2007.00806.x. [PubMed: 18211649].
9. Cox DR. Regression models and life-tables. J R Stat Soc Series B Stat Methodol. 1972;34(2):187-202. doi: 10.1111/j.2517-6161.1972.tb00899.x.
10. Kleinbaum DG, Klein M. Survival analysis: A self-learning text. Springer Science & Business Media; 2006.
11. Ma J, Heritier S, Lô SN. On the maximum penalized likelihood approach for proportional hazard models with right censored survival data. Comput Stat Data Anal. 2014;74:142-56. doi: 10.1016/j.csda.2014.01.005.
12. Gray RJ. Spline-based tests in survival analysis. Biometrics. 1994;50(3):640-52. [PubMed: 7981391].
13. Joly P, Commenges D, Letenneur L. A penalized likelihood approach for arbitrarily censored and truncated data: Application to age-specific incidence of dementia. Biometrics. 1998;54(1):185-94. doi: 10.2307/2534006. [PubMed: 9574965].
14. Cai T, Betensky RA. Hazard regression for interval-censored data with penalized spline. Biometrics. 2003;59(3):570-9. [PubMed: 14601758].
15. Xu J, Ma J, Prvan T. Non parametric hazard estimation with dependent censoring using penalized likelihood and an assumed copula. Commun Stat Theory Methods. 2016;46(22):11383-403. doi: 10.1080/03610926.2016.1267757.
16. Xu J, Ma J, Connors MH, Brodaty H. Proportional hazard model estimation under dependent censoring using copulas and penalized likelihood. Stat Med. 2018;37(14):2238-51. doi: 10.1002/sim.7651. [PubMed: 29579781].
17. Kleinbaum DG, Klein M. Survival analysis: A self-learning text. Third ed. New York: Springer; 2011.
18. Cox DR. Partial likelihood. Biometrika. 1975;62(2):269-76. doi: 10.1093/biomet/62.2.269.
19. Ma J. Positively constrained multiplicative iterative algorithm for maximum penalized likelihood tomographic reconstruction. IEEE Trans Nucl Sci. 2010;57(1):181-92. doi: 10.1109/tns.2009.2034462.
20. Schoenfeld D. Chi-squared goodness-of-fit tests for the proportional hazards regression model. Biometrika. 1980;67(1):145-53. doi: 10.1093/biomet/67.1.145.
21. Neter JW. Applied linear statistical models: Regression, analysis of variance, and experimental designs. Richard D. Irwin; 1974.
22. Khadivi R, Harrirchi I, Akbari ME. Ten year breast cancer screening and follow up in 52200 women in Shahre-Kord, Iran (1997-2006). Iran J Cancer Prev. 2012;1(2):73-7.
23. Anderson WF, Pfeiffer RM, Dores GM, Sherman ME. Comparison of age distribution patterns for different histopathologic types of breast carcinoma. Cancer Epidemiol Biomarkers Prev. 2006;15(10):1899-905. doi: 10.1158/1055-9965.EPI-06-0191. [PubMed: 17035397].
24. Baghestani AR, Shahmirzalou P, Zayeri F, Akbari ME, Hadizadeh M. Prognostic factors for survival in patients with breast cancer referred to Cancer Research Center in Iran. Asian Pac J Cancer Prev. 2015;16(12):5081-4. [PubMed: 26163645].
25. Baghestani AR, Zayeri F, Akbari ME, Shojaee L, Khadembashi N, Shahmirzalou P. Fitting cure rate model to breast cancer data of cancer research center. Asian Pac J Cancer Prev. 2015;16(17):7923-7. [PubMed: 26625822].
26. Arena JE, Weigand SD, Whitwell JL, Hassan A, Eggers SD, Hoglinger GU, et al. Progressive supranuclear palsy: Progression and survival. J Neurol. 2016;263(2):380-9. doi: 10.1007/s00415-015-7990-2. [PubMed: 26705121].
27. Akbari ME, Mirzaei HR, Soori H. [5 year survival of breast cancer in Shohada-e-Tajrish and Jorjani hospitals]. Hakim Res J. 2006;9(2):39-44. Persian.
28. Khodabakhshi R, Reza Gohari M, Moghadamifard Z, Foadzi H, Vahabi N. [Disease-free survival of breast cancer patients and identification of related factors]. Razi J Med Sci. 2011;18(89). Persian.
29. Gencturk N. The status of knowledge and practice of early diagnosis methods for breast cancer by women healthcare professionals. J Breast Health. 2013;9(1):5-9.
30. Movahedi M, Haghighat S, Khayamzadeh M, Moradi A, Ghanbari-Motlagh A, Mirzaei H, et al. Survival rate of breast cancer based on geographical variation in Iran, a national study. Iran Red Crescent Med J. 2012;14(12):798-804. doi: 10.5812/ircmj.3631. [PubMed: 23483369]. [PubMed Central: PMC3587870].