Document Type : Research articles


1 Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran

2 Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran


Background: Tumor stage is one of the most reliable prognostic factors in the clinical characterization of colorectal cancer. Identifying genes associated with tumor staging may facilitate personalized molecular diagnosis and treatment, along with better risk stratification, in colorectal cancer.
Objectives: The study aimed to identify genetic signatures associated with tumor staging and patient survival in colorectal cancer and to classify patients into risk categories for clinical outcomes based on transcriptomic data.
Methods: In this retrospective cohort study, two available transcriptomic datasets (accession numbers GSE17537 and GSE17536), including 232 patients with colorectal cancer, were used as discovery and validation sets, respectively. A Bayesian sparse group selection method was applied to the discovery set to identify genes associated with tumor staging. Further screening was then performed using survival analysis, and the significant genes were used to develop a gene signature model. Finally, the robustness of the signature model's performance was assessed in the validation set.
Results: A total of 56 genes were significantly associated with tumor staging in colorectal cancer. Survival analysis resulted in a shortlist of 19 genes, including ADH1B (P = 0.012), AHI (P = 0.006), AKAP12 (P = 0.018), BNIP3 (P = 0.015), CLDN11 (P = 0.015), CST9L (P = 0.028), DPP10 (P = 0.029), FBXO33 (P = 0.036), HEBP (P = 0.025), INTS4 (P = 0.003), LIPJ (P = 0.001), MMP21 (P = 0.006), NGRN (P = 0.014), PAFAH1B2 (P = 0.035), PCOLCE2 (P = 0.009), PIM1 (P = 0.007), TBKBP1 (P = 0.003), TCEB3B (P = 0.001), and TIPARP (P = 0.018), which were used to develop and validate the signature model. In both the discovery and validation sets, the signature model showed good discrimination in categorizing patients with colorectal cancer into low- and high-risk subgroups for mortality and recurrence at 3 and 5 years, with the area under the receiver operating characteristic (ROC) curve ranging from 0.64 to 0.88. It also had good sensitivity (discovery set 63.1%, validation set 61.7%) and specificity (discovery set 75.0%, validation set 59.3%) for discriminating between early- and late-stage groups.
Conclusions: We identified a 19-gene signature associated with tumor staging and survival in colorectal cancer, which may represent potential diagnostic and prognostic markers and may help to classify patients with colorectal cancer into low- or high-risk subgroups.
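
The performance measures reported in the Results (sensitivity, specificity, and area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the toy labels and risk scores, the median cutoff, and the function names `sensitivity_specificity` and `auc` are all assumptions made for illustration, and the abstract does not state how the signature score was dichotomized.

```python
from statistics import median
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = late stage, 0 = early stage.
stage_late = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical signature risk scores for the same eight patients.
risk_score = [0.1, 0.4, 0.35, 0.8, 0.3, 0.7, 0.9, 0.6]

# Assumed rule: patients scoring above the median are called high risk.
cutoff = median(risk_score)
high_risk = [1 if s > cutoff else 0 for s in risk_score]

sens, spec = sensitivity_specificity(stage_late, high_risk)
area = auc(stage_late, risk_score)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.4f}")
# prints: sensitivity=0.75 specificity=0.75 AUC=0.6875
```

The AUC is independent of the chosen cutoff, whereas sensitivity and specificity change with it, which is why the two kinds of measure are reported separately.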