Machine Learning Model-based Detection of Potential Genetic Markers Associated with the Diagnosis of Small-cell Lung Cancer


Keywords: Classification, Machine learning, Potential biomarkers, Small-cell lung cancer


How to Cite

Sarihan, M. E., Kucukakcali, Z., & Tekedereli, I. (2023). Machine Learning Model-based Detection of Potential Genetic Markers Associated with the Diagnosis of Small-cell Lung Cancer. Iranian Red Crescent Medical Journal, 25(8).


Background: Small-cell lung cancer (SCLC) is an intractable cancer with a low survival rate. Understanding the pathophysiological pathways underlying its development is essential for developing effective treatment alternatives.

Objectives: This study aimed to classify gene expression data from SCLC and normal lung tissue and identify the key genes responsible for SCLC.

Methods: This study used microarray expression data obtained from SCLC tissue and adjacent normal lung tissue from 18 patients. An Extreme Gradient Boosting (XGBoost) classification model was built and evaluated with five-fold cross-validation. Accuracy (AC), balanced accuracy (BAC), sensitivity (Sens), specificity (Spec), positive predictive value (PPV), negative predictive value (NPV), and F1 score were used for performance assessment.
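
As a rough illustration of the five-fold cross-validation step, the sketch below splits sample indices into five folds while keeping the two classes spread across folds. It is not the authors' code: the abstract does not say whether the folds were stratified, and the class sizes used here (18 tumour, 18 normal) are an assumption derived only from the stated patient count. The helper name `stratified_k_fold` is hypothetical; in practice a classifier such as XGBoost would be fitted on each training split and scored on the matching test split.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread over k folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a similar share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Assumed layout (not stated in the source): 18 tumour (1) and 18 normal (0) samples.
labels = [1] * 18 + [0] * 18
splits = list(stratified_k_fold(labels, k=5))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train_idx)} train, {len(test_idx)} test")
```

Each sample appears in exactly one test fold, so every sample is scored once by a model that did not see it during training.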

Results: The AC, BAC, Sens, Spec, PPV, NPV, and F1 score of the XGBoost model were 90%, 90%, 80%, 100%, 100%, 83.3%, and 88.9%, respectively. Based on the variable importance values from the XGBoost model, the HIST1H1E, C12orf56, DSTNP2, ADAMDEC1, and HMGB2 genes can be considered potential biomarkers for SCLC.
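
To show how the seven reported metrics relate to one another, the sketch below computes them from the four cells of a binary confusion matrix using their standard definitions. The abstract does not report the underlying counts; the values used here (4 true positives, 0 false positives, 5 true negatives, 1 false negative) are a hypothetical example chosen only because they are consistent with the reported percentages. The function name `classification_metrics` is likewise illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the seven metrics named in the abstract as fractions in [0, 1]."""
    sens = tp / (tp + fn)  # sensitivity (recall): share of tumours detected
    spec = tn / (tn + fp)  # specificity: share of normals correctly rejected
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    return {
        "AC": (tp + tn) / (tp + fp + tn + fn),
        "BAC": (sens + spec) / 2,
        "Sens": sens,
        "Spec": spec,
        "PPV": ppv,
        "NPV": npv,
        "F1": 2 * ppv * sens / (ppv + sens),
    }


# Hypothetical counts, not taken from the source.
metrics = classification_metrics(tp=4, fp=0, tn=5, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With equal class sizes, accuracy and balanced accuracy coincide, which matches the identical 90% values reported for AC and BAC.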

Conclusion: A machine learning-based prediction method identified genes that may serve as biomarkers for SCLC. Once these genes have been clinically validated in follow-up medical studies, their therapeutic use can be established in clinical practice.

