Cross-Sectional Study of Gene Expression Analysis Identifies Critical Biological Pathways and Key Genes Implicated in Non-Small Cell Lung Cancer


Tonglian Wang 1 , Jing Hu 2 , Lutong Xu 1 , Hongbo Zhao 3 , Yuanyue Li 1 , Tao Shou 2 , Xueshan Xia 1 , Qiang Chen 1 , *

1 Research Center of Molecular Medicine of Yunnan Province, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, P.R. China

2 Medical Oncology, The First People’s Hospital of Yunnan Province, Kunming, P.R. China

3 Institute of Molecular and Clinical Medicine, Kunming Medical University, Kunming, P.R. China

How to Cite: Wang T, Hu J, Xu L, Zhao H, Li Y, et al. Cross-Sectional Study of Gene Expression Analysis Identifies Critical Biological Pathways and Key Genes Implicated in Non-Small Cell Lung Cancer, Iran Red Crescent Med J. 2018 ; 20(3):e65035. doi: 10.5812/ircmj.65035.


Iranian Red Crescent Medical Journal: 20 (3); e65035
Published Online: March 31, 2018
Article Type: Research Article
Received: December 18, 2017
Revised: February 4, 2018
Accepted: February 24, 2018




Background: Non-small cell lung cancer (NSCLC) is the most common type of lung neoplasm, accounting for about 85% of all lung cancers. However, the critical biological pathways and key genes implicated in NSCLC remain unclear.

Objectives: The present study aimed at identifying the critical biological pathways and key genes implicated in NSCLC, and providing insight into the molecular mechanism underlying NSCLC.

Methods: In this case-control bioinformatics study, the researchers used four NSCLC microarray datasets from the public Gene Expression Omnibus (GEO) database at the National Center for Biotechnology Information (NCBI) website. The microarray data came from studies of American, Spanish, and Taiwanese NSCLC patients and contained, in total, 190 NSCLC tissue samples and 180 normal lung tissue samples. Standardized microarray preprocessing and gene set enrichment analysis (GSEA) were applied to each dataset to obtain significantly regulated pathways. Venn analysis was used to identify the common significantly regulated biological pathways. Protein-protein interaction (PPI) network analysis was used to identify the key genes within the common significantly regulated pathways. The PPI information was retrieved from the STRING database, and Cytoscape software was used to construct and visualize the PPI network.

Results: By integrating the GSEA results of the four microarray datasets, the researchers identified 22 common up-regulated and 85 common down-regulated pathways. Many genes within the 107 common significantly regulated pathways were significantly enriched in the cell cycle pathway (P value = 2.58e-79) and the focal adhesion pathway (P value = 2.44e-81). The PPI network showed that the up-regulated CDK1 gene (P value = 1.33e-18, logFC = 1.41) and the down-regulated PIK3R1 gene (P value = 5.09e-22, logFC = -1.13) had the most edges and were associated with NSCLC.

Conclusions: This cross-sectional study showed increased concordance between gene expression profiling data. These identified pathways and genes provide some insight into the molecular mechanisms of NSCLC, and the genes may serve as candidate diagnostic and therapeutic targets of NSCLC.


Keywords: Cancer; Carcinogenesis; Critical Pathways; Gene Expression Profiling; Lung Neoplasms; Cancer Profiling

Copyright © 2018, Iranian Red Crescent Medical Journal. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License, which permits copying and redistributing the material for noncommercial use only, provided the original work is properly cited.

1. Background

Lung cancer, one of the most common malignancies, is the leading cause of cancer-related deaths worldwide (1). In recent decades, the incidence and mortality rates of lung cancer have increased rapidly, especially in regions where tobacco consumption is more common (2). Although many studies have shown that tobacco smoking accelerates lung carcinogenesis, genetic factors still play a key role (3). Among all lung cancer types, non-small cell lung cancer (NSCLC) is the most common and accounts for 85% of cases. However, despite extensive research, the molecular mechanisms implicated in NSCLC are yet to be uncovered.

In recent decades, microarray-based gene expression analysis has been widely used to study NSCLC, and hundreds of differentially expressed genes (DEGs) have been found by differentially expressed gene analysis (DEGA) (4-6). Several key genes, including the well-known epidermal growth factor receptor (EGFR) and tumor protein p53 (TP53) genes, have been identified (5). However, the roles of most identified DEGs in NSCLC remain obscure, and it is difficult to interpret the role of individual genes (7). Gene set analysis of gene expression profiling data is a more powerful way to reveal the biological mechanisms implicated in NSCLC than conventional single-gene analysis, especially for identifying genes with subtle contributions (7, 8). Among the frequently used gene set analysis methods, gene set enrichment analysis (GSEA) is the best known and most widely used (7, 8); it identifies significant differences in the expression of a pre-defined gene set between two groups of data. The pre-defined gene set can be a set of genes in a gene ontology category or a biological pathway, or it can be user-defined (9). Recently, GSEA has been used to show that some biological pathways, such as the Ras signaling pathway and the Wnt signaling pathway, are significantly regulated in NSCLC (9, 10), which helped to explain the biological mechanisms of NSCLC. However, these studies focused mainly on lung squamous cell carcinoma (LUSC), one of the major subtypes of NSCLC, or on immune gene sets in females with NSCLC (9, 10). The pathways they identified represent only a fraction of those implicated in NSCLC, which still need to be identified systematically.

In this study, the researchers collected four gene expression profiling datasets from NSCLC studies of Taiwanese, American, and Spanish patients, and applied standardized microarray preprocessing and GSEA to each dataset to identify significantly regulated pathways. Furthermore, the researchers performed Venn analysis to identify the common significantly regulated pathways and constructed a PPI network of the genes within these pathways to identify key genes. This cross-study approach improved the concordance between gene expression profiling datasets and highlighted genes weakly connected with NSCLC, which provides some insight into the biological pathways implicated in NSCLC.

2. Methods

2.1. Microarray Data Collection

In this case-control bioinformatics study, the researchers reanalyzed four NSCLC microarray datasets. The datasets were searched for and downloaded from the public Gene Expression Omnibus (GEO) database at the National Center for Biotechnology Information (NCBI) website. Datasets that met the following criteria were used in this study: (1) the data were genome-wide, (2) the data included NSCLC and control samples, (3) the raw or normalized data were complete and available, and (4) the data were generated using the same chip platform.

Finally, the datasets with GEO accessions GSE7670, GSE10072, GSE18842, and GSE19804 were used in this study. All four were generated on Affymetrix microarray platforms. GSE7670 and GSE19804 came from Taiwanese NSCLC studies and contained 52 and 120 paired samples, respectively. GSE10072 and GSE18842 came from American and Spanish NSCLC studies and contained 107 and 91 case-control samples, respectively. Table 1 lists the characteristics of each dataset, such as author, sample source, GEO accession, chip platform, sample size, and sample type.

Table 1. Characteristics of Datasets Included in This Study

| GEO Accession | GSE10072 | GSE18842 | GSE19804 | GSE7670 |
|---|---|---|---|---|
| Sample Source | America | Spain | Taiwan | Taiwan |
| First Author | Landi M | Sanchez-Palencia A | Lu TP | Su LJ |
| Submitted Year | 2008 | 2009 | 2010 | 2007 |
| Chip Platform | U133A [GPL96] | U133 Plus 2.0 [GPL570] | U133 Plus 2.0 [GPL570] | U133A [GPL96] |
| Probe Number | 22k | 55k | 55k | 22k |
| Histology | AC | AC (14, 30%) and SCC (32, 70%) | AC (56, 93%) and SCC | AC |
| Stages | I - IV (I, 38%; II, 36%; III, 21%; IV, 5%) | I - IV (I, 83%; II, 9%; III, 7%; IV, 2%) | I - IV (I + II, 78%; III + IV, 22%) | Unknown (early and late) |
| Experimental Design | Case-control | Paired except one tumor | Paired | Paired |
| Sample Number: Cancer tissue | 58 | 46 | 60 | 26 |
| Sample Number: Normal tissue | 49 | 45 | 60 | 26 |

Abbreviations: AC, adenocarcinoma, one major subtype of non-small cell lung cancer; NSCLC, non-small cell lung cancer, one of the most common types of lung cancer; SCC, squamous cell carcinomas, one major subtype of non-small cell lung cancer.

2.2. Microarray Data Preprocessing

To improve the efficiency of the reanalysis, all microarray data were reprocessed. Reprocessing was performed with R (version 3.3.2) and packages from the Bioconductor project (version 3.4). All data were background-adjusted and normalized. The robust multichip averaging (RMA) algorithm was used to calculate the log2 probe-set intensities (11). Any gene that failed to map to a KEGG pathway was excluded from subsequent analysis. The interquartile range (IQR) was used to measure data variability. The cut-off was set according to the distribution of IQR values across all genes, and genes with IQR values under 0.5 were removed. If one gene was targeted by multiple probe sets, the probe set with the greatest variability was retained for subsequent analysis.
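The filtering steps described above can be illustrated with a minimal Python sketch. It is not the R/Bioconductor code used in the study; the function names, probe identifiers, and intensities are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not specify one.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of log2 intensities."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def select_probe_sets(expression, probe_to_gene, kegg_genes, iqr_cutoff=0.5):
    """Keep one probe set per KEGG-mapped gene: the one with the largest IQR.

    expression    : dict probe_id -> list of log2 intensities across samples
    probe_to_gene : dict probe_id -> gene symbol
    kegg_genes    : set of gene symbols that map to at least one KEGG pathway
    """
    best = {}  # gene -> (iqr, probe_id)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is None or gene not in kegg_genes:
            continue  # gene does not map to any KEGG pathway
        spread = iqr(values)
        if spread < iqr_cutoff:
            continue  # IQR under the cut-off: too little variability
        if gene not in best or spread > best[gene][0]:
            best[gene] = (spread, probe)
    return {gene: probe for gene, (_, probe) in best.items()}


# Toy input: probe IDs and intensities are made up for illustration.
expression = {
    "p1": [5.0, 5.1, 7.0, 7.2],  # CDK1, variable
    "p2": [6.0, 6.1, 6.2, 6.3],  # CDK1, nearly flat
    "p3": [8.0, 8.2, 6.9, 7.0],  # PIK3R1
    "p4": [3.0, 9.0, 3.0, 9.0],  # gene without a KEGG mapping
    "p5": [4.0, 4.0, 8.0, 8.0],  # CDK1, most variable
}
probe_to_gene = {"p1": "CDK1", "p2": "CDK1", "p3": "PIK3R1",
                 "p4": "UNMAPPED", "p5": "CDK1"}
kegg_genes = {"CDK1", "PIK3R1"}

selected = select_probe_sets(expression, probe_to_gene, kegg_genes)
print(selected)
```

In this toy input, the flat probe set and the unmapped gene are dropped, and each remaining gene is represented by its most variable probe set.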

2.3. Statistical Analysis of DEGs

Statistical analysis of DEGs was performed using the Limma package (version 3.32.7) from the Bioconductor project. Limma employs the voom method, linear modeling, and empirical Bayes moderation to assess DEGs, and yields robust results even with a small number of microarrays. Genes were considered DEGs if they met both of the following criteria: (1) a false discovery rate (FDR) of no more than 5%, and (2) a linear fold change (FC) of at least 2 or at most 0.5.
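The two cut-off criteria can be sketched as follows. This is an illustration in Python, not the Limma workflow itself: it assumes that the FDR is obtained with the Benjamini-Hochberg adjustment and that logFC values are on the log2 scale, and it takes P values as given rather than computing moderated statistics. The CDK1 and PIK3R1 values are those reported in this article for GSE10072; GENE_A and GENE_B are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fc, p_values, fdr_cutoff=0.05, fc_cutoff=2.0):
    """Genes with FDR <= cutoff and linear FC >= 2 or <= 0.5 (|log2 FC| >= 1)."""
    fdr = benjamini_hochberg(p_values)
    threshold = math.log2(fc_cutoff)
    return [g for g, lfc, q in zip(genes, log2_fc, fdr)
            if q <= fdr_cutoff and abs(lfc) >= threshold]


genes = ["CDK1", "PIK3R1", "GENE_A", "GENE_B"]  # last two are hypothetical
log2_fc = [1.41, -1.13, 0.30, 2.00]
p_values = [1.33e-18, 5.09e-22, 1e-6, 0.40]

degs = select_degs(genes, log2_fc, p_values)
print(degs)
```

GENE_A fails the fold-change criterion and GENE_B fails the FDR criterion, so only the two reported genes pass both filters.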

2.4. Statistical Analysis of Significant Pathways

Statistical analysis of significant pathways was performed using the GSEA method, as implemented in the Category package (version 2.40.0) from the Bioconductor project. The purpose of GSEA was to determine whether the members of a gene set S were randomly distributed throughout the entire reference gene list L or were found primarily at the top or bottom. The most significant advantage of the GSEA method is its relative robustness to noise and outliers in the data. Gene sets with fewer than 10 genes were discarded. For each pathway, the mean t-statistic of its genes was computed. A permutation test with 1000 permutations was performed, and pathways with a P value ≤ 0.05 were considered significantly changed (12).
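The mean-t permutation scheme described above can be sketched in Python as follows. This is not the Category package's implementation: the Welch t-statistic, the two-sided P value, and the add-one correction are assumptions made for illustration, and the expression values are synthetic. The sign of the observed score indicates the direction of regulation.

```python
import random
from statistics import mean, variance


def t_statistic(case, control):
    """Welch t-statistic for one gene (case vs. control)."""
    se = (variance(case) / len(case) + variance(control) / len(control)) ** 0.5
    return (mean(case) - mean(control)) / se


def gene_set_score(expr, labels, gene_set):
    """Mean t-statistic over the genes of one pathway."""
    scores = []
    for gene in gene_set:
        case = [v for v, lab in zip(expr[gene], labels) if lab == 1]
        control = [v for v, lab in zip(expr[gene], labels) if lab == 0]
        scores.append(t_statistic(case, control))
    return mean(scores)


def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Two-sided permutation P value obtained by shuffling sample labels."""
    rng = random.Random(seed)
    observed = gene_set_score(expr, labels, gene_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(gene_set_score(expr, shuffled, gene_set)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic data: 10 genes (the minimum set size), 6 tumor and 6 normal samples.
labels = [1] * 6 + [0] * 6
expr = {
    f"gene{g}": [7.0 + 0.13 * j + 0.01 * g for j in range(6)]
                + [5.0 + 0.10 * j + 0.01 * g for j in range(6)]
    for g in range(10)
}
pathway = list(expr)

score, p_value = permutation_p_value(expr, labels, pathway)
print(score > 0, p_value <= 0.05)
```

Shuffling sample labels, rather than gene labels, preserves the correlation between genes within a pathway, which is one reason label permutation is commonly preferred for this kind of test.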

2.5. Protein-Protein Interaction Network Construction

The interactions between genes within the common critical biological pathways were represented as a PPI network. The PPI information was obtained from the STRING database, and the minimum required interaction score between genes was set to 0.9. The PPI network was constructed and visualized using the open-source Cytoscape software (version 3.5.1).
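The way key genes emerge from such a network can be illustrated with a short Python sketch that keeps only interactions at or above the score threshold and ranks genes by their number of edges (degree). It stands in for the STRING and Cytoscape steps rather than reproducing them; the function name and the interaction scores below are hypothetical.

```python
from collections import Counter


def hub_genes(interactions, min_score=0.9, top_n=3):
    """Rank genes by degree after dropping low-confidence interactions.

    interactions : iterable of (gene_a, gene_b, combined_score in [0, 1])
    """
    degree = Counter()
    seen = set()
    for a, b, score in interactions:
        edge = frozenset((a, b))
        if score < min_score or a == b or edge in seen:
            continue  # skip weak edges, self-loops, and duplicate pairs
        seen.add(edge)
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top_n)


# Illustrative interactions; the scores are invented, not taken from STRING.
interactions = [
    ("CDK1", "CCNB1", 0.99),
    ("CDK1", "CCNA2", 0.98),
    ("CDK1", "TP53", 0.95),
    ("TP53", "MDM2", 0.99),
    ("CCNB1", "CCNA2", 0.70),  # below the 0.9 threshold
    ("CCNB1", "CDK1", 0.99),   # same pair as the first edge
]

top = hub_genes(interactions, top_n=1)
print(top)
```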

3. Results

3.1. Identification of Significant Pathways

The researchers used GSEA to reanalyze the four datasets and identify significantly regulated pathways implicated in NSCLC. Using a permutation P value cut-off of 0.05, the researchers found 28 (GSE7670), 48 (GSE10072), 63 (GSE18842), and 51 (GSE19804) up-regulated pathways, and 112 (GSE7670), 112 (GSE10072), 115 (GSE18842), and 118 (GSE19804) down-regulated pathways, respectively (Table 2). Overlap analysis identified 22 common up-regulated pathways and 85 common down-regulated pathways (Figure 1).

Table 2. Reanalysis Results of Significantly Regulated Pathway Number

| GEO Accession | No. of Genes After Preprocessing | No. of Pathways With ≥ 10 Genes | Up-Regulated Pathways | Down-Regulated Pathways |
|---|---|---|---|---|
| Common significantly regulated pathways | | | 22 | 85 |
Figure 1. Significant pathways identified and overlapped. A and B respectively represented up-regulated and down-regulated pathways. “GSEXXXX” was GEO accession of microarray dataset. For each dataset, the researchers performed GSEA to generate P value for each pathway and used a permutation test with 1000 times, and obtained significant pathways with P values cut-off of ≤ 0.05. A, GSEA detected 28, 48, 63 and 51 up-regulated pathways and 22 common pathways were found; B, GSEA detected 112, 112, 115 and 118 down-regulated pathways and 85 common pathways were found.
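The overlap shown in the Venn analysis corresponds to a set intersection across the four datasets, which can be sketched in Python as follows. The pathway entries are KEGG identifiers, but the per-dataset membership shown here is invented for illustration and does not reproduce the actual GSEA results.

```python
from functools import reduce


def common_pathways(per_dataset):
    """Pathways significant in every dataset (the center of the Venn diagram)."""
    return reduce(set.intersection, (set(p) for p in per_dataset.values()))


# Hypothetical up-regulated pathway lists, keyed by GEO accession.
up_regulated = {
    "GSE7670": {"04110", "04115", "03030"},
    "GSE10072": {"04110", "04115", "03030", "00010"},
    "GSE18842": {"04110", "04115", "03030", "03013"},
    "GSE19804": {"04110", "04115", "00010"},
}

common_up = common_pathways(up_regulated)
print(sorted(common_up))
```

Requiring significance in all four datasets is a conservative choice: a pathway missed in a single dataset is excluded even if it is strongly regulated in the other three.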

Among the common up-regulated pathways, many belonged to classes such as cell growth and death, carbohydrate metabolism, nucleotide metabolism, glycan biosynthesis and metabolism, replication and repair, and translation. Among the common down-regulated pathways, many belonged to classes such as the immune system, cellular community, signal transduction, endocrine system, immune diseases, and infectious diseases (Table 3).

Table 3. Common Significant Pathways Identified in the Four Datasets by Gene Set Enrichment Analysis (GSEA) a,b,c

| Entry | Pathway Name | Class | Number of Overlapping/Enriching Genes | Percentage of Common Genes, % | FDR |
|---|---|---|---|---|---|
| 04110 | Cell cycle | Cell growth and death | 66/65 | 56.40 | 2.58E-79 |
| 03013 | RNA transport | Translation | 54/53 | 45.80 | 4.22E-52 |
| 04115 | p53 signaling pathway | Cell growth and death | 38/37 | 59.40 | 1.08E-45 |
| 00230 | Purine metabolism | Nucleotide metabolism | 52/49 | 39.10 | 1.95E-44 |
| 00240 | Pyrimidine metabolism | Nucleotide metabolism | 34/32 | 42.00 | 1.01E-29 |
| 00480 | Glutathione metabolism | Metabolism of other amino acids | 25/24 | 58.10 | 1.22E-27 |
| 00520 | Amino sugar and nucleotide sugar metabolism | Carbohydrate metabolism | 21/20 | 46.70 | 1.10E-21 |
| 03050 | Proteasome | Folding, sorting and degradation | 21/19 | 53.80 | 8.41E-21 |
| 00051 | Fructose and mannose metabolism | Carbohydrate metabolism | 18/16 | 58.10 | 1.26E-18 |
| 00250 | Alanine, aspartate and glutamate metabolism | Amino acid metabolism | 17/16 | 65.40 | 6.75E-18 |
| 03030 | DNA replication | Replication and repair | 15/15 | 44.10 | 2.62E-16 |
| 00510 | N-Glycan biosynthesis | Glycan biosynthesis and metabolism | 16/15 | 38.10 | 6.24E-14 |
| 03008 | Ribosome biogenesis in eukaryotes | Translation | 15/15 | 24.60 | 4.64E-11 |
| 00512 | Mucin type O-Glycan biosynthesis | Glycan biosynthesis and metabolism | 11/11 | 47.80 | 5.14E-11 |
| 00030 | Pentose phosphate pathway | Carbohydrate metabolism | 11/10 | 50.00 | 1.78E-10 |
| 03060 | Protein export | Folding, sorting and degradation | 9/9 | 45.00 | 9.48E-10 |
| 03410 | Base excision repair | Replication and repair | 10/10 | 38.50 | 2.34E-09 |
| 00983 | Drug metabolism - other enzymes | Xenobiotics biodegradation and metabolism | 12/10 | 52.20 | 3.46E-08 |
| 03430 | Mismatch repair | Replication and repair | 8/8 | 40.00 | 3.68E-08 |
| 00601 | Glycosphingolipid biosynthesis - lacto and neolacto series | Glycan biosynthesis and metabolism | 9/8 | 40.90 | 1.06E-07 |
| 03020 | RNA polymerase | Transcription | 9/8 | 34.60 | 3.26E-07 |
| 00860 | Porphyrin and chlorophyll metabolism | Metabolism of cofactors and vitamins | 6/6 | 26.10 | 0.000446 |
| 04510 | Focal adhesion | Cellular community - eukaryotes | 103/102 | 57.50 | 2.44E-81 |
| 04010 | MAPK signaling pathway | Signal transduction | 100/95 | 46.70 | 2.36E-61 |
| 04062 | Chemokine signaling pathway | Immune system | 84/77 | 54.90 | 7.64E-55 |
| 04144 | Endocytosis | Transport and catabolism | 84/79 | 47.20 | 2.60E-54 |
| 04145 | Phagosome | Transport and catabolism | 76/69 | 61.80 | 1.77E-53 |
| 04810 | Regulation of actin cytoskeleton | Cell motility | 82/79 | 46.30 | 3.19E-51 |
| 04060 | Cytokine-cytokine receptor interaction | Signaling molecules and interaction | 90/87 | 49.50 | 3.19E-51 |
| 04380 | Osteoclast differentiation | Development | 64/62 | 56.10 | 2.23E-49 |
| 04514 | Cell adhesion molecules (CAMs) | Signaling molecules and interaction | 65/59 | 59.10 | 3.00E-42 |
| 05146 | Amoebiasis | Infectious diseases | 2/5 | 61.50 | 3.13E-41 |
| 05145 | Toxoplasmosis | Infectious diseases | 60/51 | 56.10 | 7.16E-38 |
| 04670 | Leukocyte transendothelial migration | Immune system | 55/50 | 56.70 | 2.19E-36 |
| 05142 | Chagas disease (American trypanosomiasis) | Infectious diseases | 46/44 | 35.70 | 6.66E-33 |
| 04610 | Complement and coagulation cascades | Immune system | 38/38 | 69.10 | 1.07E-32 |
| 05323 | Rheumatoid arthritis | Immune diseases | 43/40 | 56.60 | 1.52E-30 |
| 05150 | Staphylococcus aureus infection | Infectious diseases | 35/32 | 83.30 | 6.28E-30 |
| 04630 | Jak-STAT signaling pathway | Signal transduction | 55/51 | 52.40 | 6.28E-30 |
| 04540 | Gap junction | Cellular community - eukaryotes | 40/37 | 56.30 | 2.65E-27 |
| 04530 | Tight junction | Cellular community - eukaryotes | 49/44 | 47.10 | 7.24E-27 |
| 04640 | Hematopoietic cell lineage | Immune system | 40/37 | 58.00 | 7.24E-27 |
| 04270 | Vascular smooth muscle contraction | Circulatory system | 46/42 | 52.30 | 9.53E-27 |
| 04660 | T cell receptor signaling pathway | Immune system | 45/39 | 46.40 | 1.57E-26 |
| 05140 | Leishmaniasis | Infectious diseases | 36/33 | 62.10 | 1.04E-25 |
| 04666 | Fc gamma R-mediated phagocytosis | Immune system | 40/35 | 47.10 | 4.16E-24 |
| 04650 | Natural killer cell mediated cytotoxicity | Immune system | 46/40 | 47.90 | 2.28E-23 |
| 04722 | Neurotrophin signaling pathway | Nervous system | 44/38 | 38.60 | 1.55E-22 |
| 04350 | TGF-beta signaling pathway | Signal transduction | 35/31 | 47.30 | 1.67E-21 |
| 04910 | Insulin signaling pathway | Endocrine system | 0/39 | 0 | 3.63E-21 |
| 04662 | B cell receptor signaling pathway | Immune system | 34/29 | 47.90 | 8.35E-21 |
| 04916 | Melanogenesis | Endocrine system | 35/33 | 45.50 | 2.52E-20 |
| 04210 | Apoptosis | Cell growth and death | 34/31 | 43.60 | 3.32E-20 |
| 05100 | Bacterial invasion of epithelial cells | Infectious diseases | 31/29 | 48.40 | 5.25E-20 |
| 05120 | Epithelial cell signaling in Helicobacter pylori infection | Infectious diseases | 27/27 | 45.80 | 2.70E-19 |
| 04972 | Pancreatic secretion | Digestive system | 33/31 | 49.30 | 3.32E-19 |
| 04020 | Calcium signaling pathway | Signal transduction | 46/42 | 38.70 | 3.92E-19 |
| 05416 | Viral myocarditis | Cardiovascular diseases | 2/2 | 51.50 | 4.14E-19 |
| 03320 | PPAR signaling pathway | Endocrine system | 27/27 | 54.00 | 9.82E-19 |
| 04150 | mTOR signaling pathway | Signal transduction | 23/25 | 48.90 | 3.13E-18 |
| 05144 | Malaria | Infectious diseases | 29/22 | 69.00 | 1.48E-17 |
| 04970 | Salivary secretion | Digestive system | 29/28 | 48.30 | 2.15E-17 |
| 05221 | Acute myeloid leukemia | Cancers | 25/23 | 47.20 | 1.01E-16 |
| 05210 | Colorectal cancer | Cancers | 29/23 | 48.30 | 2.53E-16 |
| 04920 | Adipocytokine signaling pathway | Endocrine system | 24/24 | 47.10 | 1.32E-15 |
| 04940 | Type I diabetes mellitus | Endocrine and metabolic diseases | 22/19 | 68.80 | 2.92E-15 |
| 04960 | Aldosterone-regulated sodium reabsorption | Excretory system | 22/18 | 68.80 | 4.92E-15 |
| 05020 | Prion diseases | Neurodegenerative diseases | 18/18 | 66.70 | 4.92E-15 |
| 05213 | Endometrial cancer | Cancers | 23/20 | 46.90 | 2.83E-14 |
| 04971 | Gastric acid secretion | Digestive system | 24/23 | 53.30 | 4.19E-14 |
| 04730 | Long-term depression | Nervous system | 27/21 | 50.90 | 6.33E-14 |
| 04912 | GnRH signaling pathway | Endocrine system | 30/25 | 39.50 | 9.73E-14 |
| 05143 | African trypanosomiasis | Infectious diseases | 16/16 | 66.70 | 3.15E-13 |
| 04370 | VEGF signaling pathway | Signal transduction | 30/20 | 46.90 | 1.03E-12 |
| 04664 | Fc epsilon RI signaling pathway | Immune system | 28/21 | 41.80 | 1.08E-12 |
| 05332 | Graft-versus-host disease | Immune diseases | 19/16 | 63.30 | 1.64E-12 |
| 04976 | Bile secretion | Digestive system | 22/21 | 45.80 | 2.05E-12 |
| 05414 | Dilated cardiomyopathy | Cardiovascular diseases | 30/22 | 44.80 | 6.07E-11 |
| 04962 | Vasopressin-regulated water reabsorption | Excretory system | 19/16 | 52.80 | 6.23E-11 |
| 05330 | Allograft rejection | Immune diseases | 17/14 | 65.40 | 1.51E-10 |
| 04964 | Proximal tubule bicarbonate reclamation | Excretory system | 12/11 | 75.00 | 1.62E-10 |
| 04720 | Long-term potentiation | Nervous system | 21/18 | 38.90 | 3.39E-10 |
| 00982 | Drug metabolism - cytochrome P450 | Xenobiotics biodegradation and metabolism | 20/18 | 46.50 | 3.39E-10 |
| 00071 | Fatty acid degradation | Lipid metabolism | 16/15 | 47.10 | 4.92E-10 |
| 05412 | Arrhythmogenic right ventricular cardiomyopathy (ARVC) | Cardiovascular diseases | 27/19 | 45.80 | 6.03E-10 |
| 04930 | Type II diabetes mellitus | Endocrine and metabolic diseases | 18/15 | 50.00 | 9.89E-10 |
| 00380 | Tryptophan metabolism | Amino acid metabolism | 17/14 | 54.80 | 1.21E-09 |
| 04621 | NOD-like receptor signaling pathway | Immune system | 25/16 | 49.00 | 3.26E-09 |
| 00590 | Arachidonic acid metabolism | Lipid metabolism | 17/16 | 42.50 | 1.24E-08 |
| 05410 | Hypertrophic cardiomyopathy (HCM) | Cardiovascular diseases | 26/18 | 41.90 | 2.99E-08 |
| 05320 | Autoimmune thyroid disease | Immune diseases | 17/14 | 63.00 | 3.27E-08 |
| 00980 | Metabolism of xenobiotics by cytochrome P450 | Xenobiotics biodegradation and metabolism | 19/16 | 45.20 | 5.21E-08 |
| 04672 | Intestinal immune network for IgA production | Immune system | 15/13 | 46.90 | 6.73E-08 |
| 00564 | Glycerophospholipid metabolism | Lipid metabolism | 19/18 | 31.10 | 1.58E-07 |
| 05310 | Asthma | Immune diseases | 12/10 | 66.70 | 3.38E-07 |
| 04710 | Circadian rhythm | Environmental adaptation | 9/10 | 45.00 | 4.86E-07 |
| 04973 | Carbohydrate digestion and absorption | Digestive system | 13/11 | 48.10 | 9.06E-07 |
| 02010 | ABC transporters | Membrane transport | 14/11 | 45.20 | 1.58E-06 |
| 04070 | Phosphatidylinositol signaling system | Signal transduction | 24/15 | 37.50 | 2.55E-06 |
| 00340 | Histidine metabolism | Amino acid metabolism | 11/8 | 45.80 | 2.67E-05 |
| 04260 | Cardiac muscle contraction | Circulatory system | 14/13 | 29.80 | 4.04E-05 |
| 04080 | Neuroactive ligand-receptor interaction | Signaling molecules and interaction | 27/26 | 24.10 | 0.000284 |
| 00562 | Inositol phosphate metabolism | Carbohydrate metabolism | 20/9 | 43.50 | 0.00224 |
| 04623 | Cytosolic DNA-sensing pathway | Immune system | 17/8 | 50.00 | 0.00856 |
| 04320 | Dorso-ventral axis formation | Development | 10/4 | 50.00 | 0.0232 |
| 04130 | SNARE interactions in vesicular transport | Folding, sorting and degradation | 16 | 48.50 | Not enriched |
| 00830 | Retinol metabolism | Metabolism of cofactors and vitamins | 6 | 26.10 | Not enriched |

Abbreviation: FDR, False discovery rate. FDR was obtained according to the results computed by STRING platform.

a The number of overlapping genes is the number of genes shared by each common pathway across the four datasets.

b The number of enriching genes was obtained from the enrichment results of all genes within all common pathways.

c The two pathways with entries 04130 and 00830 were not enriched in the functional enrichment of the protein-protein interaction (PPI) network, but were significantly regulated according to GSEA.

3.2. Identification of Key Genes

Overall, 412 genes were found within the 22 common up-regulated pathways. With a minimum required interaction score of 0.9 for PPI information from the STRING database, 370 of the 412 genes were enriched in PPI networks (P value < 1.0e-16), and these genes were significantly enriched in the cell cycle pathway (P value = 2.58e-79) and the p53 signaling pathway (P value = 1.08e-45). In addition, some metabolism-related pathways were among the top enriched pathways, including metabolic pathways (P value = 3.03e-64), purine metabolism (P value = 1.95e-44), and pyrimidine metabolism (P value = 1.01e-29) (Table 3). The PPI network showed that the cyclin-dependent kinase 1 (CDK1) gene had the most edges (Figure 2A), and expression of the CDK1 gene was significantly up-regulated in NSCLC samples (P value = 1.33e-18, logFC = 1.41, from GSE10072 data). The TP53 gene was also observed to have a large number of edges.

Figure 2. Protein and protein interaction(PPI) network of genes within significantly regulated pathways. A and B represented PPI network of the genes within up-regulated pathways and down-regulated pathways, respectively. Each node represented one gene. The node with color showed the gene belonging to the pathway class with the same color. Node size represented degree size of the node. The label of the node represented gene name. PPI, protein and protein interaction; A, genes of PPI network were mainly enriched within some pathways belonging to cell cycle and death, metabolism and so on. CDK1 and TP53 genes shared more abundant edges; B, Genes of PPI network were mainly enriched within some pathways belonging to the immune system, signal transduction and so on. PIK3R1 and PIK3CA genes shared more abundant edges.

Similarly, 1,972 genes were found within the 85 common down-regulated pathways, and 905 genes were mainly enriched in the focal adhesion pathway (P value = 2.44e-81), the MAPK signaling pathway (P value = 2.36e-61), and the chemokine signaling pathway (P value = 7.64e-55) (Table 3). The PPI network showed that the phosphatidylinositol 3-kinase regulatory subunit alpha (PIK3R1) gene (P value = 5.09e-22, logFC = -1.13, from GSE10072 data) had the most edges and was significantly down-regulated in NSCLC (Figure 2B). In addition, the researchers found that the phosphatidylinositol 3-kinase catalytic subunit alpha (PIK3CA) and EGFR genes also had a large number of edges.
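As a quick arithmetic check, the reported logFC values can be converted to linear fold changes and compared with the DEG criteria in Section 2.3. The sketch below assumes that logFC is on the log2 scale, as is conventional for Limma output; the function name is hypothetical.

```python
def linear_fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fc


cdk1_fc = linear_fold_change(1.41)     # about 2.66, above the FC >= 2 cut-off
pik3r1_fc = linear_fold_change(-1.13)  # about 0.46, below the FC <= 0.5 cut-off

print(round(cdk1_fc, 2), round(pik3r1_fc, 2))
```

Under this assumption, both genes satisfy the fold-change criterion in GSE10072, in addition to their very small P values.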

4. Discussion

NSCLC, which mainly comprises adenocarcinoma and squamous cell carcinoma, is the most common type of lung cancer. However, early diagnosis and treatment of NSCLC remain difficult, mainly because the molecular mechanisms implicated in NSCLC are poorly understood. In this study, the researchers selected four NSCLC microarray datasets for GSEA and PPI network analysis. All datasets were generated on Affymetrix platforms in order to minimize differences between chip platforms. In addition, the data covered NSCLC patients from Asia, America, and Europe, the two major subtypes of NSCLC, and different smoking statuses, which helped to reveal molecular mechanisms common to NSCLC. GSEA and PPI network analysis revealed that 107 pathways (22 up-regulated and 85 down-regulated) were significantly dysregulated in NSCLC and that abnormal expression of the CDK1 and PIK3R1 genes was associated with NSCLC.

Uncontrolled proliferation is one of the most prominent features of tumor cells. In recent decades, many studies have focused on the role of pathways related to cell growth and death in tumor formation. A growing body of results shows that the cell cycle pathway and the p53 signaling pathway play a key role in the formation of malignant tumors (9, 13-15). In this study, the researchers observed that the cell cycle pathway and the p53 signaling pathway were positively regulated in NSCLC (all P values < 0.001 in the GSEA results of the four independent microarray datasets). Furthermore, functional enrichment in the PPI network showed that 65 and 37 genes were enriched in the cell cycle pathway (P value = 2.58e-79, ranked first) and the p53 signaling pathway (P value = 1.08e-45, ranked third), respectively.

Once a tumor has formed, fast-growing cancer cells require sufficient energy, raw materials, and NADPH (16). Up-regulated metabolic pathways supply these necessities to cancer cells. Many studies have shown that these pathways are related to cancers such as prostate cancer and breast cancer (16, 17). The current results showed that many metabolic pathways were significantly up-regulated in NSCLC, mainly including purine metabolism (P value = 1.95e-44, ranked fourth), pyrimidine metabolism (P value = 1.01e-29, ranked fifth), glutathione metabolism (P value = 1.22e-27, ranked sixth), and amino sugar and nucleotide sugar metabolism (P value = 1.10e-21, ranked seventh).

Glycans are important signaling molecules that attach to proteins or lipids and play an important role in malignant transformation (18). Glycans are currently used as candidate diagnostic markers and therapeutic targets in clinics (19, 20). Modulation of N-Glycan biosynthesis (P value = 6.24e-14, ranked twelfth) changes the glycosylation of proteins and/or lipids, which alters the functions and structures of glycoproteins and/or glycolipids. The altered functions, such as cell signaling and cell adhesion, facilitate cancer invasion and metastasis (21).

An essential function of the immune system is immune surveillance, which plays a key role in identifying and destroying tumors and defending against cancers (22). Once the host immune system becomes dysfunctional, tumors escape immune surveillance and progress to cancer (23). Furthermore, tumor cells release immunosuppressive factors, such as prostaglandins, vascular endothelial growth factor, and transforming growth factor-beta, which directly or indirectly inhibit the immune response (24). In the GSEA results, the researchers found, surprisingly, that 12 pathways related to the immune system were significantly down-regulated, which indicated that the immune system was strongly altered or inhibited in NSCLC (25).

Among the pathways belonging to the cellular community class, the researchers identified three significantly down-regulated pathways: the tight junction pathway (P value = 7.24e-27, ranked nineteenth), the gap junction pathway (P value = 2.65e-27, ranked eighteenth), and the focal adhesion pathway (P value = 2.44e-81, ranked first). Tight junctions play vital roles in creating an intercellular barrier, controlling paracellular diffusion, and maintaining cell-cell junctions and tissue integrity (26). Alterations in the expression or structure of tight junction proteins lead to a loss of cohesion of the tight junction structure, which results in the invasion and metastasis of cancer cells (26). Gap junctions are thought to be essential for regular intercellular communication, and the loss of direct intercellular communication is commonly associated with cancer onset and progression (27). A number of studies have demonstrated that tumor promoters effectively inhibit gap junctional communication between cells, while tumor suppressors effectively enhance gap junction function (27-29). Focal adhesions, also called cell-matrix adhesions, play crucial roles, like tight junctions and gap junctions, in mediating many processes, including migration and cell adhesion, tissue homeostasis, and tumorigenesis (30). The loss or down-regulation of cell-cell and cell-matrix adhesion contributes to the invasion and metastasis of cancer cells (31). The current results suggest that these three pathways might play essential roles in cell migration in NSCLC.

Several studies have reported that some genes, such as EGFR, TP53, and PIK3CA, are associated with lung cancer (32-35). The current study also showed that these genes had a large number of edges in the PPI networks (Figure 2), which further supports their important role in NSCLC. However, the researchers observed that the CDK1 and PIK3R1 genes had more edges than the above genes in their respective sub-networks, and were significantly up-regulated and down-regulated in NSCLC, respectively. A few studies have reported that the CDK1 gene is associated with carcinomas, including gastric, colorectal, breast, and lung cancers (36-39). Moreover, published results showed that some non-coding RNAs inhibited cell proliferation in NSCLC by targeting CDK1 (39, 40). Despite these results, the role of CDK1 in NSCLC is still unclear. The current results provide further support for the role of CDK1 in NSCLC, and CDK1 was the key gene in the PPI network. PIK3R1 has been shown to have opposite roles in different cancers: it is a positive regulator in breast and endometrial cancers (41, 42) and a negative regulator in renal cancer (43). Few studies have focused on the role of PIK3R1 in lung cancer. The current results showed that PIK3R1 was significantly down-regulated in NSCLC.

The strength of the current study was the integration of gene profiling data of NSCLC from different subtypes, different populations, and different smoking statuses to explore the molecular mechanisms implicated in NSCLC using gene set analysis and PPI network analysis. Gene set analysis is more powerful than single-gene analysis in revealing biological mechanisms, especially in identifying genes with subtle contributions. The PPI network analysis may confirm the interactions between genes and contribute to the discovery of key genes in a biological process. Together, the two methods are helpful for an in-depth understanding of the molecular mechanism of NSCLC. The primary limitation of the current study was that the results were obtained by purely bioinformatic methods and were not confirmed experimentally. Future work should verify the identified genes by experiments to deepen the understanding of the molecular mechanism of NSCLC.
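To illustrate why a gene set statistic can detect coordinated shifts across a pathway, the sketch below computes a simplified, unweighted running-sum enrichment score of the Kolmogorov-Smirnov type that underlies GSEA (7). It is an assumption-laden illustration, not the exact configuration used in this study: the function name `enrichment_score`, the gene labels `g1` to `g8`, and the example sets are hypothetical, and no permutation-based significance testing is performed.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes: genes ordered from most up-regulated to most down-regulated.
    gene_set: collection of genes belonging to one pathway.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a pathway member, step down on any other gene.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list and gene sets (toy data only).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
es_top = enrichment_score(ranked, {"g1", "g2", "g3"})        # members at the top
es_bottom = enrichment_score(ranked, {"g6", "g7", "g8"})     # members at the bottom
es_scattered = enrichment_score(ranked, {"g1", "g4", "g8"})  # members spread out
print(es_top, es_bottom, es_scattered)
```

A set whose members cluster at one end of the ranking yields a score of large magnitude even if no single gene is extreme, whereas a scattered set yields a score closer to zero.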

5. Conclusions

A cross-sectional study of gene expression profiling data identified many pathways and genes implicated in NSCLC. The up-regulated cell cycle pathway and the down-regulated focal adhesion pathway were significantly associated with NSCLC. The study increased the concordance between gene expression profiling data and provided insight into the molecular mechanisms of NSCLC. The CDK1 and PIK3R1 genes were identified as key genes in NSCLC and may serve as candidate diagnostic and therapeutic targets for NSCLC.




  • 1.

    Edwards BK, Brown ML, Wingo PA, Howe HL, Ward E, Ries LA, et al. Annual report to the nation on the status of cancer, 1975-2002, featuring population-based trends in cancer treatment. J Natl Cancer Inst. 2005;97(19):1407-27. doi: 10.1093/jnci/dji289. [PubMed: 16204691].

  • 2.

    Zhang H, Cai B. The impact of tobacco on lung health in China. Respirology. 2003;8(1):17-21. [PubMed: 12856737].

  • 3.

    Hu Z, Wu C, Shi Y, Guo H, Zhao X, Yin Z, et al. A genome-wide association study identifies two new lung cancer susceptibility loci at 13q12.12 and 22q12.2 in Han Chinese. Nat Genet. 2011;43(8):792-6. doi: 10.1038/ng.875.

  • 4.

    Petty RD. Gene Expression Profiling in Non-Small Cell Lung Cancer: From Molecular Mechanisms to Clinical Application. Clin Cancer Res. 2004;10(10):3237-48. doi: 10.1158/1078-0432.ccr-03-0503.

  • 5.

    Yu D, Li J, Han Y, Liu S, Xiao N, Li Y, et al. Gene expression profiles of ERCC1, TYMS, RRM1, TUBB3 and EGFR in tumor tissue from non-small cell lung cancer patients. Chin Med J (Engl). 2014;127(8):1464-8. [PubMed: 24762590].

  • 6.

    Wang J, Song J, Gao Z, Huo X, Zhang Y, Wang W, et al. Analysis of gene expression profiles of non-small cell lung cancer at different stages reveals significantly altered biological functions and candidate genes. Oncol Rep. 2017;37(3):1736-46. doi: 10.3892/or.2017.5380.

  • 7.

    Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-50. doi: 10.1073/pnas.0506580102. [PubMed: 16199517]. [PubMed Central: PMC1239896].

  • 8.

    Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003;34(3):267-73. doi: 10.1038/ng1180. [PubMed: 12808457].

  • 9.

    Cai B, Jiang X. Revealing Biological Pathways Implicated in Lung Cancer from TCGA Gene Expression Data Using Gene Set Enrichment Analysis. Cancer Inform. 2014;13(Suppl 1):113-21. doi: 10.4137/CIN.S13882. [PubMed: 25520551]. [PubMed Central: PMC4251186].

  • 10.

    Araujo JM, Prado A, Cardenas NK, Zaharia M, Dyer R, Doimi F, et al. Repeated observation of immune gene sets enrichment in women with non-small cell lung cancer. Oncotarget. 2016;7(15):20282-92. doi: 10.18632/oncotarget.7943. [PubMed: 26958810]. [PubMed Central: PMC4991454].

  • 11.

    Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249-64. doi: 10.1093/biostatistics/4.2.249. [PubMed: 12925520].

  • 12.

    Zhao H, Huang M, Chen Q, Wang Q, Pan Y. Comparative gene expression analysis in mouse models for identifying critical pathways in mammary gland development. Breast Cancer Res Treat. 2012;132(3):969-77. doi: 10.1007/s10549-011-1650-8. [PubMed: 21735046].

  • 13.

    Shi I, Hashemi Sadraei N, Duan ZH, Shi T. Aberrant signaling pathways in squamous cell lung carcinoma. Cancer Inform. 2011;10:273-85. doi: 10.4137/CIN.S8283. [PubMed: 22174565]. [PubMed Central: PMC3236010].

  • 14.

    Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489(7417):519-25. doi: 10.1038/nature11404.

  • 15.

    Cooper WA, Lam DC, O'Toole SA, Minna JD. Molecular biology of lung cancer. J Thorac Dis. 2013;5(5):479-90. doi: 10.3978/j.issn.2072-1439.2013.08.03.

  • 16.

    Schramm G, Surmann EM, Wiesberg S, Oswald M, Reinelt G, Eils R, et al. Analyzing the regulation of metabolic pathways in human breast cancer. BMC Med Genomics. 2010;3(1). doi: 10.1186/1755-8794-3-39.

  • 17.

    Tsouko E, Khan AS, White MA, Han JJ, Shi Y, Merchant FA, et al. Regulation of the pentose phosphate pathway by an androgen receptor–mTOR-mediated mechanism and its role in prostate cancer cell growth. Oncogenesis. 2014;3(5):e103. doi: 10.1038/oncsis.2014.18.

  • 18.

    Dennis JW, Granovsky M, Warren CE. Glycoprotein glycosylation and cancer progression. Biochim Biophys Acta. 1999;1473(1):21-34. [PubMed: 10580127].

  • 19.

    Kailemia MJ, Park D, Lebrilla CB. Glycans and glycoproteins as specific biomarkers for cancer. Anal Bioanal Chem. 2017;409(2):395-410. doi: 10.1007/s00216-016-9880-6. [PubMed: 27590322]. [PubMed Central: PMC5203967].

  • 20.

    Lan Y, Hao C, Zeng X, He Y, Zeng P, Guo Z, et al. Serum glycoprotein-derived N- and O-linked glycans as cancer biomarkers. Am J Cancer Res. 2016;6(11):2390-415. [PubMed: 27904760]. [PubMed Central: PMC5126262].

  • 21.

    Zhao YY, Takahashi M, Gu JG, Miyoshi E, Matsumoto A, Kitazume S, et al. Functional roles of N-glycans in cell signaling and cell adhesion in cancer. Cancer Sci. 2008;99(7):1304-10. doi: 10.1111/j.1349-7006.2008.00839.x. [PubMed: 18492092].

  • 22.

    Swann JB, Smyth MJ. Immune surveillance of tumors. J Clin Invest. 2007;117(5):1137-46. doi: 10.1172/JCI31405. [PubMed: 17476343]. [PubMed Central: PMC1857231].

  • 23.

    Seliger B. Strategies of tumor immune evasion. BioDrugs. 2005;19(6):347-54. [PubMed: 16392887].

  • 24.

    Frumento G, Piazza T, Di Carlo E, Ferrini S. Targeting tumor-related immunosuppression for cancer immunotherapy. Endocr Metab Immune Disord Drug Targets. 2006;6(3):233-7. [PubMed: 17017974].

  • 25.

    Domagala-Kulawik J, Osinska I. [Immune alterations in lung cancer - the new therapeutic approach]. Pneumonol Alergol Pol. 2014;82(3):286-99. doi: 10.5603/PiAP.2014.0034. [PubMed: 24793154].

  • 26.

    Martin TA. The role of tight junctions in cancer metastasis. Semin Cell Dev Biol. 2014;36:224-31. doi: 10.1016/j.semcdb.2014.09.008. [PubMed: 25239399].

  • 27.

    Aasen T, Mesnil M, Naus CC, Lampe PD, Laird DW. Gap junctions and cancer: communicating for 50 years. Nat Rev Cancer. 2016;16(12):775-88. doi: 10.1038/nrc.2016.105. [PubMed: 27782134]. [PubMed Central: PMC5279857].

  • 28.

    Trosko JE, Ruch RJ. Gap junctions as targets for cancer chemoprevention and chemotherapy. Curr Drug Targets. 2002;3(6):465-82. [PubMed: 12448698].

  • 29.

    Shi H, Shi D, Wu Y, Shen Q, Li J. Qigesan inhibits migration and invasion of esophageal cancer cells via inducing connexin expression and enhancing gap junction function. Cancer Lett. 2016;380(1):184-90. doi: 10.1016/j.canlet.2016.06.015. [PubMed: 27345741].

  • 30.

    Berrier AL, Yamada KM. Cell-matrix adhesion. J Cell Physiol. 2007;213(3):565-73. doi: 10.1002/jcp.21237. [PubMed: 17680633].

  • 31.

    Nigam AK, Savage FJ, Boulos PB, Stamp GW, Liu D, Pignatelli M. Loss of cell-cell and cell-matrix adhesion molecules in colorectal cancer. Br J Cancer. 1993;68(3):507-14. [PubMed: 8353041]. [PubMed Central: PMC1968382].

  • 32.

    Toyooka S, Tsuda T, Gazdar AF. The TP53 gene, tobacco exposure, and lung cancer. Hum Mutat. 2003;21(3):229-39. doi: 10.1002/humu.10177. [PubMed: 12619108].

  • 33.

    Shigematsu H, Lin L, Takahashi T, Nomura M, Suzuki M, Wistuba II, et al. Clinical and biological features associated with epidermal growth factor receptor gene mutations in lung cancers. J Natl Cancer Inst. 2005;97(5):339-46. doi: 10.1093/jnci/dji055. [PubMed: 15741570].

  • 34.

    Martin P, Kelly CM, Carney D. Epidermal growth factor receptor-targeted agents for lung cancer. Cancer Control. 2006;13(2):129-40. doi: 10.1177/107327480601300207. [PubMed: 16735987].

  • 35.

    Yamamoto H, Shigematsu H, Nomura M, Lockwood WW, Sato M, Okumura N, et al. PIK3CA mutations and copy number gains in human lung cancers. Cancer Res. 2008;68(17):6913-21. doi: 10.1158/0008-5472.can-07-5084. [PubMed: 18757405]. [PubMed Central: PMC2874836].

  • 36.

    Gao SY, Li J, Qu XY, Zhu N, Ji YB. Downregulation of Cdk1 and cyclinB1 expression contributes to oridonin-induced cell cycle arrest at G2/M phase and growth inhibition in SGC-7901 gastric cancer cells. Asian Pac J Cancer Prev. 2014;15(15):6437-41. [PubMed: 25124639].

  • 37.

    Sung WW, Lin YM, Wu PR, Yen HH, Lai HW, Su TC, et al. High nuclear/cytoplasmic ratio of Cdk1 expression predicts poor prognosis in colorectal cancer patients. BMC Cancer. 2014;14:951. doi: 10.1186/1471-2407-14-951. [PubMed: 25511643]. [PubMed Central: PMC4302138].

  • 38.

    Nakayama S, Torikoshi Y, Takahashi T, Yoshida T, Sudo T, Matsushima T, et al. Prediction of paclitaxel sensitivity by CDK1 and CDK2 activity in human breast cancer cells. Breast Cancer Res. 2009;11(1):R12. doi: 10.1186/bcr2231. [PubMed: 19239702]. [PubMed Central: PMC2687717].

  • 39.

    Shi Q, Zhou Z, Ye N, Chen Q, Zheng X, Fang M. MiR-181a inhibits non-small cell lung cancer cell proliferation by targeting CDK1. Cancer Biomark. 2017;20(4):539-46. doi: 10.3233/cbm-170350. [PubMed: 28946554].

  • 40.

    Pu S, Zhao Y, Zhou G, Zhu H, Gong L, Zhang W, et al. Effect of CDK1 shRNA on proliferation, migration, cell cycle and apoptosis in non-small cell lung cancer. J Cell Physiol. 2018;233(9):7514. doi: 10.1002/jcp.26387. [PubMed: 29226963].

  • 41.

    Yan LX, Liu YH, Xiang JW, Wu QN, Xu LB, Luo XL, et al. PIK3R1 targeting by miR-21 suppresses tumor cell migration and invasion by reducing PI3K/AKT signaling and reversing EMT, and predicts clinical outcome of breast cancer. Int J Oncol. 2016;48(2):471-84. doi: 10.3892/ijo.2015.3287. [PubMed: 26676464]. [PubMed Central: PMC4725461].

  • 42.

    Cheung LW, Hennessy BT, Li J, Yu S, Myers AP, Djordjevic B, et al. High frequency of PIK3R1 and PIK3R2 mutations in endometrial cancer elucidates a novel mechanism for regulation of PTEN protein stability. Cancer Discov. 2011;1(2):170-85. doi: 10.1158/ [PubMed: 21984976]. [PubMed Central: PMC3187555].

  • 43.

    Lin Y, Yang Z, Xu A, Dong P, Huang Y, Liu H, et al. PIK3R1 negatively regulates the epithelial-mesenchymal transition and stem-like phenotype of renal cancer cells through the AKT/GSK3beta/CTNNB1 signaling pathway. Sci Rep. 2015;5:8997. doi: 10.1038/srep08997. [PubMed: 25757764]. [PubMed Central: PMC4355729].