Identifying the Most Appropriate Pattern for Identification of Gene Expression Changes in Ovarian Cancer Using Microarray

AUTHORS

Massoume Eskandari 1, Shahla Chaichian 2,*, Jenefer DeKoning 3, Bahram Moazzami 1, Pouya Faroughi 4, Asrin Karimi 5, Fatemeh Jesmi 1

1 Pars Advanced and Minimally Invasive Medical Manners Research Center, Pars Hospital, Iran University of Medical Sciences, Tehran, Iran

2 Minimally Invasive Techniques Research Center in Women, Tehran Medical Sciences Branch, Islamic Azad University, Tehran, Iran

3 Kashi Clinical Lab, Portland, United States

4 Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Canada

5 Faculty of Economics and Management, University Putra Malaysia, Serdang, Selangor, Malaysia

How to Cite: Eskandari M, Chaichian S, DeKoning J, Moazzami B, Faroughi P, et al. Identifying the Most Appropriate Pattern for Identification of Gene Expression Changes in Ovarian Cancer Using Microarray, Iran Red Crescent Med J. 2019;21(7):e85420. doi: 10.5812/ircmj.85420.

ARTICLE INFORMATION

Iranian Red Crescent Medical Journal: 21 (7); e85420
Published Online: July 21, 2019
Article Type: Research Article
Received: October 16, 2018
Revised: May 2, 2019
Accepted: June 3, 2019

Abstract

Background: Microarray technology is an accurate method for recognizing disease-associated gene alterations. However, there is still no effective approach for evaluating gene expression in ovarian cancer.

Objectives: To describe a reliable approach for identifying genes associated with ovarian cancer.

Methods: Microarray gene expression data analysis was applied to correct systematic differences through four different normalization methods: LOESS, 3D LOESS, and two neural networks (NN3 and NN4). Then, three different clustering methods (K-means, fuzzy C-means, and hierarchical clustering) were examined on the corrected gene expression values. The proposed approach was tested on a reliable source of gene information, where the entropy of genes across samples and the Euclidean distance were used for gene selection.

Results: Our findings revealed that a neural-network-based normalization method could better control the effects of non-biological variations in microarray data. Moreover, hierarchical clustering was more effective than the other methods and resulted in the identification of three genes, BC029410, DUSP2, and ILDR1, as candidates for disease-association genes.

Conclusions: According to the findings of the present study, hierarchical clustering with nonlinear normalization could help prioritize genes for ovarian cancer.

Keywords

Cluster Analysis Entropy Gene Expression Gene Ontology Microarray Analysis Neural Networks Ovarian Neoplasms

Copyright © 2019, Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits copying and redistributing the material for noncommercial use only, provided the original work is properly cited.

1. Background

Ovarian cancer ranks sixth among the most prevalent female cancers worldwide, with an estimated prevalence of 0.9% and 0.5% in developed and developing countries, respectively (1). In Iran, its prevalence is estimated at 0.35% (2). Understanding the etiology of this frequent cancer is of great importance for designing new strategies for early diagnosis and treatment. Although several hormone-related risk factors, such as the total number of ovulatory cycles, low parity, and non-use of oral contraceptives, have been suggested for ovarian cancer (3), family history is established as the foremost risk factor (4, 5).

The risk of developing ovarian cancer has been shown to be 3.1-fold greater among first-degree relatives of affected women, rising to 3.8-fold in sisters and 6.0-fold in daughters (6). Other studies addressing the genetic contribution to ovarian cancer risk suggested a 15-fold greater age-specific risk in carriers of a breast/ovarian cancer susceptibility gene than in non-carriers (7). More than 10% of epithelial ovarian cancers were also reported to be hereditary (8). The contribution of the BRCA genes to familial breast-ovarian cancer is well established (9), and inhibitors of DNA single-strand break repair have been suggested as a treatment strategy (10). Moreover, hereditary nonpolyposis colorectal cancer (HNPCC; Lynch II) is associated with defects in the mismatch repair genes MSH2 and MLH1 (8). Alterations in oncogenes, including HER-2/neu, c-myc, and K-ras, and in the tumor suppressor gene p53 are also commonly observed in sporadic ovarian cancer (11, 12).

Recent studies have shown a distinct pattern of gene expression in the transcription of DNA into RNA. One of the most important methods for identifying disease-associated genes is the microarray (13). Microarray technology makes it possible to monitor thousands of transcripts simultaneously (14) and has previously been used for the molecular classification of various tumors (15, 16). Analyzing gene expression data can help to identify genes that are expressed differently in patients’ tissue. However, microarray data are exposed to non-biological sources of variation that are nonlinear and intensity-dependent and should be corrected (17). Microarray normalization can help to control the effects of these undesired variations (18). Current linear normalization methods cannot completely correct these effects, and therefore suitable nonlinear normalization methods are needed. Many clustering techniques, ranging from simple statistical to novel data mining methods, are used to reveal different patterns in gene expression data; however, each has its advantages and disadvantages (19).

2. Objectives

The objective of this study was to find an appropriate pattern for the identification of gene expression changes in ovarian cancer using microarray technology. To do this, four normalization methods, namely locally estimated scatterplot smoothing (LOESS), 3D LOESS, NN3, and NN4, were tested on the microarray data, and then three clustering algorithms, namely K-means, fuzzy C-means, and hierarchical clustering, were examined to determine the most efficient algorithm. As long as the etiology of ovarian cancer remains unclear, studying the role of genetics is an appropriate approach that could help determine the genes implicated in this cancer and other associated cancers.

3. Methods

3.1. Research Data

In this bioinformatics-based cross-sectional study, data were derived from the ArrayExpress in May 2014 (https://www.ncbi.nlm.nih.gov/geo/). From this database, the baseline characteristics (age, ethnicity), cancer staging based on the FIGO classification, the details of the tissue samples and their sources, and gene candidates linked to ovarian cancer were extracted for the present analyses. Two observers extracted the data, with good inter-observer agreement (kappa = 0.87). Four arrays labeled with Cy3 were used for healthy samples (aged 32 to 37 years), and four arrays labeled with Cy5 were used for patient samples. To determine and rank the disease-association gene candidates, the following parameters and hypotheses were considered in this study:

1. In all four cases, Cy5 values belonged to patient samples, and Cy3 values belonged to healthy ones.

2. Levels of gene expression in healthy and patient samples will be significantly different if the interrogated gene contributes to the disease process. The farther the normalized logarithmic ratio of the gene expression is from zero, the greater the likelihood that the gene is specifically associated with the disease.

3. For a gene's expression to be implicated in the manifestation of the disease, it must have an approximately identical distance from zero in all samples. In other words, if the expression value is significantly distant from zero in one patient but nearly zero in another, the gene was not considered a suitable candidate in our data.
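Hypotheses 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `is_candidate` and the thresholds `min_abs` and `max_spread` are hypothetical values chosen only to show the logic (every sample far from zero, and all samples at a similar distance from zero).

```python
def is_candidate(log_ratios, min_abs=1.0, max_spread=0.5):
    """Apply hypotheses 2 and 3 to one gene.

    log_ratios: normalized log ratios (patient vs. healthy), one per array.
    min_abs and max_spread are illustrative thresholds, not values from the study.
    """
    magnitudes = [abs(r) for r in log_ratios]
    # Hypothesis 2: the ratio should be far from zero in every sample.
    far_from_zero = min(magnitudes) >= min_abs
    # Hypothesis 3: the distance from zero should be roughly the same in all samples.
    consistent = max(magnitudes) - min(magnitudes) <= max_spread
    return far_from_zero and consistent


# A gene that is strongly and consistently shifted in all four arrays.
print(is_candidate([2.1, 2.3, 1.9, 2.0]))
# A gene that is near zero in one patient is rejected.
print(is_candidate([2.1, 0.05, 1.9, 2.0]))
```

In practice the thresholds would have to be tuned to the scale of the normalized data.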

3.2. Microarray Gene Expression Data Analysis

Microarray data were preprocessed through intensity-based normalization, spatial normalization, and neural network-based normalization (which estimates the optimal amount of gene expression) to correct systematic differences. LOESS is a nonparametric regression method that combines several local regression models with a k-nearest-neighbor (KNN) scheme to carry out intensity-dependent normalization. The LOESS method assumes that most genes on the array are not differentially expressed under the investigated conditions.
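The idea behind intensity-dependent LOESS normalization can be sketched in plain Python as follows. This is a simplified illustration (local linear regression with tricube weights, no robustness iterations), not the R routine used in the study; the function names and the `frac` parameter are assumptions made for the example. The fitted trend of the log ratio M against the mean log intensity A is subtracted, and the residuals are taken as normalized values.

```python
def loess_fit(x, y, frac=0.5):
    """Local linear regression with tricube weights; returns the fit at each x."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours used in each local fit
    fitted = []
    for x0 in x:
        # The distance to the k-th nearest point sets the local bandwidth.
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def loess_normalize(a_values, m_values, frac=0.5):
    """Subtract the intensity-dependent trend of M on A; residuals are the result."""
    trend = loess_fit(a_values, m_values, frac)
    return [m - t for m, t in zip(m_values, trend)]


# Toy data with a purely intensity-dependent bias: residuals should be close to zero.
a = [float(i) for i in range(1, 11)]
m = [0.5 * ai + 0.1 for ai in a]
print(max(abs(r) for r in loess_normalize(a, m)))
```

The same scheme extends to spatial (3D) normalization by fitting the trend over the spot coordinates instead of the intensity.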

The LOESS method (spatial normalization) was used on three-dimensional (3D) diagrams to determine trends and provide spatial trend diagrams (13). The 3D LOESS can estimate the dependency of logarithmic ratios on the physical locations of spots by fitting a curve to the data.

Neural network-based normalization was used to estimate the optimal amount of gene expression, that is, the amount that minimizes the sum of squared errors (20). The most common type of neural network is the multi-layer perceptron (MLP). Without assuming a functional form, MLPs can mimic a complex nonlinear relationship between a vector of multiple inputs and an output variable, which is learned from training examples.

Normalization of probes was evaluated with data-quality parameters, and the best method was selected. Residual values were taken as the result of normalization and served as input for the next step, the identification and removal of outliers. Two criteria, AV and AN, were used to evaluate the quality of gene expression levels across replicates; they have been described elsewhere. The best normalization method is the one that minimizes the variability of expression levels between replicates.

3D clustering was used to determine outliers. The 3D charts of the gene expression data after the pre-processing steps can reveal probe or array variations and help differentiate background noise from true expression signals. These variations include non-biological effects such as stains and dust on the slides. The 3D charts show intensity values and the trends in specific areas of the graph.

The mean expression level of the probes for each gene was calculated and stored in the database as the amount of gene expression. After the expression levels of genes were recorded in the database, the normalization steps were complete, and the selection of candidates for disease-associated genes was initiated.

Euclidean distance and entropy were used to prioritize genes that might potentially be involved in the disease. Genes with higher priorities are more likely to have a true association with the disease. Filtering methods evaluate each gene's likelihood of disease association on its own, independently of other genes. It should be noted that unwanted genes increase calculation time and reduce accuracy. Therefore, prior to microarray data processing, redundant genes were deleted according to their similarity.

In a large series of samples, similar genes can be detected using Pearson’s correlation coefficient. However, in a small series of samples, gene similarity cannot be identified solely from expression values, and the genes' involvement in biological functions and processes should also be considered.

In agreement with previous studies, information about biological processes was used to determine the semantic similarity of genes. It should be noted that the pairwise semantic similarity for all genes is not possible, since some gene information is not available in the Gene Ontology (GO) database. Thus, genes associated with available data in the GO database were determined, and then those with similar gene operations were identified and removed. The value obtained from the following formula (21) represents the similarity for two genes:

$$S(g_i, g_j) = \frac{S_{exp}(g_i, g_j) + S_{sem}(g_i, g_j)}{2}$$

The closer the calculated value is to one, the more similar the two genes are. In the similarity matrix, two genes are considered similar when S(g_i, g_j) is greater than 0.8. The genes are arranged in an ascending order in terms of disease-association potential. Choosing the first gene from this list allows us to remove other similar genes; the removed genes have lower disease-association potential than the retained one.
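The redundancy filter described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: it assumes the gene list is ordered so that the gene to be kept comes first, that expression and semantic similarities are already available, and that missing pairs are treated as dissimilar. The names `combined_similarity` and `remove_redundant` are hypothetical.

```python
def combined_similarity(s_exp, s_sem):
    """Average of expression similarity and semantic (GO-based) similarity."""
    return (s_exp + s_sem) / 2.0


def remove_redundant(genes_by_priority, similarity, threshold=0.8):
    """Greedy filter: keep a gene only if it is not similar to any gene already kept.

    genes_by_priority: gene names, with the gene to be preferred listed first.
    similarity: dict mapping frozenset({g1, g2}) to S(g1, g2); missing pairs count as 0.
    """
    kept = []
    for gene in genes_by_priority:
        if all(similarity.get(frozenset((gene, k)), 0.0) <= threshold for k in kept):
            kept.append(gene)
    return kept


# Toy example with made-up similarity values for three placeholder genes.
sim = {
    frozenset(("g1", "g2")): combined_similarity(0.9, 0.8),  # 0.85 -> similar
    frozenset(("g1", "g3")): combined_similarity(0.2, 0.4),  # 0.30 -> not similar
}
print(remove_redundant(["g1", "g2", "g3"], sim))
```

With 25,811 genes, the full pairwise matrix is large, so a practical implementation would likely restrict comparisons to genes that share GO annotations.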

In this study, all four cases are associated with patients. Therefore, unsupervised clustering methods were used to determine potential disease-association genes. Classification assigns each data item to one of a set of pre-defined classes, based on the differences among them. Clustering, in contrast, divides the data into groups that are not pre-determined (based on intra-cluster similarity and inter-cluster difference).

The following clustering methods were used to determine the genes that may potentially be involved with disease initiation or progression:

1. K-Means: This algorithm takes the parameter “k” as an input and divides a set of “n” objects into “k” clusters, so that similarity is high within clusters and low between clusters. Similarity within a cluster is measured with respect to the mean of the objects in that cluster, which is also called the “cluster center.”

2. Fuzzy C-Means: A clustering technique in which each point belongs to every cluster to a certain degree. Bezdek (1981) proposed this technique to improve the effectiveness of earlier clustering methods (22). Fuzzy C-means describes how data in a multidimensional space are grouped into a given number of clusters. The method starts by placing the cluster centers at arbitrary locations. Usually, this initial guess is incorrect and does not indicate the true location of the centers. The method then assigns each point a degree of membership in each cluster. Iteratively updating the cluster centers and the degrees of membership of each point gradually moves the centers to their correct locations in the data set. This iterative updating is based on minimizing an objective function that reflects the distance between each point and each cluster center, weighted by the point's degree of membership. The output of this method consists of a list of cluster centers and the degree of membership of each data point in each cluster.

3. Hierarchical Clustering: Hierarchical clustering builds a hierarchy of clusters from a given set of objects. It can proceed in an agglomerative (collective) or a divisive manner. The agglomerative method is a bottom-up procedure that starts with separate groups, each containing a single object. It then merges the nearest objects or groups, so that eventually one overall group is established at the highest level. In the divisive method, all objects start in one cluster, and in each iteration a cluster is divided into two smaller clusters. According to the disease state of the four subjects, two factors can be used to suggest candidate disease-association genes: first, the distance of the gene expression value from zero, and second, the entropy of the gene's values across samples, as produced in previous steps. The greater the distance of a gene from zero and the lower its entropy, the more likely it is to be associated with the disease.
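The bottom-up (agglomerative) procedure described in item 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: single linkage is used as the merging criterion (the linkage used in the study is not specified here), each gene is represented by a hypothetical (distance, entropy) pair, and the entropy values are taken as given because their exact definition is not restated in this section.

```python
import math


def distance_from_zero(values):
    """Euclidean distance of a gene's expression vector from the origin."""
    return math.sqrt(sum(v * v for v in values))


def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]  # start: one object per cluster

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the two nearest clusters
        del clusters[b]
    return clusters


# Toy (distance, entropy) features for four placeholder genes.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(agglomerative(features, 2))
```

This naive version recomputes all pairwise linkages at every step, so it is only suitable for small inputs; library implementations are preferable for thousands of genes.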

3.3. Statistical Analysis

A robust microarray analysis algorithm was applied to process the data files containing the probe-level intensities for background adjustment, normalization, and determination of the genes specifically associated with the disease. Locally weighted least squares regression (LOESS) was applied to carry out intensity-dependent normalization of gene expression. R software was used for the LOESS and 3D LOESS methods, and the neural network normalization methods (NN3 and NN4) were implemented in MATLAB (nftool).

4. Results

Characteristics of this study data are given in Table 1. The data analysis included two stages: (1) normalization of microarrays and (2) determination of candidate genes involved with ovarian cancer.

Table 1. Baseline Characteristics of the Subjects Included in the Study (Suffering Ovarian Cancer or in Normal Healthy Condition)
Organism | Ethnicity | FIGO Stage | Age | Sample | Source Name
--- | --- | --- | --- | --- | ---
Ovarian tumors (primary) | European | IA | 26 | RNA samples from ovarian cancer tissue | GSM1208935 Cy5
Healthy unaffected ovaries | | | 32 - 37 | RNA samples from healthy tissue of two persons | GSM1208935 Cy3
Ovarian tumors (primary) | European | IIIC | 21 | RNA samples from ovarian cancer tissue | GSM1208936 Cy5
Healthy unaffected ovaries | | | 32 - 37 | RNA samples from healthy tissue of two persons | GSM1208936 Cy3
Ovarian tumors (relapse) | African-American | IIIC | 33 | RNA samples from ovarian cancer tissue | GSM1208937 Cy5
Healthy unaffected ovaries | | | 32 - 37 | RNA samples from healthy tissue of two persons | GSM1208937 Cy3
Ovarian tumors (relapse) | African-American | IIIA | 27 | RNA samples from ovarian cancer tissue | GSM1208938 Cy5
Healthy unaffected ovaries | | | 32 - 37 | RNA samples from healthy tissue of two persons | GSM1208938 Cy3

4.1. Microarray Normalization

First, the two-channel microarray values were transformed using the M-A transformation. Figure 1 shows the histograms of the M values for the four microarrays. Next, small arrays were scaled. Figure 2 shows the histograms of the M values after scaling the microarrays based on the chromosome. Normalization was then carried out on microarrays GSM1208935, GSM1208936, GSM1208937, and GSM1208938 using the LOESS, 3D LOESS, NN3, and NN4 methods, and normalized values were obtained for each method.
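For readers unfamiliar with the M-A transformation, the following short sketch shows the standard definition: M is the log2 ratio of the two channels and A is their mean log2 intensity. The function name `ma_transform` and the toy intensities are illustrative assumptions; Cy5 (patient) is treated as the red channel and Cy3 (healthy) as the green channel, as in the Methods.

```python
import math


def ma_transform(red, green):
    """Return (M, A) lists for paired two-channel intensities.

    M = log2(R) - log2(G) is the log ratio; A = (log2(R) + log2(G)) / 2 is the
    average log intensity. Intensities must be positive.
    """
    m_values, a_values = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_values.append(log_r - log_g)
        a_values.append(0.5 * (log_r + log_g))
    return m_values, a_values


# Toy intensities: first spot is 4-fold higher in Cy5, second spot is unchanged.
m, a = ma_transform([8.0, 4.0], [2.0, 4.0])
print(m, a)
```

A spot with M near zero is unchanged between patient and healthy tissue, which is why the candidate criteria in the Methods are stated in terms of distance from zero.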

Figure 2. M values after scaling microarray

The two parameters, AV and AN, were then calculated for all four microarrays (Table 2). Comparison of AV and AN showed that NN was the most effective normalization method. After normalization, outliers were determined using 3D clustering.

Table 2. AN and AV Parameters Calculated for All the Four Microarrays
Name | Method | AV | AN
--- | --- | --- | ---
GSM1208935 | MScailing | 0.8610756 | 15.85251
GSM1208935 | NN3 | 1.490332 | 15.87653
GSM1208935 | NN4 | 1.590612 | 16.28237
GSM1208935 | RESLoess | 0.2701908 | 15.31752
GSM1208935 | RESLoess3D | 0.3562536 | 15.51752
GSM1208936 | MScailing | 0.8234897 | 15.11501
GSM1208936 | NN3 | 1.548717 | 15.87695
GSM1208936 | NN4 | 1.590619 | 15.75516
GSM1208936 | RESLoess | 0.2784811 | 14.13252
GSM1208936 | RESLoess3D | 0.3311062 | 14.81607
GSM1208937 | MScailing | 0.862487 | 15.4872
GSM1208937 | NN3 | 1.716841 | 16.47974
GSM1208937 | NN4 | 1.607033 | 15.31294
GSM1208937 | RESLoess | 0.2900915 | 14.44551
GSM1208937 | RESLoess3D | 0.3459286 | 15.20879
GSM1208938 | MScailing | 0.8925862 | 14.98746
GSM1208938 | NN3 | 0.7108504 | 14.97426
GSM1208938 | NN4 | 0.8267903 | 15.83412
GSM1208938 | RESLoess | 0.2901908 | 14.44634
GSM1208938 | RESLoess3D | 0.3709957 | 14.89216

Because the four corners were identified as outliers and the side sections of the microarrays were associated with control data, we removed the control data from the microarrays and stored only the probe-related data in the database. The 3D clustering operations were then repeated, as illustrated in Figure 3. The average value of the probes related to each gene was calculated and taken as the amount of gene expression.

Figure 3. The results of 3D clustering on four microarrays after deleting control data

4.2. Determining the Genes Specifically Associated with Disease

The genes with the greatest potential for specific association with disease initiation or progression were identified based on the known amounts of gene expression for the four microarrays. Next, genes that were similar in biological meaning and expression values but had lower priority were removed from the gene list. The genes were then classified into two groups, and genes with zero on the main diagonal of the semantic similarity matrix (i.e., those with no information in Gene Ontology) were removed. All genes remaining in a group were then investigated, and their semantic similarity was identified. The number of genes decreased from 25,811 to 15,508 after similar genes were removed. To determine candidate disease-association genes, the K-means method was used first. The following call (23) clusters the genes according to their Euclidean distance:

Dx = kmeans(gene, n), where “n” is the number of clusters and “gene” is the input data matrix. Here, the input matrix was 4 × 15,508, containing the expression values of 15,508 genes in the four samples.
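The `kmeans` call above is a library routine; what it does can be sketched with Lloyd's algorithm in plain Python. This sketch is not the routine used in the study. For reproducibility it initializes the centers with the first k points, which is an assumption of the example (library implementations usually use random or k-means++ initialization), and each point stands for one gene described by its expression values.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns (labels, centers).

    Initial centers are simply the first k points (an illustrative choice).
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data: two well-separated groups of two-dimensional points.
points = [(0.0, 0.0), (0.2, 0.0), (4.0, 4.0), (4.2, 4.0)]
labels, centers = kmeans(points, 2)
print(labels, centers)
```

Because K-means partitions all points around cluster means, a handful of extreme genes does not necessarily end up in a cluster of its own, which is consistent with the difficulty reported in the Results.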

The entropy and distance values of each gene were also used for clustering. Generally, data sets with three, four, or eight clusters are inappropriate for the K-means approach (24). Our clustering results based on gene expression, entropy, and distance values showed that the K-means method was not appropriate.

The hierarchical clustering method was also applied to identify potential disease-association genes, using a distance-priority matrix that contains the distance and the entropy of each gene. The clustering result is illustrated in Figure 4. Fuzzy C-means was also applied to the data. When used with gene expression values, the fuzzy clustering method captures only distance. Therefore, using this method with gene expression values is also not an ideal solution. Figure 5 shows the fuzzy clustering result with 20 clusters.
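For reference, the fuzzy C-means procedure outlined in the Methods is commonly written as the minimization of the objective below, alternating the two update rules for memberships and centers. This is the standard textbook formulation rather than a detail reported by this study; the symbols (data points x_i, cluster centers c_j, memberships u_ij, number of clusters C, and fuzzifier m > 1) are notational assumptions.

$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}, \qquad u_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}} \right]^{-1}, \qquad c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}}$$

Each membership depends only on the relative distances of a point to the cluster centers, which matches the observation that the method reflects distance alone.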

Figure 4. Hierarchical clustering results using the entropy and distance values of each gene
Figure 5. Fuzzy clustering results using the entropy and distance values in the form of 20 clusters

Despite the increase in the number of clusters, the points of interest, which differed from each other in entropy and distance, were not located within acceptable clusters. The K-means and fuzzy clustering methods were not able to isolate genes with minimum entropy and maximum distance. Thus, hierarchical clustering was selected as the best method. Table 3 lists the genes with high priority.

Table 3. List of Genes with High Priority for Final Assessment
No. | Gene Name | Priority | Scaling Entropy | Scaling Distance
--- | --- | --- | --- | ---
1 | BC029410 | 128.3066 | 0.5235990 | 0.56941318
2 | DUSP2 | 70.86711 | 0.4445364 | 0.32716337
3 | LOC100128496 | 63.0255 | 1 | 0.06051024
4 | ILDR1 | 57.99458 | 0.2318801 | 0.56317825
5 | A_33_P3253209 | 39.04961 | 0.6416508 | 0.05314698
6 | RAB37 | 36.72358 | 0.2363711 | 0.3053997
7 | OR6F1 | 35.33696 | 0.5181471 | 0.0729519
8 | ABCC12 | 34.92876 | 0.4608774 | 0.0936140
9 | ENST00000273411 | 34.83326 | 0.1308800 | 0.5787842
10 | HTRA1 | 34.15055 | 0.1391439 | 0.5279381
11 | ADIG | 33.76621 | 0.4935599 | 0.0733250
12 | A_33_P3415526 | 31.6225 | 0.1078976 | 0.6357738
13 | APOC1 | 30.67592 | 0.1556739 | 0.4052003
14 | RGPD1 | 28.5635 | 0.2087912 | 0.2522562
15 | SLC5A9 | 27.00159 | 0.4160259 | 0.0626349
16 | BRI3 | 26.64001 | 0.1018514 | 0.5498177
17 | A_32_P18493 | 25.3969 | 0.0873534 | 0.6108721
18 | HLA.F | 25.3831 | 0.1067211 | 0.4923447
19 | MGP | 25.1852 | 0.0939268 | 0.5602423
20 | KMO | 24.34174 | 0.1477307 | 0.3176388
21 | SH2D3A | 23.67888 | 0.1551611 | 0.2869961
22 | FLJ11292 | 23.38873 | 0.0829109 | 0.5843857
23 | NP511166 | 23.28915 | 0.3383869 | 0.0722678
24 | TXNDC8 | 23.11311 | 0.3385332 | 0.0707475
25 | ENST00000474213 | 23.0644 | 0.1358917 | 0.3276581

The first two genes on the list have the highest priority. Gene No. 3 has the best entropy value but an undesirable distance; Gene No. 4 has acceptable distance and entropy; and Gene No. 5 has acceptable entropy but an undesirable distance. The remaining genes in the table lack desirable values for both distance and entropy. Consequently, we introduced dual-specificity phosphatase 2 (DUSP2), immunoglobulin-like domain-containing receptor 1 (ILDR1), and the locus BC029410 as candidates for disease-association genes.

5. Discussion

The main objective of the present study was to find an effective algorithm to identify and prioritize the genes associated with ovarian cancer that may be involved in either the onset or the progression of the disease. For this purpose, we examined K-means, fuzzy C-means, and hierarchical clustering. The results indicated that K-means and fuzzy C-means are not ideal solutions for gene expression data. Hierarchical clustering, in contrast, identified three genes, BC029410, DUSP2, and ILDR1, as candidates for disease-association genes, which suggests that this clustering method can prioritize genes in this model.

Our data are supported by findings from the National Center for Biotechnology Information (NCBI) that introduced DUSP2 and ILDR1 as important genes that cause ovarian cancer. These genes play a role in other diseases as well. ILDR1 (angulin-2) is located at chromosome 3q21.1; its protein product is a transmembrane protein localized at the tight junctions of cells. It helps epithelial cellular sheets maintain their barrier function and is expressed in several organs, such as the prostate, testis, pancreas, kidney, liver, and heart (25, 26). Expression of ILDR1 was originally found in lymphoma cells (25) and was recently found to be required for hearing, in that its mutation causes familial nonsyndromic deafness in humans (26, 27). A very recent study has also demonstrated the role of ILDR1 and 2 in aggressive breast cancer cell behavior (28). Although our study was the first to determine the role of ILDR1 in ovarian cancer, previous research identified its significance in breast cancer and thus supports a role for ILDR1 in ovarian cancer, as other genes have also been implicated in breast-ovarian syndrome (7, 9).

DUSPs, a family with 25 members, can dephosphorylate both threonine/serine and tyrosine residues (29); DUSP1, 4, and 6 are suggested to be involved in breast cancer stem cell regulation (30). DUSP2, located at chromosome 2p11.2-q1, is a mitogen-induced gene that encodes a nuclear protein acting as a MAP kinase phosphatase (31). DUSP2 was found to regulate the immune response in animal models (32). Additionally, DUSP2 is suggested to be involved in the development of endometriosis (33). Hypermethylation of DUSP2 is reported in various cancer cell lines, including primary Merkel cell carcinoma, skin and lung cancer (34). It has also been demonstrated that DUSP2 is inversely correlated with hypoxia-mediated lapatinib resistance in breast cancer cells, which contributes to tumor progression (35). Likewise, the overlap of genes participating in breast and ovarian cancer supports the finding of the current study regarding the role of DUSP2 in ovarian cancer.

BC029410 appears to be a transcribed locus of unknown function, with some homology to the Homo sapiens translation machinery associated 7 homolog (S. cerevisiae) pseudogene (LOC728416) on chromosome 7 (GenBank ref NG_027829.1).

In the present study, through the unique microarray method, we were able to show that applying a specific computational pattern to microarray data can help identify genes that potentially cause ovarian cancer. Our approach parallels previous studies that verified the usefulness of this technology for assessing gene expression patterns in human cancer cells (36, 37), as well as in colon and prostate cancer and lymphoma (21). Notably, studies evaluating gene expression in ovarian cancer have scarcely used microarray assessment. Mok and colleagues assessed CA 125 and prostasin in ovarian cancer by microarray and reported higher serum levels of prostasin in patients diagnosed with ovarian cancer, which decreased after surgical treatment (38). Although 30 genes in their study had a Cy3/Cy5 signal ratio of ≥ 5, none of the genes discussed in the present study were found by Mok et al. Further, Wulfkuhle et al. suggested the utility of reverse-phase protein microarrays for the multiplexed analysis of human ovarian tumor specimens (39). They suggested that “patterns in signal pathway activation in ovarian tumors may be patient-specific rather than type or stage specific.” Yang et al. used miRNA microarrays to evaluate the response to chemotherapy and demonstrated the association of reduced let-7i expression with shorter patient survival (40). In line with the present study, these studies support the significance of microarray technology in identifying tumor biomarkers in ovarian cancer.

Several bioinformatics studies have addressed the major algorithms for different methods and have suggested that microarray data can create a reliable cancer diagnostic model (41), but they have not focused on ovarian cancer. In the present study, we examined K-means, fuzzy C-means, and hierarchical clustering, and we suggest hierarchical clustering as the most appropriate method for the assessment of ovarian cancer cells. A high number of genes and a limited number of cases are common problems in microarray studies. These issues were improved in the present study by using a larger sample size. However, this introduced some limitations, including the fact that combining data from different races may be a confounding factor in the genetic assessment. As genetic investigation is highly expensive, evaluation of the genetic profile of patients from developing countries is limited. Additionally, any inaccuracy in the original gene expression data (42) might affect the results of the current study.

Moreover, the feature selection method proposed by the present study needs to be further evaluated. Thus, it is suggested that future studies consider both the distance and the entropy of genes responsible for this cancer. Insofar as the proposed pattern can be extended to other diseases, it is suggested that further studies evaluate gene expression in other diseases, especially when the data come from patients only, rather than from both healthy and sick people, in which case classification algorithms are suggested.

The assessment of gene expression in common but life-threatening cancers such as ovarian cancer is vital because gene expression patterns differ between societies, survival and treatment outcomes differ according to gene behavior, and the subtypes of ovarian cancer differ in prognosis. Moreover, prediction models based on clinical, genetic, and biostatistical variables have not satisfied clinicians because of high false-positive and false-negative rates, the presence of confounding variables, and poor model fit. In this regard, it seems that detailed assessment of the expression of common gene variants related to ovarian cancer risk, together with clinical predictors, can help clinicians assess the clinical and genetic behavior of cancer-related genes.

The main strength of the study was that, for the first time, gene expression in ovarian cancer was analyzed in depth with the aid of microarray analysis and bioinformatics tools. However, the study had potential limitations. First, because gene behavior differs among ethnicities, it should be analyzed in each society with regard to the ethnic characteristics of the population. Second, more genes and related microRNAs are now candidates for involvement in the occurrence of ovarian cancer and should be considered in further studies.

5.1. Conclusions

In conclusion, hierarchical clustering identified BC029410, DUSP2, and ILDR1 as important genes in ovarian cancer that could potentially be used in gene therapy for cancers.
