Abstract
Background: The early diagnosis of lung cancer has long been a critical problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution. However, most existing gene differential expression analysis (DEA) methods have two main drawbacks. First, they rely on fixed statistical hypotheses and are not always effective. Second, they cannot identify a definite expression level boundary when there is no obvious expression level gap between the control and experiment groups.
Methods: This paper proposes a novel approach to identify marker genes and gene expression level boundaries for lung cancer. By calculating a kernel maximum mean discrepancy (MMD), our method evaluates the expression differences between normal, normal adjacent to tumor (NAT), and tumor samples. For the potential marker genes, the expression level boundaries among the different groups are defined with an information entropy method.
Results: Compared with two conventional methods, the t-test and fold change, the top average-ranked genes selected by our method achieve better performance under all metrics in 10-fold cross-validation. GO and KEGG enrichment analyses are then conducted to explore the biological functions of the top 100 ranked genes. Finally, we choose the top 10 average-ranked genes as lung cancer markers, and their expression boundaries are calculated and reported.
Conclusion: The proposed approach is effective for identifying gene markers for lung cancer diagnosis. It is not only more accurate than conventional DEA methods but also provides a reliable way to identify gene expression level boundaries.
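The abstract does not spell out how the information entropy method places a boundary. One common reading, offered here only as an illustrative sketch, is to pick the cut point on a gene's expression values that minimizes the weighted class entropy of the two resulting partitions. The function names `entropy` and `entropy_boundary` and the midpoint rule for candidate cuts are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def entropy_boundary(values, labels):
    """Return (cut, weighted_entropy) for the best single threshold.

    values: 1-D array of expression levels for one gene (hypothetical input).
    labels: 1-D array of group labels, e.g. 0 = normal, 1 = tumor.
    Candidate cuts are midpoints between consecutive distinct sorted values
    (an assumption of this sketch).
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(values)
    v, y = values[order], labels[order]

    best_cut, best_h = None, np.inf
    for i in range(1, v.size):
        if v[i] == v[i - 1]:
            continue  # no threshold can separate identical values
        cut = (v[i] + v[i - 1]) / 2.0
        left, right = y[:i], y[i:]
        # Weighted average of the entropies on each side of the cut.
        h = (left.size * entropy(left) + right.size * entropy(right)) / y.size
        if h < best_h:
            best_cut, best_h = cut, h
    return best_cut, best_h
```

A weighted entropy close to zero at the chosen cut indicates that the threshold separates the groups almost perfectly. A value close to the entropy of the unsplit labels indicates that no single threshold is informative for that gene.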
Highlights
The early diagnosis of lung cancer has long been a critical problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution
The processed Genotype-Tissue Expression (GTEx) expression profiles of dataset 1 are available in the Gene Expression Omnibus (GEO) under accession number GSE86354, and the other two datasets are deposited under GSE62944
In the first part, we present the gene ranking by kernel maximum mean discrepancy (MMD) score and analyze the gene expression differences between tissue types
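As a rough illustration of ranking genes by a kernel MMD score, the sketch below computes a biased estimate of the squared MMD for each gene between two sample groups and sorts the genes by that score. The Gaussian kernel, the fixed bandwidth `sigma`, and the names `mmd2` and `rank_genes` are assumptions made for this example. The source does not state which kernel or estimator the authors used.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two 1-D samples a (n,) and b (m,)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between 1-D samples x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

def rank_genes(group_a, group_b, sigma=1.0):
    """Rank genes by per-gene MMD score, largest difference first.

    group_a, group_b: arrays of shape (n_samples, n_genes), e.g. normal
    and tumor expression matrices (hypothetical inputs for this sketch).
    Returns (order, scores), where order[0] is the top-ranked gene index.
    """
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    n_genes = group_a.shape[1]
    scores = np.array(
        [mmd2(group_a[:, g], group_b[:, g], sigma) for g in range(n_genes)]
    )
    order = np.argsort(-scores)  # descending by score
    return order, scores
```

With three tissue types (normal, NAT, tumor), one option is to score each pair of groups separately and average the resulting ranks, which would match the "average ranked" wording of the abstract. That aggregation step is a guess and is not shown here.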
Summary
The early diagnosis of lung cancer has long been a critical problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution. Researchers have sought to identify efficient molecular biomarkers that indicate the pathogenic process in order to improve diagnostic sensitivity [11]. These efforts have focused mainly on genetic mutations, DNA methylation profiles, miRNA synthesis profiles, and especially blood proteins [12,13,14,15,16,17,18,19]. Because some genes have distinct expression levels in normal and tumor tissues as a result of disease development, they are promising candidates for diagnosing lung cancer earlier and more accurately.