Abstract
This study aimed to select the feature genes of hepatocellular carcinoma (HCC) with the Fisher score algorithm and to identify hub genes with the Maximal Clique Centrality (MCC) algorithm. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to identify enriched terms among the feature genes. Gene set enrichment analysis (GSEA) was used to identify overrepresented classes of genes. Following the construction of a protein-protein interaction network with the feature genes, hub genes were identified with the MCC algorithm. The Kaplan–Meier plotter was used to assess patient prognosis based on expression of the hub genes. The GO, KEGG and GSEA enrichment analyses showed that the feature genes were closely associated with cancer and the cell cycle. Survival analysis showed that overexpression of the Fisher score–selected hub genes was associated with decreased survival time (P < 0.05). Weighted gene co-expression network analysis (WGCNA), Lasso, ReliefF and random forest were used for comparison with the Fisher score algorithm. The comparison showed that the Fisher score algorithm is superior to the Lasso and ReliefF algorithms in terms of hub gene identification and performs similarly to the WGCNA and random forest algorithms. Our results demonstrated that the Fisher score algorithm followed by the MCC algorithm can accurately identify hub genes in HCC.
Highlights
Gene microarray technology, a prospective tool for the classification, diagnosis and aggressiveness prediction of cancer, provides valuable information in understanding the underlying mechanism of multiple cancers[1,2,3,4]
Unlike some other feature selection algorithms, such as principal component analysis (PCA), in which the selected features are combinations of raw features, the Fisher score algorithm scores each gene independently under the Fisher criterion, which yields a subset of the most representative individual genes[25,26]
Most hepatocellular carcinoma (HCC) cases are detected in advanced stages with the invasion of major blood vessels, obvious extrahepatic metastases or poor liver function, making them unfit for surgical resection
Summary
Gene microarray technology, a prospective tool for the classification, diagnosis and aggressiveness prediction of cancer, provides valuable information in understanding the underlying mechanism of multiple cancers[1,2,3,4]. Unlike some other feature selection algorithms, such as principal component analysis (PCA), in which the selected features are combinations of raw features, the Fisher score algorithm scores each gene independently under the Fisher criterion, which yields a subset of the most representative individual genes[25,26]. This algorithm may therefore be an appropriate method for the feature selection of high-dimensional gene expression profile data. To further evaluate the performance of the Fisher score approach, weighted gene co-expression network analysis (WGCNA), one of the most widely used hub gene identification approaches, was used as a comparison algorithm, along with the Lasso, ReliefF and random forest algorithms.
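To illustrate the per-gene scoring described above, the following is a minimal sketch of one common formulation of the Fisher score: for each gene, the between-class scatter of the class means divided by the pooled within-class variance. The exact variant, preprocessing and cut-off used in the study are not stated in this summary, so the function names (`fisher_scores`, `top_genes`), the handling of zero-variance genes and the toy data are assumptions made purely for illustration.

```python
import numpy as np


def fisher_scores(X, y):
    """Return one Fisher score per gene (column of X).

    X: array of shape (n_samples, n_genes) with expression values.
    y: array of shape (n_samples,) with class labels (e.g. tumor / normal).
    Assumed formulation: sum_k n_k * (mean_k - mean)^2 / sum_k n_k * var_k.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]
        n_k = Xk.shape[0]
        between += n_k * (Xk.mean(axis=0) - overall_mean) ** 2
        within += n_k * Xk.var(axis=0)  # population variance within class k
    # Assumption: genes with zero within-class variance get a score of 0
    # rather than infinity, so constant genes are never ranked first.
    with np.errstate(divide="ignore", invalid="ignore"):
        scores = np.where(within > 0, between / within, 0.0)
    return scores


def top_genes(X, y, gene_names, k):
    """Rank genes by Fisher score and return the k best as (name, score)."""
    scores = fisher_scores(X, y)
    order = np.argsort(scores)[::-1][:k]
    return [(gene_names[i], float(scores[i])) for i in order]


# Toy data (hypothetical): 4 samples, 2 genes; labels 0 = normal, 1 = tumor.
X_toy = [[1, 5], [2, 6], [9, 5], [10, 6]]
y_toy = [0, 0, 1, 1]
ranked = top_genes(X_toy, y_toy, ["GENE_A", "GENE_B"], k=1)
```

In this toy example GENE_A separates the two classes while GENE_B has identical class means, so GENE_A is ranked first. Because each gene is scored on its own, the selected features remain individual genes rather than combinations of genes, which is the contrast with PCA drawn in the text.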