Abstract

Analyzing biological data while accounting for molecular interactions may lead to more accurate identification of disease-related biomarkers. In this study, we propose a novel feature selection method based on a molecule (feature) interactive effect network, called Distance Correlation Gain-Network (DCG-Net). In DCG-Net, the DCG index is defined to measure the interactive effect of each pair of features with respect to the process of physiological and pathological change, and these pairwise effects are used to infer the molecule interactive effect network. The DCG index applies to both discrete and continuous random variables. A greedy search strategy is then developed to find informative modules of interacting features with high statistical dependence on the disease outcome. To evaluate its performance, DCG-Net was compared with eight representative feature selection techniques, namely t-test, ReliefF, SVM-RFE, mRMR, IG-RFE, INDEED, MN-PCC and Dcor-SFS, on ten public datasets. The experimental results showed the superior performance of DCG-Net in classification accuracy, sensitivity and specificity for three different classifiers. Subsequently, DCG-Net was used to analyze a lung adenocarcinoma metabolomics dataset; the selected metabolites were involved in an important pathway and showed better discrimination ability. These experiments demonstrate that DCG can effectively detect molecular interactions, and that incorporating these interactions helps identify informative biomarkers reflecting the occurrence and development of complex diseases.
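The abstract does not give the DCG formula or the details of the greedy search. As a rough illustration only, the Python sketch below computes the standard sample distance correlation (the V-statistic form of Székely et al.), which treats discrete and continuous variables alike. It then uses that quantity for two hypothetical stand-ins: `pairwise_gain`, defined here as the joint dependence of a feature pair on the outcome minus the best single-feature dependence, and `greedy_module`, a forward search that grows a module while its distance correlation with the outcome keeps increasing. Both are assumptions for illustration, not the DCG-Net definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _centered_distances(points: List[Tuple[float, ...]]) -> List[List[float]]:
    """Double-centred Euclidean distance matrix (row, column and grand means removed)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # d is symmetric, so the column means equal the row means.
    return [[d[k][l] - row_means[k] - row_means[l] + grand_mean
             for l in range(n)] for k in range(n)]


def distance_correlation(x: List[Tuple[float, ...]], y: List[Tuple[float, ...]]) -> float:
    """Sample (V-statistic) distance correlation between paired samples x and y."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:  # a constant variable carries no information
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)  # clamp tiny negative rounding error


def dcor_to_outcome(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """dCor between the joint vector of the given feature columns and the outcome."""
    return distance_correlation(list(zip(*columns)), [(v,) for v in y])


def pairwise_gain(xi: Sequence[float], xj: Sequence[float], y: Sequence[float]) -> float:
    """HYPOTHETICAL gain: joint dependence minus the best single-feature dependence."""
    joint = dcor_to_outcome([xi, xj], y)
    return joint - max(dcor_to_outcome([xi], y), dcor_to_outcome([xj], y))


def greedy_module(features: Dict[str, Sequence[float]], y: Sequence[float],
                  max_size: int = 5) -> Tuple[List[str], float]:
    """HYPOTHETICAL forward search: grow a module while dCor(module, y) keeps rising."""
    remaining = dict(features)
    # Seed with the single feature most dependent on the outcome.
    seed = max(remaining, key=lambda name: dcor_to_outcome([remaining[name]], y))
    module = [seed]
    score = dcor_to_outcome([remaining.pop(seed)], y)
    while remaining and len(module) < max_size:
        def joint_score(name: str) -> float:
            return dcor_to_outcome([features[m] for m in module] + [remaining[name]], y)
        best = max(remaining, key=joint_score)
        best_score = joint_score(best)
        if best_score <= score:  # stop when no candidate improves the module
            break
        module.append(best)
        remaining.pop(best)
        score = best_score
    return module, score
```

With an XOR-type outcome, for instance `xi = [0, 0, 1, 1]`, `xj = [0, 1, 0, 1]` and `y = [0, 1, 1, 0]`, each feature alone has zero sample distance correlation with `y`, yet the pair determines `y` completely, so `pairwise_gain(xi, xj, y)` is positive (roughly 0.53). This is the kind of interactive effect that a method scoring features one at a time would miss.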
