Abstract

Literature-based discovery (LBD) is a text mining methodology for automatically generating research hypotheses from existing knowledge. We mimic the process of LBD as a classification problem on a graph of MeSH terms. We employ unsupervised and supervised link prediction methods for predicting previously unknown connections between biomedical concepts. We evaluate the effectiveness of link prediction through a series of experiments using a MeSH network that contains the history of link formation between biomedical concepts. We performed link prediction using proximity measures, such as common neighbor (CN), Jaccard coefficient (JC), Adamic / Adar index (AA) and preferential attachment (PA). Our approach relies on the assumption that similar nodes are more likely to establish a link in the future. Applying an unsupervised approach, the AA measure achieved the best performance in terms of area under the ROC curve (AUC = 0.76), followed by CN, JC, and PA. In a supervised approach, we evaluate whether proximity measures can be combined to define a model of link formation across all four predictors. We applied various classifiers, including decision trees, k-nearest neighbors, logistic regression, multilayer perceptron, naïve Bayes, and random forests. Random forest classifier accomplishes the best performance (AUC = 0.87). The link prediction approach proved to be effective for LBD processing. Supervised statistical learning approaches clearly outperform an unsupervised approach to link prediction.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.