Abstract

• Preferential attachment is prevalent in knowledge network growth. This article analyzes temporal co-occurrences of author-selected keywords in scientific literature to support emerging Literature-Based Discovery (LBD) by framing the process as a supervised learning problem.
• By mining the temporal evolution of keyword co-occurrence networks (KCNs) and the prominence of keywords over time with regard to their edge formation and network neighborhood, this article defined genealogical communities of keywords.
• To predict the future co-evolution of author-selected keywords in scientific literature, the feature construction process analyzed both bipartite networks (keywords-authors and keywords-articles) and a normalized unipartite network (keywords-keywords), including the relative importance of the citation counts accrued by the author-selected keywords over time.
• Prediction performance was compared against features extracted from homogeneous and heterogeneous networks (a heterogeneous bibliographic information network) to demonstrate the competency of the constructed features in predicting future scientific hypotheses in two research domains.

The literature-based discovery process identifies important but implicit relations among information embedded in published literature. Existing techniques from Information Retrieval (IR) and Natural Language Processing (NLP) attempt to identify hidden or unpublished connections between information concepts within published literature. However, these techniques overlook the prediction of future and emerging relations among scientific knowledge components, such as author-selected keywords, encapsulated within the literature. A Keyword Co-occurrence Network (KCN), built upon author-selected keywords, is considered a knowledge graph that focuses on both these knowledge components and the knowledge structure of a scientific domain by examining the relationships between knowledge entities.
Using data from two multidisciplinary research domains other than the biomedical domain, and capitalizing on bibliometrics, the dynamicity of temporal KCNs, and a recurrent neural network, this study develops novel features that support the prediction of future literature-based discoveries, that is, the emerging connections (co-appearances in the same article) among keywords. Temporal importance extracted from both bipartite and unipartite networks, communities defined by genealogical relations, and the relative importance of temporal citation counts were used in the feature construction process. Both node-level and edge-level features were input into a recurrent neural network to forecast the feature values and predict the future relations between different scientific concepts or topics represented by the author-selected keywords. High performance rates, compared against both a contemporary heterogeneous network-based method and the preferential attachment process, suggest that these features complement both the prediction of future literature-based discoveries and emerging trend analysis.
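
To make the network construction concrete, the following is a minimal sketch of how a keyword co-occurrence network could be built from per-article keyword lists, and how unconnected keyword pairs could be scored with the standard preferential attachment index (the product of the two keywords' degrees), which is the kind of baseline the abstract compares against. It is not the authors' feature set or pipeline: the function names `build_kcn` and `preferential_attachment_scores`, the lowercase normalization, and the toy corpus are all assumptions made for illustration, and the temporal, bipartite, and citation-based features are omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_kcn(articles):
    """Build a keyword co-occurrence network from per-article keyword lists.

    Returns (edge_weights, neighbors): edge_weights counts how many articles
    contain each keyword pair; neighbors maps a keyword to its adjacent keywords.
    """
    edge_weights = Counter()
    neighbors = defaultdict(set)
    for keywords in articles:
        # Assumed normalization: trim, lowercase, deduplicate within an article.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Every pair of keywords in the same article forms (or reinforces) an edge.
        for a, b in combinations(unique, 2):
            edge_weights[(a, b)] += 1
            neighbors[a].add(b)
            neighbors[b].add(a)
    return edge_weights, neighbors


def preferential_attachment_scores(edge_weights, neighbors):
    """Score each not-yet-connected keyword pair by degree(a) * degree(b)."""
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if (a, b) not in edge_weights:
            scores[(a, b)] = len(neighbors[a]) * len(neighbors[b])
    return scores


# Hypothetical toy corpus: one keyword list per article.
articles = [
    ["link prediction", "networks", "bibliometrics"],
    ["networks", "deep learning"],
    ["bibliometrics", "citation analysis"],
]
edge_weights, neighbors = build_kcn(articles)
scores = preferential_attachment_scores(edge_weights, neighbors)

# Rank candidate future co-occurrences, highest score first.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

In a temporal setting, one such network would be built per time window (for example, per publication year), and the pairs that newly co-occur in a later window would serve as positive labels for the supervised learning problem.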
