Abstract

AbstractThis chapter introduces a framework that is based on a novel graph-based text representation method and combines graph-based feature selection, text categorization and link prediction to advance the discovery of future research collaborations. Our approach integrates into a single knowledge graph both structured and unstructured textual data through a novel representation of multiple scientific documents. The Neo4j graph database is used for the representation of the proposed scientific knowledge graph. For the implementation of our approach, we use the Python programming language and the scikit-learn machine learning library. We assess our approach against classical link prediction algorithms using accuracy, recall and precision as our performance metrics. Our experiments achieve state-of-the-art accuracy in the task of predicting future research collaborations. The experimentations reported in this chapter use the COVID-19 Open Research Dataset.KeywordsLink predictionText categorizationFeature selectionKnowledge graphsNatural language processingDocument representation

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.