Abstract

With the increased accessibility of research information, the demands on research information systems (RIS), which are expected to generate and process knowledge automatically, are growing. The quality of the RIS data entries from the individual information sources also causes problems. If the data in a RIS is structured, users can read and filter it to meet their information and knowledge needs without difficulty. For unstructured text this does not hold. The technique that nevertheless allows text databases and text sources to be analyzed, and knowledge to be extracted from unknown texts, is called text mining or text data mining, and it builds on the principles of data mining. Text mining makes it possible to classify large, heterogeneous sources of research information automatically and to assign them to specific topics. Research information has always played a major role in higher education and academic institutions, although it has usually been available in RIS in unstructured form, and unstructured data grows faster than structured data. As a result, RIS staff in universities can lose time searching, and poor decisions can follow. For this reason, the present paper proposes a new approach to obtaining structured research information from heterogeneous information systems. It forms part of a broader approach to the semantic integration of unstructured data, illustrated with a RIS. The purpose of this paper is to investigate text and data mining methods in the context of RIS and to develop a quality improvement model that helps universities and academic institutions using RIS to enrich unstructured research information.

Highlights

  • The flood of research information that reaches every research data manager is steadily increasing

  • This paper draws on existing, widely used text and data mining methods (e.g. Natural Language Processing (NLP), information extraction, document classification by clustering) and focuses on their application in research information systems (RIS), since the literature mostly discusses these methods in the context of other fields or information systems

  • In the context of RIS, the following methods can be used in the pre-processing steps: Natural Language Processing (NLP), information extraction, document classification by clustering
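
The pre-processing methods named above can be illustrated with a small sketch. It is not the paper's implementation: it assumes a bag-of-words representation, cosine similarity, and a simple k-means-style loop that repeatedly moves each document into its most similar cluster. The function names, the stopword list, the seed choice and the sample titles are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption, not from the paper).
STOPWORDS = {"the", "of", "and", "in", "a", "for", "on", "to"}

def vectorize(text):
    """Minimal NLP pre-processing: lowercase, tokenize, drop stopwords, count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(docs, seeds, iterations=10):
    """Assign each document to its most similar cluster, then recompute the clusters.

    `seeds` lists the indices of the documents used as initial cluster centres
    (an assumption made here to keep the sketch deterministic).
    """
    vectors = [vectorize(d) for d in docs]
    centroids = [vectors[i] for i in seeds]
    assignment = None
    for _ in range(iterations):
        # Move every document into the most similar cluster.
        new_assignment = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no document changed cluster, so the grouping is stable
        assignment = new_assignment
        # Recompute each centroid as the summed term counts of its members
        # (cosine similarity ignores scale, so no division is needed).
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = sum(members, Counter())
    return assignment

# Invented sample titles standing in for unstructured RIS entries.
docs = [
    "Deep learning for protein structure prediction",
    "Protein folding prediction with neural networks",
    "Funding of university research projects",
    "Third-party funding for research projects at universities",
]
labels = cluster(docs, seeds=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

In practice a RIS would more likely rely on an established library with TF-IDF weighting and a tuned number of clusters; the sketch only shows how unstructured titles can be turned into vectors and grouped by topic.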

Summary

Introduction

The flood of research information that reaches every research data manager is steadily increasing. Information on research activities and results at universities and academic institutions, drawn from heterogeneous data sources and in a variety of forms, is collected, maintained and published through RIS. Most of it is unstructured and spread across various forms and media (Azeroual et al., 2018). After describing the problems that unstructured research information poses during its acquisition and integration into the RIS, this paper investigates the potential of text and data mining methods in the context of RIS and proposes a framework that helps RIS users transform text sources into structured form.

Background
Uses of Text and Data Mining Methods in RIS
Grouping of data objects or document representations
Move all documents into the most similar clusters
Conclusion