A Text and Data Analytics Approach to Enrich the Quality of Unstructured Research Information

Otmane Azeroual

doi:10.5539/cis.v12n4p84

Abstract

With research information becoming increasingly accessible, the demands on research information systems (RIS), which are expected to generate and process knowledge automatically, are growing. At the same time, the quality of the data entries that the individual information sources feed into the RIS causes problems. When the data in a RIS is structured, users can read it and filter out the information and knowledge they need without difficulty. Text mining (or text data mining), which builds on the principles of data mining, makes it possible to analyze unstructured text databases and text sources as well and to extract knowledge from previously unknown texts. It can automatically classify large, heterogeneous sources of research information and assign them to specific topics. Research information has always played a major role in higher education and academic institutions, yet it is usually held in unstructured form in RIS, and unstructured research information grows faster than structured data. Searching through it can waste the time of RIS staff at universities and can lead to poor decision-making. For this reason, this paper proposes a new approach to obtaining structured research information from heterogeneous information systems. The approach forms part of a broader approach to the semantic integration of unstructured data, using a RIS as the example. The purpose of this paper is to investigate text and data mining methods in the context of RIS and to develop a quality improvement model that helps universities and academic institutions using RIS to enrich unstructured research information.

Highlights

The flood of research information that reaches every research data manager is steadily increasing
This paper draws on existing, widely used text and data mining methods (e.g. Natural Language Processing (NLP), information extraction, document classification by clustering) and focuses on their application in research information systems (RIS), since the literature mostly discusses these methods for other fields or other kinds of information systems
In the context of RIS, the following methods can be used in the pre-processing steps: Natural Language Processing (NLP), information extraction, and document classification by clustering
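As a minimal illustration of an NLP pre-processing step, the Python sketch below lower-cases a text, splits it into tokens and drops stop words, producing a bag-of-words count that later extraction or clustering steps could work with. The tiny stop-word list, the tokenization rule and the function names `preprocess` and `term_frequencies` are hypothetical choices for this sketch, not the paper's method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real RIS pipeline would
# use a curated list for each language it ingests.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}


def preprocess(text: str) -> list[str]:
    """Lower-case, split into runs of letters/digits, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(text: str) -> Counter:
    """Bag-of-words counts: the simplest structured view of a free-text entry."""
    return Counter(preprocess(text))


if __name__ == "__main__":
    entry = "Text mining of the research information in a RIS and the mining of data."
    print(preprocess(entry))
    print(term_frequencies(entry).most_common(3))
```

Real pipelines usually add stemming or lemmatization on top of this; it is left out here to keep the sketch dependency-free.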

Summary

Introduction

The flood of research information that reaches every research data manager is steadily increasing. Universities and academic institutions collect, maintain and publish information on their research activities and results through RIS, drawing on a variety of forms and heterogeneous data sources. Most of this information is unstructured and comes in various forms and media (Azeroual et al., 2018). After describing the problems that unstructured research information causes during its acquisition and integration into the RIS, this paper investigates the potential of text and data mining methods in the context of RIS and proposes a framework that helps RIS users transform text sources into structured form.
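Information extraction is one way to move from free text toward the structured records a RIS needs. The minimal sketch below assumes each entry arrives as a single string and pulls out a DOI, a publication year and a contact e-mail address with regular expressions. The chosen fields, the patterns and the function `extract_record` are hypothetical and are not a complete grammar for DOIs or e-mail addresses.

```python
import re

# Hypothetical patterns for three fields a RIS record might need.
# They are illustrative only, not complete grammars for DOIs or e-mails.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s;,]+")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_record(text: str) -> dict:
    """Turn one free-text entry into a structured record (None where a field is absent)."""
    doi = DOI_RE.search(text)
    year = YEAR_RE.search(text)
    email = EMAIL_RE.search(text)
    return {
        # A DOI at the end of a sentence would otherwise keep the trailing period.
        "doi": doi.group(0).rstrip(".") if doi else None,
        "year": int(year.group(0)) if year else None,
        "email": email.group(0) if email else None,
    }


if __name__ == "__main__":
    entry = ("Azeroual, O. (2019). A Text and Data Analytics Approach. "
             "doi:10.5539/cis.v12n4p84. Contact: ris-office@example.org")
    print(extract_record(entry))
```

Fields that cannot be found stay `None` rather than being guessed, so incomplete entries can be flagged for manual review instead of entering the RIS with invented values.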

Background

Uses of Text and Data Mining Methods in RIS

Grouping of data objects or document representations
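Before documents can be grouped, each one needs a numeric representation. One common choice, assumed here rather than taken from the paper, is a TF-IDF vector compared by cosine similarity; the sketch below computes both with the Python standard library only. The names `tokenize`, `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Represent each document as a sparse TF-IDF vector (term -> weight)."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for counts in tokenized for term in counts)
    vectors = []
    for counts in tokenized:
        total = sum(counts.values())
        vectors.append({
            # Relative term frequency times inverse document frequency.
            t: (c / total) * math.log(n / df[t])
            for t, c in counts.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    docs = ["text mining of research information",
            "text mining for research information systems",
            "protein folding in cell biology"]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

With this weighting, a term that occurs in every document gets an IDF of zero and therefore does not contribute to similarity, which suppresses vocabulary shared by the whole collection.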

Move all documents into the most similar clusters
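The step named in this heading corresponds to the assignment step of a k-means-style algorithm. The sketch below assumes documents are already numeric vectors (for example term counts or TF-IDF weights) and uses cosine similarity as the similarity measure. It alternates between moving every document into its most similar cluster and recomputing each cluster's centroid until no document moves. Seeding with the first k documents, the iteration cap and all function names are hypothetical simplifications; the paper does not specify them.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(members: list[list[float]]) -> list[float]:
    """Component-wise mean of the member vectors."""
    return [sum(col) / len(members) for col in zip(*members)]


def cluster(docs: list[list[float]], k: int, max_iter: int = 20) -> list[int]:
    """k-means-style loop using cosine similarity; returns a cluster index per document."""
    if not 1 <= k <= len(docs):
        raise ValueError("k must be between 1 and the number of documents")
    centroids = [list(d) for d in docs[:k]]  # naive seeding: first k documents
    assignment = [-1] * len(docs)
    for _ in range(max_iter):
        # Assignment step: move every document into its most similar cluster.
        new_assignment = [
            max(range(k), key=lambda j: cosine(doc, centroids[j])) for doc in docs
        ]
        if new_assignment == assignment:  # no document moved: converged
            break
        assignment = new_assignment
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [d for d, a in zip(docs, assignment) if a == j]
            if members:  # keep the old centroid if a cluster became empty
                centroids[j] = centroid(members)
    return assignment
```

For instance, with four documents described by counts of the terms mining, research, protein and cell, `cluster([[3, 2, 0, 0], [0, 0, 2, 3], [2, 3, 0, 0], [0, 1, 3, 2]], k=2)` places the first and third documents in one cluster and the other two in the second. Seeding from the first k documents keeps the sketch deterministic; random or k-means++ seeding is the more usual choice in practice.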

Conclusion

Journal: Computer and Information Science
Publication Date: Oct 30, 2019
Citations: 1
License type: CC BY 4.0

Similar Papers

Editorial: There Are Promises to Keep and Miles to Go Before I Leave…
Alok Gupta
Information Systems Research | VOL. 33 | 01 Dec 2022

Focus on Authors
Marketing Science | VOL. 31 | 01 May 2012

About Our Authors
Information Systems Research | VOL. 23 | 01 Jun 2012

Large Language Models for Conducting Advanced Text Analytics Information Systems Research
Benjamin Ampel ... James Hu
ACM Transactions on Management Information Systems | 26 Jul 2024
