Text and Data Quality Mining in CRIS

Otmane Azeroual

doi:10.3390/info10120374

Abstract

Scientific institutions that document their research information comprehensively and maintain it well in a current research information system (CRIS) have the best prerequisites for implementing text and data mining (TDM) methods. TDM helps institutions identify and eliminate errors more effectively, improve processes, develop the business, and make informed decisions. It also deepens the understanding of the data and its context. This improves not only the quality of the data itself but also the way the institution handles the data and, consequently, its analyses. This paper applies TDM in CRIS to analyze, quantify, and correct unstructured data and its quality issues. Poor data leads to higher costs or wrong decisions, so ensuring high data quality is an essential requirement when setting up a CRIS project. User acceptance of a CRIS depends, among other things, on data quality: the decisive criterion is not only the objective quality of the data but also the subjective quality that each user assigns to it.
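
As a minimal sketch of what quantifying data quality issues can mean in practice, the Python snippet below computes two simple indicators over a handful of CRIS publication records: field completeness and exact duplicates after title normalization. The record structure, the field names (`title`, `doi`, `year`) and the helper functions are hypothetical and chosen for illustration; the paper does not prescribe these particular metrics.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical CRIS publication records; field names are illustrative
# assumptions, not the schema of any particular CRIS product.
RECORDS = [
    {"id": 1, "title": "Text and Data Quality Mining in CRIS", "doi": "10.3390/info10120374", "year": 2019},
    {"id": 2, "title": "Text and data quality mining in CRIS.", "doi": None, "year": 2019},
    {"id": 3, "title": "A Survey of Research Information Systems", "doi": "", "year": None},
]

FIELDS = ("title", "doi", "year")


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def completeness(records, fields=FIELDS):
    """Share of records (0.0 to 1.0) with a non-missing value, per field."""
    total = len(records)
    return {
        f: sum(not is_missing(r.get(f)) for r in records) / total if total else 0.0
        for f in fields
    }


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def duplicate_groups(records):
    """Group record ids whose normalized titles are identical."""
    groups = defaultdict(list)
    for r in records:
        if not is_missing(r.get("title")):
            groups[normalize_title(r["title"])].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    print(completeness(RECORDS))
    print(duplicate_groups(RECORDS))
```

For the sample records, `completeness` reports a fully populated title field but only one DOI in three, and `duplicate_groups` returns `[[1, 2]]`, flagging the two spellings of the same title as candidates for review rather than for automatic merging.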

Highlights

The steady growth of data, especially textual data, in a constantly expanding organizational environment makes it necessary to integrate text and data mining (TDM) into current research information systems (CRIS)
TDM is versatile and plays a major role in research management, a field that deals mainly with research information in text form
Finding such research information with traditional search engines is inefficient and time-consuming

Summary

Motivation

Different research institutions use research information for different purposes. Data analyses and reports based on current research information systems (CRIS) provide information about research activities and their results. At universities and academic institutions, these activities and results are collected, maintained, and published via CRIS in a variety of forms and from heterogeneous data sources [5]. Unstructured data presents a major challenge for CRIS administrators, especially at universities and academic institutions that feed their CRIS with research information from heterogeneous data sources [5]. TDM methods use statistical and linguistic analysis to detect hidden and interesting information or patterns in unstructured text documents. This allows them, on the one hand, to process the huge volume of words and structures of natural language and, on the other hand, to handle uncertain and fuzzy data.
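
The statistical side of this can be illustrated with TF-IDF weighting, one common way to surface characteristic terms in unstructured text. The sketch below is not taken from the paper, which does not state that it uses TF-IDF; the sample abstracts, the small stop-word list and the function names are hypothetical. It relies only on the Python standard library.

```python
import math
import re
from collections import Counter

# Hypothetical free-text abstracts as they might arrive in a CRIS.
DOCS = [
    "Data quality in research information systems affects user acceptance.",
    "Text mining detects patterns in unstructured research documents.",
    "Unstructured text and fuzzy data complicate data quality assessment.",
]

# Tiny illustrative stop-word list; a real pipeline would use a fuller,
# language-specific list.
STOP_WORDS = {"in", "and", "the", "of", "a", "an"}


def tokenize(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document (raw TF x log IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def top_terms(docs, k=3):
    """The k highest-weighted terms per document, ties broken alphabetically."""
    return [
        [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for w in tf_idf(docs)
    ]


if __name__ == "__main__":
    for doc, terms in zip(DOCS, top_terms(DOCS)):
        print(terms, "<-", doc)
```

Terms that occur in every document receive an IDF of zero, so shared vocabulary drops out and each abstract is summarized by its distinguishing words; for the third sample abstract the top terms are `assessment`, `complicate` and `fuzzy`. In a fuller pipeline, linguistic steps such as lemmatization or part-of-speech filtering would run before the weighting.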

Data Quality in CRIS

Definition of the Term TDM

Problems of Unstructured Data in CRIS

Employing TDM Methods in CRIS

Methods of NLP

Application

Clustering

Formation

Findings

Conclusions


Journal: Information
Publication Date: Nov 28, 2019
Citations: 8
License type: CC BY 4.0

Similar Papers

Developing Current Research Information Systems (CRIS) as Data Sources for Studies of Research
Gunnar Sivertsen
01 Jan 2019

How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases
Otmane Azeroual ... Włodzimierz Lewoniewski
Algorithms | VOL. 13
26 Apr 2020

The changing scope of data quality and fit for purpose: evolution and adaption of a CRIS solution
Thomas Gurney
Procedia Computer Science | VOL. 211
01 Jan 2021

2 - Current research information systems and institutional repositories: From data ingestion to convergence and merger
Joachim Schöpfel ... Otmane Azeroual
Future Directions in Digital Information
30 Oct 2020
