Abstract

Bibliographic databases are used to measure the performance of researchers, universities, and research institutions. Thus, high data quality is required and data duplication must be avoided. One weakness of the threshold-based approach to duplicate detection is its low accuracy. Therefore, another approach is required to improve duplicate detection. This study proposes a method that combines threshold-based and rule-based approaches to perform duplicate detection. The two approaches are implemented in the comparison stage. The cosine similarity function is used to create weight vectors from the features, and a comparison operator then determines whether a pair of records should be grouped as duplicates. Three research databases indexed in the Science and Technology Index (SINTA) database are investigated: Web of Science (WoS), Scopus, and Google Scholar (GS). Rule 4 and Rule 5 provide the best performance. For the WoS dataset, the accuracy, precision, recall, and F1-measure values were all 100.00%. For the Scopus dataset, accuracy and precision were 100.00%, recall was 98.00%, and the F1-measure was 98.00%. For the GS dataset, accuracy was 100.00%, precision 99.00%, recall 97.00%, and the F1-measure 98.00%. The proposed method is a potential tool for accurately detecting duplicate records in publication databases.
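The abstract describes a comparison stage that combines a cosine-similarity threshold with rule-based checks. As a minimal illustrative sketch (not the authors' actual implementation), the idea can be expressed as follows; the feature choice (title tokens), the `year` rule, and the 0.9 threshold are assumptions for illustration only:

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Token-frequency cosine similarity between two text fields."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_duplicate(rec1: dict, rec2: dict, threshold: float = 0.9) -> bool:
    """Hybrid check: a rule filter followed by a similarity threshold.

    Hypothetical example rule: records from different publication
    years are never grouped as duplicates, regardless of similarity.
    """
    if rec1.get("year") != rec2.get("year"):
        return False
    return cosine_similarity(rec1["title"], rec2["title"]) >= threshold
```

In this sketch, the rule acts as a hard constraint that can override the similarity score, which is one way a rule-based component can correct threshold-only misclassifications.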
