Abstract

Objective: To study Entity Resolution (ER) using the Qualified Security Assessor (QSA) approach, minimizing the pre-processing steps required to answer a given Structured Query Language (SQL)-like selection correctly. Methods/Statistical Analysis: In recent times, Entity Resolution has been carried out in the context of data warehousing as an offline pre-processing step before the data is made available for analysis, an approach that works well in conventional settings. Such an offline approach, however, is not feasible in emerging applications that require analyzing only small portions of the whole dataset and generating answers in (near) real-time. Here we present an approach, QSA, that minimizes data pre-processing during query processing to detect matching entities. Findings: To test the efficiency of QSA, we collected bibliographic data from Google Scholar covering the top 50 computing researchers, each with an h-index of 60 or higher. The dataset contained 16,396 records, of which 14.3% were duplicates. We then applied QSA using two blocking functions to group together records that might be duplicates. Finally, we used a pairwise resolve function to detect accurately whether two records represent the same real-world entity. Applications/Improvements: We use the semantics of the selection predicate to decrease pre-processing in Entity Resolution, which increases the accuracy of entity resolution using QSA.

Keywords: Data Warehouse, Entity Resolution, Offline Pre-processing, Query Processing, Rudimentary
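The blocking-then-resolving pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the records, the title-prefix blocking function, and the exact-title resolve function are all hypothetical stand-ins for the two blocking functions and the pairwise resolve function the abstract mentions.

```python
from itertools import combinations

# Hypothetical bibliographic records; the paper's Google Scholar dataset
# (16,396 records) is not reproduced here.
records = [
    {"id": 1, "author": "J. Smith", "title": "Entity Resolution at Scale"},
    {"id": 2, "author": "John Smith", "title": "Entity resolution at scale"},
    {"id": 3, "author": "A. Jones", "title": "Query Processing Basics"},
]

def block_by_title_prefix(record):
    """Blocking function: group records sharing a normalized title prefix."""
    return record["title"].lower()[:10]

def resolve(r1, r2):
    """Pairwise resolve function: decide whether two records match."""
    return r1["title"].lower() == r2["title"].lower()

# Group records into blocks so that only records within the same block
# are ever compared, avoiding a full quadratic comparison.
blocks = {}
for rec in records:
    blocks.setdefault(block_by_title_prefix(rec), []).append(rec)

# Compare pairs only within each block and collect detected duplicates.
duplicates = [
    (r1["id"], r2["id"])
    for block in blocks.values()
    for r1, r2 in combinations(block, 2)
    if resolve(r1, r2)
]
print(duplicates)  # records 1 and 2 are flagged as the same entity
```

Blocking keeps the number of pairwise comparisons proportional to block sizes rather than the full dataset, which is what makes the resolve step affordable at query time.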

Highlights

  • This study addresses the Entity Resolution (ER) challenge

  • In our initial work[2], we developed Qualified Security Assessor (QSA) to work with keen clustering techniques[6]

  • We extend QSA to work with eager clustering techniques


Summary

Introduction

Entity resolution arises in the context of data warehousing as an offline pre-processing step before data is made available for analysis, an approach that works well in conventional settings. Such an offline approach is not feasible in emerging applications that need to analyze only tiny portions of the complete data and produce answers in real-time[1]. Consider a scenario in which a small organization holds an expansive dataset yet needs to analyze only small segments of it to answer some queries efficiently and immediately. It would be wasteful for that organization to spend its scarce available resources on pre-processing the entire dataset, especially given that most of it will never be queried.
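The query-driven idea above can be sketched as follows: rather than resolving the whole dataset offline, only the blocks that can satisfy the selection predicate are resolved. Everything here is a hypothetical illustration under assumed names; the surname-based blocking key, the exact-title resolve rule, and the `query` helper are not from the paper.

```python
from itertools import combinations

# Hypothetical, already-normalized records standing in for a large dataset.
records = [
    {"author": "j smith", "title": "entity resolution at scale"},
    {"author": "john smith", "title": "entity resolution at scale"},
    {"author": "a jones", "title": "query processing basics"},
]

def block_key(rec):
    # Blocking on the author's surname keeps potential duplicates together.
    return rec["author"].split()[-1]

def resolve(r1, r2):
    # Pairwise resolve: treat records with identical titles as one entity.
    return r1["title"] == r2["title"]

def query(predicate):
    """Answer a SQL-like selection, resolving only the blocks it touches."""
    blocks = {}
    for rec in records:
        blocks.setdefault(block_key(rec), []).append(rec)
    answer = []
    for block in blocks.values():
        if not any(predicate(r) for r in block):
            continue  # no pre-processing spent on blocks the query ignores
        merged = list(block)
        for r1, r2 in combinations(block, 2):
            if resolve(r1, r2) and r2 in merged:
                merged.remove(r2)  # drop the duplicate record
        answer.extend(r for r in merged if predicate(r))
    return answer

# Selection over authors named Smith: only the "smith" block is resolved;
# the "jones" block is skipped entirely.
result = query(lambda r: r["author"].endswith("smith"))
```

The saving comes from the skipped blocks: for a selective predicate over a large dataset, most blocks never incur any resolve work, which is what makes (near) real-time answers feasible.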

