Abstract

Processing top-N queries with nonmonotone ranking functions is challenging. Traditional techniques for top-N queries assume clean data and do not perform entity resolution (ER). On dirty datasets, where duplicate tuples refer to the same real-world entity, these techniques may return duplicate tuples among the top-N results of a query. Consequently, the effective size of the result set is less than N, and some useful tuples may fail to be retrieved, which leads to poor effectiveness. In this paper, using nonmonotone ranking functions and an ER-Index based on a divide-and-conquer mechanism, we propose a method for processing top-N join queries with real-time ER. The method integrates ER into the processing of a top-N join query over dirty datasets on the fly. Extensive experiments are conducted to measure the effectiveness and efficiency of the method over dirty datasets.
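The duplicate problem described above can be illustrated with a minimal Python sketch. It is not the proposed method: it sorts the whole input instead of using an ER-Index, it omits the join, and every name in it (`rank`, `entity_key`, `naive_top_n`, `er_top_n`, the sample data) is a hypothetical chosen for illustration. Entity resolution is approximated by a simple name normalization, which is an assumption made only to keep the example short.

```python
import heapq
import re


def rank(t, target=3.0):
    # Nonmonotone in t["rating"]: the score peaks at `target`
    # and decreases on both sides of it.
    return -(t["rating"] - target) ** 2


def entity_key(t):
    # Hypothetical stand-in for ER: tuples whose names are equal after
    # normalization are treated as the same real-world entity.
    return re.sub(r"[^a-z0-9]", "", t["name"].lower())


def naive_top_n(tuples, n):
    # Top-N without ER: duplicates of one entity can fill several slots.
    return heapq.nlargest(n, tuples, key=rank)


def er_top_n(tuples, n):
    # Top-N with ER applied on the fly: skip a tuple if its entity
    # has already been returned.
    seen = set()
    result = []
    for t in sorted(tuples, key=rank, reverse=True):
        key = entity_key(t)
        if key in seen:
            continue
        seen.add(key)
        result.append(t)
        if len(result) == n:
            break
    return result


DATA = [
    {"name": "Cafe Roma", "rating": 3.0},
    {"name": "cafe-roma", "rating": 3.2},  # duplicate of the first entity
    {"name": "Blue Door", "rating": 3.5},
    {"name": "Old Mill", "rating": 1.0},
]

if __name__ == "__main__":
    print([t["name"] for t in naive_top_n(DATA, 2)])
    print([t["name"] for t in er_top_n(DATA, 2)])
```

With N = 2, the naive query returns two tuples that refer to one entity, so its effective result size is 1, while the ER-aware variant returns two distinct entities.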
