Abstract
Mikhail S. Dvoretckii - MSc Program Student, Lomonosov Moscow State University; Programmer, IQ Systems LLCAddress: 1, Leninskie Gory, Moscow, 119991, Russian FederationE-mail: mike.dvorecky@gmail.com An important way of ensuring data quality is controlling data input. One of the methods of doing that is checking the input data against the corresponding reference data where applicable. This may be done via autocomplete. Since reference data is usually stored in a centralized fashion, autocomplete algorithms usually run in client-server architectures and face strict time requirements. In this article, a new autocomplete task decomposition is formulated using an existing method based on range minimum queries (RMQ). The Top-k RMQ problem is formulated and used in the autocomplete problem decomposition. A segment tree based algorithm is proposed for the Top-k RMQ problem. While the conventional segment tree based RMQ algorithm when used in autocomplete (in the Top-k RMQ sub-problem) repeatedly processes the same nodes on the tree, the proposed algorithm is adapted directly to the Top-k RMQ problem and does not require any node of the segment tree to be processed more than twice. A complexity analysis is made for both the new Top-k RMQ algorithm and the conventional segment tree-based RMQ approach. This analysis considers different implementations of priority queues used in these algorithms, specifically binary heaps and ordered arrays. The new algorithm has time complexity that is not lower than that of the conventional algorithms with any priority queue implementation. To prove the practical value of the new algorithm, a series of experiments was conducted using the data from the All-Russian Classifier of Addresses - a practical source of reference data for Russian address inputs. The new algorithm demonstrates better time efficiency than the conventional one in all experiments with all priority queue implementations.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.