Billion-Scale Similarity Search Using a Hybrid Indexing Approach with Advanced Filtering

Simeon Emanuilov, Aleksandar Dimov

doi:10.2478/cait-2024-0035


Journal: Cybernetics and Information Technologies
Publication Date: Dec 1, 2024
License type: CC BY-NC-ND 4.0

#Approach For Similarity Search #Similarity Search #High-dimensional Spaces #Large-scale Search #Approach For Search #Classical Index #Classical Structure #Case Study #Large-scale Similarity Search #CPU-based Systems


Abstract


This paper presents a novel approach for similarity search with complex filtering capabilities on billion-scale datasets, optimized for CPU inference. Our method extends the classical IVF-Flat index structure to integrate multi-dimensional filters. The proposed algorithm combines dense embeddings with discrete filtering attributes, enabling fast retrieval in high-dimensional spaces. Designed specifically for CPU-based systems, our disk-based approach offers a cost-effective solution for large-scale similarity search. We demonstrate the effectiveness of our method through a case study, showcasing its potential for various practical uses.
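
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it names: an IVF-Flat index (vectors grouped into inverted lists by nearest centroid, then scanned exhaustively within the probed lists) combined with a filter over discrete attributes. It is not the authors' algorithm. It runs fully in memory with NumPy rather than on disk, and every name in it (`build_ivf`, `search`, `filters`, `n_probe`) is a hypothetical choice made for illustration.

```python
import numpy as np


def _nearest_centroid(vectors, centroids):
    # Squared L2 distance from every vector to every centroid, then argmin.
    d = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return np.argmin(d, axis=1)


def build_ivf(vectors, n_lists, n_iter=10, seed=0):
    """Train centroids with plain k-means and build the inverted lists.

    Assumption: a toy in-memory index; a billion-scale system would keep
    the lists on disk and train on a sample.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), size=n_lists, replace=False)
    centroids = vectors[start].copy()
    for _ in range(n_iter):
        assign = _nearest_centroid(vectors, centroids)
        for c in range(n_lists):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    assign = _nearest_centroid(vectors, centroids)
    # lists[c] holds the row ids of the vectors assigned to centroid c.
    lists = [np.flatnonzero(assign == c) for c in range(n_lists)]
    return centroids, lists


def search(query, centroids, lists, vectors, attrs, filters, k, n_probe):
    """Return (ids, squared distances) of the k nearest vectors that pass the filter.

    filters maps an attribute column index to the set of allowed values
    (hypothetical format); all listed columns must match.
    """
    # 1. Coarse step: pick the n_probe lists whose centroids are closest.
    d_cent = ((centroids - query) ** 2).sum(-1)
    probe = np.argsort(d_cent)[:n_probe]
    cand = np.concatenate([lists[c] for c in probe])

    # 2. Filter step: keep candidates whose discrete attributes are allowed.
    mask = np.ones(len(cand), dtype=bool)
    for col, allowed in filters.items():
        mask &= np.isin(attrs[cand, col], list(allowed))
    cand = cand[mask]

    # 3. Flat step: exact distances over the surviving candidates only.
    dists = ((vectors[cand] - query) ** 2).sum(-1)
    order = np.argsort(dists)[:k]
    return cand[order], dists[order]
```

In this sketch the filter is applied before the exact distance computation, so vectors that fail the attribute test never cost a distance evaluation. Probing every list (`n_probe == n_lists`) reduces the search to an exact filtered brute-force scan; smaller `n_probe` values trade recall for speed.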
