Inverted Index Research Articles

Today, real-time search over big microblogging data requires low indexing and query latency. Online services, therefore, prefer to host inverted indices in memory. Unfortunately, as datasets grow, indices grow proportionally, and with limited DRAM scaling, main memory comes under high pressure. Indices must also be persisted on disk because building them is computationally intensive. Consequently, it becomes necessary to frequently move on-heap index segments to storage, which slows down indexing. Reading storage-resident index segments requires filesystem calls and disk accesses during query evaluation, leading to high and unpredictable tail latency. This work exploits hybrid DRAM and scalable non-volatile memory (NVM) to offer dynamically growing and instantly searchable (i.e., real-time) persistent indices in on-heap memory. We implement SPIRIT, a real-time text inversion engine over hybrid memory. SPIRIT exploits the byte-addressability of hybrid memory to enable direct access to the index on a pre-allocated heap, eliminating expensive block storage accesses and filesystem calls during live operation. It uses an in-memory segment descriptor table to offer: ① instant segment availability to query evaluators upon fresh ingestion, ② low-overhead segment movement across memory tiers that is transparent to query evaluators, and ③ decoupling of segment movement into NVM from segment visibility to query evaluators, enabling different policies for mitigating NVM latency. SPIRIT accelerates compaction with zero-copy merging. It supports volatile, graceful shutdown, and crash-consistent indexing modes. The latter two modes offer instant recovery using persistent pointers. SPIRIT with hybrid memory and strong crash consistency guarantees exhibits many orders of magnitude better tail response times and query throughput than the state-of-the-art Lucene search engine. Compared against a highly optimized non-real-time evaluation of Lucene with liberal DRAM size, on average, across six query workloads, SPIRIT still delivers 2.5× better (real-time) query throughput. Our work applies to other services that will benefit from direct on-heap access to large persistent indices.
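The idea of a segment descriptor table can be illustrated with a small Python sketch. This is not SPIRIT's implementation: the names `Segment`, `SegmentTable`, `ingest`, `move`, and `search` are hypothetical, the "tier" is only a label rather than real DRAM or NVM placement, and persistence, compaction, and crash consistency are left out. It only shows that a freshly inverted segment becomes searchable as soon as it is registered, and that changing a segment's tier does not change how queries reach it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """An index slice mapping each term to a sorted list of doc ids."""
    postings: dict
    tier: str = "DRAM"  # hypothetical tier label, "DRAM" or "NVM"


class SegmentTable:
    """Hypothetical descriptor table: queries only see registered segments."""

    def __init__(self):
        self._segments = {}  # segment id -> Segment
        self._next_id = 0

    def ingest(self, docs):
        """Invert a batch of (doc_id, text) pairs and publish it immediately."""
        postings = defaultdict(list)
        for doc_id, text in sorted(docs):
            for term in set(text.lower().split()):
                postings[term].append(doc_id)
        seg_id = self._next_id
        self._next_id += 1
        self._segments[seg_id] = Segment(dict(postings))
        return seg_id

    def move(self, seg_id, tier):
        """Relabel a segment's tier; readers keep using the same descriptor."""
        self._segments[seg_id].tier = tier

    def search(self, term):
        """Union the posting lists for `term` across all visible segments."""
        hits = set()
        for seg in self._segments.values():
            hits.update(seg.postings.get(term.lower(), ()))
        return sorted(hits)


table = SegmentTable()
first = table.ingest([(1, "real time search"), (2, "persistent index")])
table.ingest([(3, "search over persistent memory")])
table.move(first, "NVM")  # the tier changes, query results do not
print(table.search("search"))
print(table.search("persistent"))
```

In this toy version the query path never inspects `tier`, which is the property the abstract describes as segment movement being transparent to query evaluators.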

In the age of technological advancement, collaborative E-healthcare emerges as a transformative system that eliminates traditional location and accessibility barriers in healthcare services. Here, Searchable Encryption (SE) plays a key role in enabling healthcare providers to outsource encrypted medical data and search services to third parties such as cloud servers, thereby reducing storage and management expenses. This intermediary approach poses challenges of single-point failure, privacy breaches, and potentially untrustworthy results. State-of-the-art public key-based SE methods use a cloud-assisted architecture that does not support reliable and practical searches with fine-grained permissions. Such systems also require additional support to address potential privacy leakage and to ensure data availability at the storage server. To address these concerns, we propose a Blockchain-assisted Efficient and Secure Keyword Search (BESKS) scheme to enforce fine-grained keyword search privilege control while achieving practical search complexity. Our scheme employs a ciphertext-policy attribute-based keyword search mechanism in which keywords are encrypted using expressive access policies to build an inverted index structure. The encrypted indexes are stored on the blockchain, while encrypted medical documents are stored on InterPlanetary File System (IPFS) nodes to enhance availability and ensure the reliability and scalability of our approach. Our scheme uses blockchain-based smart contracts for efficient, secure search operations and ensures financial fairness in fine-grained searches. Search tokens are generated from user attributes and query keywords to facilitate private searches on-chain. To speed up the search process, our secure index enables an exact match for a query keyword in constant time, so that expensive authorization operations are performed only once. Theoretical analysis suggests that BESKS is more efficient and secure than state-of-the-art schemes. Prototype implementation results on the Ethereum blockchain network further validate its feasibility for real-world applications, demonstrating the scheme's practical applicability in collaborative E-healthcare systems.
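The constant-time exact match on a tokenized inverted index can be pictured with a toy Python sketch. It is not the BESKS construction: there are no attribute-based access policies, no blockchain, and no IPFS here. Instead it swaps in a plain keyed hash (HMAC-SHA256) as the token function, and the helper names `token`, `build_index`, and `search`, as well as the sample key and records, are assumptions made for illustration only.

```python
import hashlib
import hmac


def token(key: bytes, keyword: str) -> str:
    """Derive a deterministic search token from a secret key and a keyword."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, docs: dict) -> dict:
    """Map each keyword token to the sorted ids of documents containing it."""
    index = {}
    for doc_id, keywords in docs.items():
        for kw in set(keywords):
            index.setdefault(token(key, kw), []).append(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}


def search(index: dict, search_token: str) -> list:
    """Exact-match lookup: one hash-table probe, independent of index size."""
    return index.get(search_token, [])


# Hypothetical demo data; the key is a placeholder, not a real secret.
key = b"demo-key-not-for-production"
docs = {
    "rec-1": ["diabetes", "insulin"],
    "rec-2": ["insulin"],
    "rec-3": ["asthma"],
}
index = build_index(key, docs)
print(search(index, token(key, "insulin")))
print(search(index, token(b"some-other-key", "insulin")))
```

The index holder sees only opaque tokens, and a token derived under a different key matches nothing. A real scheme would additionally bind tokens to user attributes and check the access policy, which this sketch does not attempt.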

Articles published on Inverted Index

SPIRIT: Scalable and Persistent In-Memory Indices for Real-Time Search

Secure Blockchain Assisted Attribute-Based Keyword Search for Collaborative E-Healthcare

Vector Databases

Dear-PSM: A deep learning-based peptide search engine enables full database search for proteomics.

Extracting Key-phrase Embedding using Deep Average Network and Maximal Marginal Relevance to Enhance Information Retrieval

Scalable Distributed Inverted List Indexes in Disaggregated Memory

From Text to Recommendations: How Vector Databases are Revolutionizing Personalized Content Delivery

Fulgor: a fast and compact k-mer index for large-scale matching and color queries.

Semi-supervised inverted file index approach for approximate nearest neighbor search

Privacy-aware document retrieval with two-level inverted indexing

Improving Performance of Searchable Symmetric Encryption Through New Information Retrieval Scheme

Fast and private multi-dimensional range search over encrypted data

String Matching Algorithm Using Multi-Characters Inverted Lists

Dynamic forward secure searchable encryption scheme with phrase search for smart healthcare

Exploring Composite Indexes for Domain Adaptation in Neural Machine Translation

HCV: Practical Multi-Keyword Conjunctive Query with Little Result Pattern Leakage

Practical and Dynamic Attribute-Based Keyword Search Supporting Numeric Comparisons Over Encrypted Cloud Data

Improving the Performance of Searchable Symmetric Encryption by Optimizing Locality

An Attribute-Based Searchable Encryption Scheme for Cloud-Assisted IIoT

ForestTI: A Scalable Inverted-Index-Oriented Timeseries Management System with Flexible Memory Efficiency
