Articles published on Bloom Filter
- Research Article
- 10.1038/s41598-025-17258-w
- Sep 24, 2025
- Scientific reports
- Mei Sun + 6 more
The agricultural product supply chain comprises numerous links, generates substantial amounts of data, and is susceptible to high data loss rates, posing significant challenges to data management and traceability. While blockchain ensures data integrity, its highly redundant storage increases resource consumption and limits the participation of resource-constrained nodes. To address this gap, we propose a Storage Light Node (SLN) model tailored to the characteristics of the agricultural supply chain. The model integrates a cold/hot data classification mechanism based on identity relevance, generation time, and query frequency, enabling selective local storage of high-priority data, and adds a Bloom-filter-based query optimization strategy that accelerates data retrieval. Experiments using 50,563 real records from an agricultural product supply chain show that, compared with a full node, SLN reduces storage usage by 95.10% and achieves an average query time of 30.91 ms, much faster than a traditional light node. The model provides a scalable and efficient solution for blockchain-based agricultural traceability.
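To make the Bloom-filter query gate concrete, here is a minimal sketch (illustrative only, not the paper's SLN code; the class and key names are hypothetical): the light node keeps a filter over the record keys it stores locally, so a membership test can short-circuit queries for records held only by full nodes.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int, num_hashes: int):
        self.size, self.k = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

local_index = BloomFilter(size_bits=1 << 20, num_hashes=7)
local_index.add("batch-2025-0001")  # key of a locally stored record

# Negative answers are definitive: the record is certainly not stored locally,
# so the node can forward the query to a full node without a disk lookup.
if not local_index.might_contain("batch-2025-9999"):
    print("not local; forward query to a full node")
```

A Bloom filter can return false positives but never false negatives, which is what makes this negative fast path safe.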
- Research Article
- 10.7717/peerj-cs.3166
- Sep 8, 2025
- PeerJ Computer Science
- Song Luo + 3 more
The rapid development of Internet of Things technology has led to a boom in the adoption of intelligent healthcare management systems in the healthcare industry. However, it has also highlighted key issues such as security, privacy, and efficient querying of medical data. Traditional methods for querying medical data suffer from severe data leakage risks, low query performance, and excessive storage space. This article proposes a comprehensive Secure ENcrypted Search for Health Scheme (SENSH) based on consortium blockchain and searchable encryption to address these challenges. SENSH enables efficient authorization management through Bloom filters, ensuring fast querying of large datasets by authorized users while saving storage space. It uses off-chain Advanced Encryption Standard (AES) encryption and on-chain storage management for data protection, significantly reducing the likelihood of data exposure. The system is also enhanced with event-triggering and logging mechanisms to support real-time monitoring and data tracing, meeting audit compliance requirements. It provides version control and timestamping to accommodate dynamic data updates, employs an obfuscation factor to prevent tag-based leakage of original data content, and supports dynamic updating of tags to accommodate different access requirements. Experimental results show that SENSH excels in authorization management, privacy protection, tamper resistance, and defense against replay and Distributed Denial of Service (DDoS) attacks. Compared with existing schemes, SENSH has significant advantages in gas consumption, computation cost, and execution time. It is particularly suited to the protection and efficient querying of medical and health data.
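For context on why Bloom-filter authorization saves storage, the snippet below applies the standard filter-dimensioning formulas (generic Bloom filter math; the numbers are hypothetical, not SENSH's parameters):

```python
import math

def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Optimal filter size m (bits) and hash count k for n items at FPR p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k

n_users, target_fpr = 100_000, 0.001
m, k = bloom_parameters(n_users, target_fpr)
print(f"Bloom filter: {m / 8 / 1024:.0f} KiB, k={k}")         # ~175 KiB, k=10
print(f"raw 32-byte ID list: {n_users * 32 / 1024:.0f} KiB")  # ~3125 KiB
```

At a 0.1% false positive rate, the filter is roughly 18× smaller than a plain list of 32-byte user identifiers, which is the storage saving the abstract alludes to.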
- Research Article
- 10.1101/2025.08.25.672238
- Aug 30, 2025
- bioRxiv
- Fabio Cumbo + 1 more
Metagenomics has become a powerful tool for studying microbial communities, allowing researchers to investigate microbial diversity within complex environmental samples. Recent advances in sequencing technology have enabled the recovery of near-complete microbial genomes directly from metagenomic samples, known as metagenome-assembled genomes (MAGs). However, accurately characterizing these genomes remains a significant challenge due to sequencing errors, incomplete assembly, and contamination. Here we present MetaSBT, a new tool for organizing, indexing, and characterizing microbial reference genomes and MAGs. It is able to identify clusters of genomes at all seven taxonomic levels, from kingdom down to species, using the Sequence Bloom Tree (SBT) data structure, which relies on Bloom Filters (BFs) to index massive numbers of genomes based on their k-mer composition. We have built an initial set of databases comprising over 190 thousand viral genomes from NCBI GenBank and public sources, grouped into sequence-consistent clusters at different taxonomic levels, making it the first software solution for the classification of viruses at different ranks, including previously unknown ones. This results in the definition of over 40 thousand species clusters, of which ~80% do not match any known viral species in reference databases to date. Furthermore, we show how our databases can serve as a new basis for existing quantitative metagenomic profilers, unlocking the detection of unknown microbes and the estimation of their abundance in metagenomic samples. Finally, the framework is released open-source and, along with its public databases, is fully integrated into the Galaxy Platform, enabling broad accessibility.
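The core SBT query idea can be sketched in a few lines (a toy illustration with made-up sizes, not MetaSBT's implementation): every node holds a Bloom filter over the k-mers of the genomes beneath it, an inner node's filter is the bitwise OR of its children's, and a query only descends into subtrees whose filter contains at least a fraction theta of the query k-mers.

```python
import hashlib

M, K = 1 << 14, 3  # filter bits and hash count (toy sizes)

def _pos(kmer: str):
    for i in range(K):
        h = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
        yield int.from_bytes(h[:8], "big") % M

def bf_insert(bf: int, kmer: str) -> int:
    for p in _pos(kmer):
        bf |= 1 << p
    return bf

def bf_contains(bf: int, kmer: str) -> bool:
    return all((bf >> p) & 1 for p in _pos(kmer))

class Node:
    def __init__(self, name, children=(), kmers=()):
        self.name, self.children, self.bf = name, list(children), 0
        for km in kmers:
            self.bf = bf_insert(self.bf, km)
        for c in self.children:
            self.bf |= c.bf  # an inner node's filter is the union of its children

def query(node, kmers, theta=0.7, hits=None):
    hits = [] if hits is None else hits
    share = sum(bf_contains(node.bf, km) for km in kmers) / len(kmers)
    if share >= theta:          # prune whole subtrees that cannot match
        if not node.children:
            hits.append(node.name)
        for c in node.children:
            query(c, kmers, theta, hits)
    return hits

def kmerize(seq, k=5):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

root = Node("root", children=[
    Node("genomeA", kmers=kmerize("ACGTACGTAGCT")),
    Node("genomeB", kmers=kmerize("TTTTGGGGCCCC")),
])
print(query(root, kmerize("ACGTACGTA")))  # -> ['genomeA']
```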
- Research Article
- 10.3390/s25164955
- Aug 10, 2025
- Sensors (Basel, Switzerland)
- Yun Feng + 4 more
With the rapid evolution of smart grids, secure and efficient data communication among hierarchical sensor devices has become critical to ensure privacy and system integrity. However, existing protocols often fail to balance security strength and resource constraints of terminal sensors. In this paper, we propose a novel identity-based secure data communication protocol tailored for hierarchical sensor groups in smart grid environments. The protocol integrates symmetric and asymmetric encryption to enable secure and efficient data sharing. To reduce computational overhead, a Bloom filter is employed for lightweight identity encoding, and a cloud-assisted pre-authentication mechanism is introduced to enhance access efficiency. Furthermore, we design a dynamic group key update scheme with minimal operations to maintain forward and backward security in evolving sensor networks. Security analysis proves that the protocol is resistant to replay and impersonation attacks, while experimental results demonstrate significant improvements in computational and communication efficiency compared to state-of-the-art methods—achieving reductions of 73.94% in authentication computation cost, 37.77% in encryption, and 55.75% in decryption, along with a 79.98% decrease in communication overhead during authentication.
- Research Article
- 10.26599/tst.2024.9010160
- Aug 1, 2025
- Tsinghua Science and Technology
- Jigang Wen + 4 more
Extensible Bloom Filters: Adaptive Strategies for Scalability and Efficiency in Network and Distributed Systems to Handle Increased Data
- Research Article
- 10.1016/j.molp.2025.08.001
- Aug 1, 2025
- Molecular plant
- Ze-Zhen Du + 5 more
Varigraph: An accurate and widely applicable pangenome graph-based variant genotyper for diploid and polyploid genomes.
- Research Article
- 10.1111/1755-0998.70001
- Jul 21, 2025
- Molecular Ecology Resources
- Lucas Elliott + 4 more
Inferring community composition from shotgun sequencing of environmental DNA is highly dependent on the completeness of the reference databases used to assign taxonomic information, as well as on the pipeline used. While the number of complete, fully assembled reference genomes is increasing rapidly, their taxonomic coverage is generally too sparse to build complete reference databases spanning all or most of the target taxa. Low-coverage whole genome sequencing, or skimming, provides a cost-effective and scalable alternative source of genome-wide information in the interim. Without enough coverage to assemble large contigs of nuclear DNA, much of the utility of a genome skim for taxonomic annotation lies in its short-read form. However, previous methods have not been able to fully leverage the data in this format. We demonstrate the utility of wholeskim, a pipeline for indexing the k-mers present in genome skims and subsequently querying these indices with short DNA reads. Wholeskim expands on the functionality of kmindex, software that utilises Bloom filters to efficiently index and query billions of k-mers. Using a collection of thousands of plant genome skims, wholeskim is the only software able to index and query the skims in their unassembled form. It correctly annotates 1.16× more simulated reads and 2.48× more true sedaDNA reads in 0.32× of the time required by Holi, another metagenomic pipeline that uses assembled genome skims as its reference database. We also explore the effects of taxonomic and genomic completeness of the reference database on the accuracy and sensitivity of read assignment. Increasing the genomic coverage of the reference genome skims increases the number of correctly annotated reads, but with diminishing returns beyond ~1× depth of coverage. Increasing taxonomic coverage clearly reduces the number of false negative taxa in the dataset, but we also demonstrate that it does not greatly affect false positive annotations.
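The read-assignment step this enables can be sketched as k-mer containment scoring (a simplification; a plain Python set stands in for the Bloom filters kmindex actually uses, and all names and thresholds are hypothetical):

```python
def kmers(seq: str, k: int = 21) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assign_read(read: str, indices: dict[str, set[str]], k: int = 21,
                min_share: float = 0.5) -> str | None:
    """Assign a read to the reference whose index holds most of its k-mers."""
    qk = kmers(read, k)
    scores = {taxon: len(qk & idx) / len(qk) for taxon, idx in indices.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_share else None  # None = unassigned

indices = {
    "Pinus":   kmers("ACGT" * 30),   # toy stand-ins for genome skims
    "Quercus": kmers("TTGCA" * 24),
}
print(assign_read("ACGT" * 8, indices))  # -> 'Pinus'
```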
- Research Article
- 10.3390/electronics14142795
- Jul 11, 2025
- Electronics
- Emil Bureacă + 4 more
The use of Verifiable Credentials under the new eIDAS Regulation introduces privacy concerns, particularly during revocation status checks. This paper proposes a privacy-preserving revocation mechanism tailored to the European Digital Identity Wallet and its Architecture and Reference Framework. Our method publishes a daily randomized revocation list as a cascaded Bloom filter, enhancing unlinkability by randomizing revocation indexes derived from ARF guidelines. The implementation extends open-source components developed by the European Committee. This work demonstrates a practical, privacy-centric approach to revocation in digital identity systems, supporting the advancement of privacy-preserving technologies.
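A cascaded Bloom filter of the general kind used here works as follows (a generic sketch, not the paper's EUDI-wallet implementation): level 0 encodes the revoked set, and each subsequent level encodes the identifiers from the opposite set that the previous level falsely matched, so the parity of the deepest matching level yields an exact answer over the enrolled credentials.

```python
import hashlib

class Bloom:
    def __init__(self, m: int, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _pos(self, x: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}|{x}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, x: str) -> None:
        for p in self._pos(x):
            self.bits |= 1 << p

    def __contains__(self, x: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._pos(x))

def build_cascade(revoked: set[str], valid: set[str]) -> list:
    levels, include, exclude = [], revoked, valid
    while include:
        bf = Bloom(m=max(64, 16 * len(include)))
        for x in include:
            bf.add(x)
        # the next level must encode the excluded IDs this filter falsely matches
        include, exclude = {x for x in exclude if x in bf}, include
        levels.append(bf)
    return levels

def is_revoked(cascade: list, x: str) -> bool:
    for depth, bf in enumerate(cascade):
        if x not in bf:
            return depth % 2 == 1  # exit at even depth: valid; odd depth: revoked
    return len(cascade) % 2 == 1

revoked = {f"cred-{i}" for i in range(0, 1000, 7)}
valid = {f"cred-{i}" for i in range(1000)} - revoked
cascade = build_cascade(revoked, valid)
assert all(is_revoked(cascade, c) for c in revoked)
assert not any(is_revoked(cascade, c) for c in valid)
print(f"{len(cascade)} levels, exact over the enrolled credential set")
```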
- Research Article
- 10.3390/electronics14132679
- Jul 2, 2025
- Electronics
- Aomen Zhao + 1 more
In recent years, Electronic Medical Record (EMR) sharing has played an indispensable role in optimizing clinical treatment plans and advancing biomedical research. However, existing EMR management schemes often face security risks and suffer from inefficient search performance. To address these issues, this paper proposes a secure EMR sharing scheme based on blockchain and searchable encryption, implementing a decentralized management system with enhanced security and operational efficiency. For scenarios in which EMRs require confirmation from multiple doctors to improve safety, the proposed solution leverages Shamir’s Secret Sharing to enable multi-party authorization, thereby enhancing privacy protection. Meanwhile, the scheme utilizes Bloom filters and vector operations to achieve efficient data search, maintaining rigorous EMR protection while improving search efficiency. Experimental results demonstrate that, compared to existing methodologies, the proposed scheme enhances security during EMR sharing, achieves higher efficiency in index generation and trapdoor generation, and reduces keyword search time. This scheme provides reliable technical support for the development of intelligent healthcare systems.
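The multi-party authorization step relies on Shamir's Secret Sharing, which can be sketched as follows (a textbook version over a prime field, not the paper's exact construction): a key is split into n shares such that any t of them reconstruct it and fewer reveal nothing.

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is in GF(P)

def split(secret: int, n: int, t: int) -> list[tuple[int, int]]:
    """Evaluate a random degree-(t-1) polynomial with f(0)=secret at n points."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x=0 recovers the secret from any t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(secret=123456789, n=5, t=3)                  # 5 doctors, any 3 suffice
print(reconstruct(shares[:3]) == 123456789)                 # True
print(reconstruct(random.sample(shares, 3)) == 123456789)   # True
```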
- Research Article
- 10.3390/asi8040092
- Jun 28, 2025
- Applied System Innovation
- Shumin Han + 5 more
Privacy-preserving record linkage (PPRL) aims to link records from different data sources while ensuring sensitive information is not disclosed. Utilizing blockchain as a trusted third party is an effective strategy for enhancing transparency and auditability in PPRL. However, to ensure data privacy during computation, such approaches often require computationally intensive cryptographic techniques. This can introduce significant computational overhead, limiting the method’s efficiency and scalability. To address this performance bottleneck, we combine blockchain with the distributed computation of secret sharing to propose a PPRL method based on blockchain-coordinated distributed computation. At its core, the approach utilizes Bloom filters to encode data and employs Boolean and arithmetic secret sharing to decompose the data into secret shares, which are uploaded to the InterPlanetary File System (IPFS). Combined with masking and random permutation mechanisms, it enhances privacy protection. Computing nodes perform similarity calculations locally, interacting with IPFS only a limited number of times, effectively reducing communication overhead. Furthermore, blockchain manages the entire computation process through smart contracts, ensuring transparency and correctness of the computation, achieving efficient and secure record linkage. Experimental results demonstrate that this method effectively safeguards data privacy while exhibiting high linkage quality and scalability.
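The Bloom-filter encoding step is the classic PPRL construction (sketched generically below; the paper's secret sharing, masking, and IPFS layers are not shown): each record's q-grams are hashed into a bit array, and similarity between encoded records is estimated with the Dice coefficient over the set bits.

```python
import hashlib

M, K, Q = 1000, 2, 2  # filter bits, hash functions, q-gram length

def encode(value: str) -> int:
    """Hash every q-gram of the (padded) value into a Bloom filter bitmask."""
    padded = f"_{value.lower()}_"
    bf = 0
    for i in range(len(padded) - Q + 1):
        gram = padded[i:i + Q]
        for j in range(K):
            h = hashlib.sha256(f"{j}:{gram}".encode()).digest()
            bf |= 1 << (int.from_bytes(h[:8], "big") % M)
    return bf

def dice(a: int, b: int) -> float:
    """Dice coefficient over set bits: 2|A&B| / (|A| + |B|)."""
    return 2 * bin(a & b).count("1") / (bin(a).count("1") + bin(b).count("1"))

print(dice(encode("jonathan smith"), encode("jonathon smith")))  # high: likely match
print(dice(encode("jonathan smith"), encode("maria garcia")))    # low: non-match
```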
- Research Article
- 10.1142/s0219649225500522
- Jun 27, 2025
- Journal of Information & Knowledge Management
- Yargholi Fattane + 3 more
The exponential growth in the number of internet users and in web content has made a huge amount of information accessible, creating the challenge of efficient and accurate information retrieval. We propose a novel approach for improving search engine accuracy and speed by applying the Q-learning algorithm. The proposed method optimises the retrieval of relevant web pages by leveraging Bloom filters and deep reinforcement learning with a novel classification scheme. The Jaccard criterion is used to evaluate the similarity of web-page feature sets and to classify and label them. This comparison measure maps the results into binary classes, which are then fed to the Q-learning algorithm for further processing. Our technique improves search speed, memory consumption, and accuracy over existing techniques. This work offers insights into the theoretical development of large-scale web data retrieval, a critical aspect of search engine optimisation and modern web information exploration.
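The Jaccard-to-binary-label step can be sketched directly (hypothetical feature sets and threshold; the Q-learning stage is not shown):

```python
def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|, the overlap ratio of two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def label(page_features: set, query_features: set, threshold: float = 0.3) -> int:
    """Binary relevance label handed to the Q-learning stage."""
    return int(jaccard(page_features, query_features) >= threshold)

query = {"bloom", "filter", "search"}
page = {"bloom", "filter", "hash", "membership"}
print(jaccard(page, query), label(page, query))  # 0.4 1
```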
- Research Article
- 10.61710/kjcs.v3i2.96
- Jun 25, 2025
- AlKadhim Journal for Computer Science
- Murtadha Alazzawi + 2 more
Security and privacy must be taken into account in vehicular ad-hoc networks (VANETs), because broadcasting occurs over an open communication channel. This work offers a Lightweight Verification with Privacy-Preserving Authentication (LV2PA) approach for vehicular communications to overcome these challenges. To satisfy security and privacy requirements, the proposed LV2PA approach employs not only a cryptographic hash function but also a Bloom filter and the Chinese remainder theorem. During mutual authentication in the LV2PA scheme, only the first roadside unit (RSU) and the on-board unit are required to communicate with a trusted authority (TA); owing to the changeover mechanism, the other RSUs in vehicular communications do not require TA communication. Consequently, bottleneck problems at the TA are avoided. In addition, the RSU updates the shared group key whenever a vehicle joins or leaves the group; hence, the proposed LV2PA provides complete forward and backward secrecy for vehicular communications. Formal (Burrows–Abadi–Needham (BAN) logic) and informal security analyses demonstrate that the proposed LV2PA scheme is sound and meets the security and privacy requirements, respectively. In terms of computing and communication costs, the performance evaluation shows that the proposed LV2PA scheme has advantageously low overhead and low latency compared to state-of-the-art schemes.
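A standard Chinese-remainder-theorem group-key broadcast of the kind this family of schemes uses looks like the following (a generic sketch; LV2PA's exact construction may differ, and all values are made up): the RSU solves one congruence per member and broadcasts a single value from which each member recovers the group key with its own secret.

```python
from math import prod

def crt_broadcast(residues: list, moduli: list) -> int:
    """Solve X = r_i (mod n_i) for pairwise-coprime moduli n_i."""
    N = prod(moduli)
    x = 0
    for r, n in zip(residues, moduli):
        Ni = N // n
        x += r * Ni * pow(Ni, -1, n)  # pow(., -1, n) is the modular inverse
    return x % N

group_key = 0xBEEF
members = {  # per-vehicle (secret, prime modulus) pairs -- made-up values
    "veh1": (0x1111, 1_000_003),
    "veh2": (0x2222, 1_000_033),
    "veh3": (0x3333, 1_000_037),
}
residues = [group_key ^ secret for secret, _ in members.values()]
X = crt_broadcast(residues, [n for _, n in members.values()])

# each vehicle recovers the group key from the single broadcast value X
for secret, modulus in members.values():
    assert (X % modulus) ^ secret == group_key
```

Rekeying after a join or departure is just a new broadcast computed without the departed member's modulus, which is what gives forward and backward secrecy in such constructions.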
- Research Article
- 10.1002/cpe.70148
- Jun 24, 2025
- Concurrency and Computation: Practice and Experience
- Keyu Wang + 4 more
In modern solid-state drives (SSDs), managing hot and cold data is key to improving performance and extending lifespan. However, previous Bloom filter-based methods were often complex and lacked solid empirical validation under high-performance conditions. To address this, we propose a more efficient multilevel Bloom filter classification strategy tailored to SSDs. Our approach optimizes classification accuracy while reducing computational and storage overhead through careful parameter tuning. By utilizing SSD block-level parallelism to group similar data access patterns, we enhance garbage collection efficiency and extend block life. Unlike previous studies relying on simulations, we validate our method on real SSD hardware. Experimental results show that our strategy improves SSD performance and endurance, offering valuable insights for future firmware optimization.
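A multilevel Bloom-filter hotness tracker of this general kind can be sketched as follows (illustrative only; the window count and filter sizes are made up, not the paper's tuned parameters): one filter per recent time window, rotated as windows expire, with an LBA's hotness given by how many recent windows recorded a write to it.

```python
import hashlib

M, K, LEVELS = 1 << 16, 2, 4  # filter bits, hash count, window count (toy)

def positions(lba: int):
    for i in range(K):
        h = hashlib.blake2b(f"{i}:{lba}".encode(), digest_size=8).digest()
        yield int.from_bytes(h, "big") % M

class HotColdTracker:
    def __init__(self):
        self.windows = [0] * LEVELS  # newest filter first

    def record_write(self, lba: int) -> None:
        for p in positions(lba):
            self.windows[0] |= 1 << p

    def rotate(self) -> None:  # call when a time window (or write quota) expires
        self.windows = [0] + self.windows[:-1]

    def hotness(self, lba: int) -> int:
        """Number of recent windows whose filter contains this LBA."""
        return sum(all((w >> p) & 1 for p in positions(lba)) for w in self.windows)

tracker = HotColdTracker()
for t in range(3):                 # three expired windows, LBA 42 written in each
    tracker.record_write(42)
    tracker.record_write(100 + t)  # background noise
    tracker.rotate()
tracker.record_write(42)           # and again in the current window
print(tracker.hotness(42), tracker.hotness(7))  # 4 vs 0 -> LBA 42 is hot
```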
- Research Article
- 10.55041/ijsrem50960
- Jun 23, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Mary Sharmila T
Fraud is a real and escalating threat to the telecom business, and telecom operators already employ varied sets of countermeasures against it. Fraud prevention requires information, and it is on this premise that detection rests; yet privacy rights restrict the exchange of such information between parties. A variety of privacy-preserving record linkage (PPRL) techniques have been defined so that data sharing can proceed under these constraints. Many current PPRL techniques adopt a similarity measure, such as Jaccard similarity, over the relevant datasets; in particular, Bloom filter implementations appear in a number of complex and slow algorithms and are critically sensitive to cryptanalysis attacks. Building on current telco infrastructure, and without requiring a multistep protocol mechanism, this paper introduces a PPRL method that is resistant to such attacks. Its starting point for matching non-homologous datasets is an attack-resistant digital signature system based on the Edwards curve. Notably, the approach can only estimate the Jaccard similarity; the underlying datasets cannot be recovered. A basic request-response model is used for two-party interactions. The performance, privacy, and matching accuracy of the approach have been evaluated on a large public dataset. Resistant to attacks, the technique provides improved speed and attains perfect match consistency. Key Words: Telecom fraud, PPRL (Privacy-Preserving Record Linkage), Information exchange, Jaccard similarity, Bloom filter, Digital signature system (Edwards curve)
- Research Article
- 10.1145/3725327
- Jun 17, 2025
- Proceedings of the ACM on Management of Data
- Zichen Zhu + 3 more
Log-structured merge (LSM) trees typically employ Bloom Filters (BFs) to prevent unnecessary disk accesses for point queries. The size of BFs can be tuned to navigate a memory vs. performance tradeoff. State-of-the-art memory allocation strategies use a worst-case model for point lookup cost to derive a closed-form solution. However, existing approaches have three limitations: (1) the number of key-value pairs to be ingested must be known a priori, (2) the closed-form solution only works for a perfectly shaped LSM tree, and (3) the model assumes a uniform query distribution. Due to these limitations, the available memory budget for BFs is sub-optimally utilized, especially when the system is under memory pressure (i.e., less than 7 bits per key). In this paper, we design Mnemosyne, a BF reallocation framework for evolving LSM trees that does not require prior workload knowledge. We use a more general query cost model that considers the access pattern per file, and we find that no system accurately maintains access statistics per file, and that simply maintaining a counter per file significantly deviates from the ground truth for evolving LSM trees. To address this, we propose Merlin, a dynamic sliding-window-based tracking mechanism that accurately captures these statistics. The upgraded Mnemosyne+ combines Merlin with our new cost model. In our evaluation, Mnemosyne reduces query latency by up to 20% compared to RocksDB under memory pressure, and Mnemosyne+ further improves throughput by another 10% when workloads exhibit higher skew.
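The memory-pressure figure has a simple interpretation via standard Bloom filter math (generic formula, not the paper's cost model): at the optimal hash count the false positive rate decays as roughly 0.6185 per bit of budget per key, so squeezing filters below 7 bits per key quickly inflates wasted disk reads.

```python
import math

def optimal_fpr(bits_per_key: float) -> float:
    """FPR of a Bloom filter using the optimal hash count for its size."""
    return math.exp(-bits_per_key * math.log(2) ** 2)  # == 0.6185**bits_per_key

for b in (10, 7, 5, 3):
    print(f"{b:>2} bits/key -> FPR ~ {optimal_fpr(b):.1%}")
# 10 bits/key -> ~0.8%;  7 -> ~3.5%;  5 -> ~9.0%;  3 -> ~23.7%
```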
- Research Article
- 10.1145/3725259
- Jun 17, 2025
- Proceedings of the ACM on Management of Data
- Yifei Yang + 1 more
Join is one of the most critical operators in query processing. One effective way to optimize multi-join performance is to pre-filter rows that do not contribute to the query output. Techniques that reflect this principle include predicate pushdown, semi-join, the Yannakakis algorithm, Bloom join, predicate transfer, etc. Among these, predicate transfer is the state-of-the-art pre-filtering technique: it removes most non-contributing rows through a series of Bloom filters, thereby delivering significant performance improvement. However, the existing predicate transfer technique has several limitations. First, the current algorithm works only on a single-threaded system, while real analytics databases for large workloads are typically distributed across multiple nodes. Second, some predicate transfer steps may not filter out any rows in the destination table, thus introducing performance overhead with no speedup. This issue is exacerbated in a distributed environment, where unnecessary predicate transfers lead to extra network latency and traffic. In this paper, we aim to address both limitations. First, we explore the design space of distributed predicate transfer and propose cost-based adaptive execution to maximize the performance of each individual transfer step. Second, we develop a pruning algorithm to effectively remove unnecessary transfers that do not contribute positively to performance. We implement both techniques and evaluate them on a distributed analytics query engine. Results on standard OLAP benchmarks, including TPC-H and DSB with a scale factor up to 400, show that distributed predicate transfer can improve query performance by over 3× and reduce the amount of data exchange by over 2.7×.
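The pre-filtering principle behind Bloom join and predicate transfer can be shown in a toy single-node form (the paper's distributed, cost-based, pruned transfer is not sketched here): build a filter over one side's join keys and discard the other side's non-matching rows before the join.

```python
import hashlib

M, K = 1 << 16, 3  # filter bits, hash count (toy sizes)

def bf_build(keys) -> int:
    bf = 0
    for key in keys:
        for i in range(K):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            bf |= 1 << (int.from_bytes(h[:8], "big") % M)
    return bf

def bf_maybe(bf: int, key) -> bool:
    for i in range(K):
        h = hashlib.sha256(f"{i}:{key}".encode()).digest()
        if not (bf >> (int.from_bytes(h[:8], "big") % M)) & 1:
            return False
    return True

dim = [(1, "EU"), (3, "US")]                # small side: build the filter
fact = [(i, i * 10) for i in range(1_000)]  # large side: probe before joining
bf = bf_build(k for k, _ in dim)
survivors = [row for row in fact if bf_maybe(bf, row[0])]
# Only ~2 of 1000 fact rows reach the join (plus rare false positives).
print(len(survivors), survivors[:2])
```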
- Research Article
- 10.1002/cpe.70149
- Jun 6, 2025
- Concurrency and Computation: Practice and Experience
- Fengyi Gao + 5 more
Searchable Encryption (SE) enables searching over encrypted data. Exact keyword search is supported in most SE schemes, which achieve higher search accuracy but suffer from lower completeness due to the inability to handle similar expressions. To realize fuzzy keyword search, some schemes employ Bloom Filters (BFs), but these may incur high false positive rates and risk exposing the Bloom Filter's internal values to cloud servers (CS). Besides, most existing schemes ignore the fact that CS may engage in malicious behaviors (e.g., undercounting parameters or forging results). To address these issues, we propose an efficient and verifiable ranked fuzzy multi-keyword search scheme based on BFs. We propose a Twin Bloom Filter (TBF) to conceal insertion positions and introduce random numbers to obfuscate uninserted bits. Search results are ranked using Term Frequency-Inverse Document Frequency (TF-IDF) scores to improve relevance. To ensure correctness and integrity, we employ Real Homomorphic Message Authentication Codes (RealHomMAC) and a random challenge technique, respectively. Security analysis proves that our scheme remains secure under both the known-ciphertext model and the known-background model. Theoretical and experimental performance analysis confirms that our scheme achieves efficient and accurate keyword search.
- Research Article
- 10.1016/j.hcc.2024.100265
- Jun 1, 2025
- High-Confidence Computing
- Chang Liu + 5 more
Trusted access control mechanism for data with blockchain-assisted attribute encryption
- Research Article
- 10.1016/j.dcan.2024.10.013
- Jun 1, 2025
- Digital Communications and Networks
- Xinzhe Huang + 5 more
Dynamically redactable blockchain based on decentralized Chameleon hash
- Research Article
- 10.14210/cotb.v16.p502-508
- May 27, 2025
- Anais do Computer on the Beach
- Fernando De Barros Castro + 3 more
Most Host Intrusion Detection Systems (HIDS) do not include memory evaluation capabilities, which is why they are unable to detect Advanced Volatile Threats, such as Fileless Malware, which can only be detected via memory scans. In this work, we propose HardCore, a signature-based HIDS in hardware, able to perform signature matching with no overhead to the endpoint system. HardCore receives as input, at every memory write, a cache line, outputting any matches against known fileless malware present in the signature base. HardCore uses Bloom filters and malware clustering to operate, having its signature matrices divided to obtain the gains originating from those implementation choices.
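In software terms, the matching core can be sketched as follows (HardCore does this in hardware; the chunk size, signatures, and names below are made up for illustration): known-malware byte patterns are hashed into a Bloom filter, and every cache line written to memory is checked against it.

```python
import hashlib

M, K, LINE = 1 << 20, 4, 64  # filter bits, hash count, cache-line bytes

def _pos(data: bytes):
    for i in range(K):
        h = hashlib.sha256(bytes([i]) + data).digest()
        yield int.from_bytes(h[:8], "big") % M

signature_bf = 0
KNOWN_SIGNATURES = [b"\xde\xad\xbe\xef" * 16]  # hypothetical 64-byte patterns
for sig in KNOWN_SIGNATURES:
    for p in _pos(sig):
        signature_bf |= 1 << p

def on_memory_write(cache_line: bytes) -> bool:
    """True when the written line (possibly) matches a known signature."""
    assert len(cache_line) == LINE
    return all((signature_bf >> p) & 1 for p in _pos(cache_line))

print(on_memory_write(b"\xde\xad\xbe\xef" * 16))  # True: known pattern
print(on_memory_write(b"\x00" * LINE))            # almost surely False
```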