Hash functions in nucleotide sequence analysis.

Abstract

Randomness is a powerful tool in the design and analysis of algorithms and data structures for nucleotide sequence data. Nucleotide sequences are not themselves random but are often randomized using hash functions. Despite their widespread use in genomics, there is no comprehensive review of the types of hash functions used and their various applications. In this survey intended for bioinformatic methods developers, we divide hash functions into four categories: scattering hash functions, permutations, minimum perfect hash functions, and locality-sensitive hash functions. For each category, we provide examples of both general-use hash functions that have been applied in nucleotide sequence analysis and hash functions that have been designed specifically for nucleotide sequence analysis. We highlight their salient properties, commonalities, differences, and application areas.
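
As a minimal sketch of how a nucleotide sequence can be "randomized" with a hash function, the Python below packs each k-mer into an integer using 2 bits per base and then scrambles it with a 64-bit mixer. The 2-bit encoding, the choice of mixer (the 64-bit finalizer from MurmurHash3, which is an invertible permutation of 64-bit values), and all function names are illustrative assumptions rather than anything prescribed by the survey.

```python
MASK64 = (1 << 64) - 1
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def mix64(x: int) -> int:
    """64-bit finalizer from MurmurHash3; a bijection on 64-bit integers."""
    x &= MASK64
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    x = (x * 0xC4CEB9FE1A85EC53) & MASK64
    x ^= x >> 33
    return x


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer over {A, C, G, T} into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def kmer_hashes(seq: str, k: int) -> list[int]:
    """Return the mixed hash of every k-mer in seq, in order of position."""
    if not 1 <= k <= 32:
        raise ValueError("k must be between 1 and 32 to fit in 64 bits")
    return [mix64(encode_kmer(seq[i:i + k])) for i in range(len(seq) - k + 1)]
```

Because the mixer is a permutation, distinct k-mers (for k up to 32) always receive distinct hash values, while identical k-mers at different positions hash identically.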

Similar Papers
  • Conference Article
  • Citations: 3
  • 10.1109/compsac.2011.38
Analysis of Concept Similarity Methods Applied to an LSH Function
  • Jul 1, 2011
  • Luciano B De Paula + 2 more

In the literature, there are several methods to measure similarity between concepts in structures such as simple ontologies, concept hierarchies, and taxonomies. These measures are used to search for similar concepts. In the Semantic Web, such structures are commonly used to classify data, which opens the possibility of reasoning upon them and helps in conceptual searches. In addition, Locality Sensitive Hash (LSH) functions are used to store similar data close to each other in an index space. Each family of LSH functions is tied to a specific similarity function. In this paper we propose a method for combining the idea of conceptual similarity with LSH functions. This method permits data classified as similar concepts to be indexed close to each other with respect to some metric. The main idea is to facilitate conceptual searching for semantically classified data. This paper evaluates several methods of measuring the similarity between concepts in a simple ontology and discusses how they can be applied to an LSH function.

  • Research Article
  • Citations: 4
  • 10.5075/epfl-thesis-5333
Design and Analysis of Multi-Block-Length Hash Functions
  • Jan 1, 2012
  • Infoscience (Ecole Polytechnique Fédérale de Lausanne)
  • Onur Özen

Cryptographic hash functions are used in many cryptographic applications, and the design of provably secure hash functions (relative to various security notions) is an active area of research. Most of the currently existing hash functions use the Merkle-Damgård paradigm, where by appropriate iteration the hash function inherits its collision and preimage resistance from the underlying compression function. Compression functions can either be constructed from scratch or be built using well-known cryptographic primitives such as a blockcipher. One classic type of primitive-based compression function is single-block-length: it contains designs that have an output size matching the output length $n$ of the underlying primitive. The single-block-length setting is well understood. Yet even for the optimally secure constructions, the (time) complexity of collision- and preimage-finding attacks is at most $2^{n/2}$ and $2^n$, respectively; when $n = 128$ (e.g., Advanced Encryption Standard) the resulting bounds have been deemed unacceptable for current practice. As a remedy, multi-block-length primitive-based compression functions, which output more than $n$ bits, have been proposed. This output expansion is typically achieved by calling the primitive multiple times and then combining the resulting primitive outputs in some clever way. In this thesis, we study the collision and preimage resistance of certain types of multi-call multi-block-length primitive-based compression (and the corresponding Merkle-Damgård iterated hash) functions. Our contribution is three-fold. First, we provide a novel framework for blockcipher-based compression functions that compress $3n$ bits to $2n$ bits and that use two calls to a $2n$-bit key blockcipher with block-length $n$. We restrict ourselves to two parallel calls and analyze the sufficient conditions to obtain close-to-optimal collision resistance, either in the compression function or in the Merkle-Damgård iteration.
Second, we present a new compression function $h: \{0,1\}^{3n} \to \{0,1\}^{2n}$; it uses two parallel calls to an ideal primitive (public random function) from $2n$ to $n$ bits. This is similar to MDC-2 or the recently proposed MJH by Lee and Stam (CT-RSA'11). However, unlike these constructions, already in the compression function we achieve that an adversary limited (asymptotically in $n$) to $O(2^{2n(1-\delta)/3})$ queries (for any $\delta > 0$) has a disappearing advantage to find collisions. This is the first construction of this type offering collision resistance beyond $2^{n/2}$ queries. Our final contribution is the (re)analysis of the preimage and collision resistance of the Knudsen-Preneel compression functions in the setting of public random functions. Knudsen-Preneel compression functions utilize an $[r,k,d]$ linear error-correcting code over $\mathbb{F}_{2^e}$ (for $e > 1$) to build a compression function from underlying blockciphers operating in the Davies-Meyer mode. Knudsen and Preneel show, in the complexity-theoretic setting, that finding collisions takes time at least $2^{(d-1)n/2}$. Preimage resistance, however, is conjectured to be the square of the collision resistance. Our results show that both the collision resistance proof and the preimage resistance conjecture of Knudsen and Preneel are incorrect: with the exception of two of the proposed parameters, the Knudsen-Preneel compression functions do not achieve the security level they were designed for.

  • Research Article
  • Citations: 8
  • 10.14778/3626292.3626293
Cryptographically Secure Private Record Linkage using Locality-Sensitive Hashing
  • Oct 1, 2023
  • Proceedings of the VLDB Endowment
  • Ruidi Wei + 1 more

Private record linkage (PRL) is the problem of identifying pairs of records that approximately match across datasets in a secure, privacy-preserving manner. Two-party PRL specifically allows each of the parties to obtain records from the other party, only given that each record matches with one of their own. The privacy goal is that no information about the datasets other than the matching records should be released. A fundamental challenge is to avoid leaking information while also avoiding a comparison of all pairs of records. In plaintext record linkage this is done using a blocking strategy, e.g., locality-sensitive hashing. One recent approach proposed by He et al. (ACM CCS 2017) uses locality-sensitive hashing and then releases a provably differentially private representation of the hash bins. However, differential privacy still leaks some information, albeit provably bounded, and does not protect against attacks such as property inference attacks. Another recent approach by Khurram and Kerschbaum (IEEE ICDE 2020) uses locality-preserving hashing and provides cryptographic security, i.e., it releases no information except the output. However, locality-preserving hash functions are much harder to construct than locality-sensitive hash functions, and hence the accuracy of this approach is limited, particularly on larger datasets. In this paper, we address the open problem of providing cryptographic security of PRL while using locality-sensitive hash functions. Using recent results in oblivious algorithms, we design a new cryptographically secure PRL with locality-sensitive hash functions. Our prototypical implementation can match 40000 records in the British National Library/Toronto Public Library and the North Carolina Voter Registry datasets with 99.3% and 99.9% accuracy, respectively, in less than an hour, which is more than an order of magnitude faster than Khurram and Kerschbaum's work at a higher accuracy.

  • Research Article
  • Citations: 8
  • 10.5829/ije.2021.34.08b.06
A Three-stage Filtering Approach for Face Recognition
  • Aug 1, 2021
  • International Journal of Engineering
  • Hamid Hassanpour + 1 more

Face recognition has become a crucial topic in recent decades, offering important opportunities for applications in security surveillance, human-computer interaction, and forensics. However, it poses challenges, including uncontrolled environments, large datasets, and insufficient training data. In this paper, a face recognition system is proposed to address these problems with a new framework based on a hashing function in a three-stage filtering approach. At the first stage, candidate subjects are chosen using the Locality-Sensitive Hashing (LSH) function. We employ a voting system to select candidates by disregarding a large number of dissimilar identities based on their local features. At the second stage, a robust image hashing based on Discrete Cosine Transform (DCT) coefficients is used to further refine the candidate images in terms of global visual information. Finally, the test image is recognized among selected identities using other visual information, resulting in further accuracy gains. Extensive experiments on the FERET, AR, and ORL datasets show that the proposed method outperforms state-of-the-art methods with a significant improvement in accuracy.

  • Conference Article
  • Citations: 2
  • 10.1109/cbd54617.2021.00010
Odd-Even Hash Algorithm: A Improvement of Cuckoo Hash Algorithm
  • Mar 1, 2022
  • Haiting Zhu + 6 more

Hash-based data structures and algorithms are currently flourishing on the Internet. They are an effective way to store large amounts of information, especially for applications related to measurement, monitoring and security. At present, there are many hash table algorithms, such as Cuckoo Hash, Peacock Hash, Double Hash, Link Hash and D-left Hash. However, there are still some problems in these hash table algorithms, such as excessive memory usage, slow insertion and query operations, and insertion failures caused by infinite loops that require rehashing. This paper improves the kick-out mechanism of the Cuckoo Hash algorithm, and proposes a new hash table structure, the Odd-Even Hash (OE Hash) algorithm. The experimental results show that the OE Hash algorithm is more efficient than the existing Link Hash algorithm, Linear Hash algorithm, Cuckoo Hash algorithm, etc. The OE Hash algorithm takes into account the performance of both query time and insertion time while occupying the least space, and there is no insertion failure that leads to rehashing, which makes it suitable for massive data storage.

  • Conference Article
  • Citations: 18
  • 10.1109/bigcomp.2018.00048
Single Hash: Use One Hash Function to Build Faster Hash Based Data Structures
  • Jan 1, 2018
  • Xiangyang Gou + 7 more

With the scale of data to store or monitor in today's networks constantly increasing, hash-based data structures are more and more widely used because of their high memory efficiency and high speed. Most of them, like Bloom filters, sketches and d-left hash tables, use more than one hash function. Furthermore, in order to achieve good randomness, the hash functions used, like MD5 and SHA1, are very complicated and consume a lot of CPU cycles to carry out. As a consequence, the implementation of these hash functions will be time-consuming. In order to address this issue, we propose the Single Hash technique in this paper. It is based on the observation that the hash functions we use produce 32-bit or 64-bit values, which have much bigger value ranges than we need in practice. We usually have to carry out a modular operation to map the hash results into a smaller range in the data structures listed above. In this procedure, information carried by the high bits may be discarded. For example, if in a Bloom filter the length of the bit array is $2^{20}$ while the hash functions we use are 32-bit hash functions, 12 bits in the results of the hash functions are discarded in the modular operation. We can use these bits to produce more hash values. Therefore, we propose to use a few bit operations to make full use of the information produced by one hash function and generate multiple hash values which can be used in these data structures. The Single Hash technique can be applied to most hash-based data structures. It can significantly improve their speed, because instead of carrying out multiple hash functions, we only need to compute one hash function and a few simple operations (e.g., bit shift and XOR). Other aspects of performance, like memory efficiency and accuracy of these data structures, will not be influenced by the Single Hash technique.
In this paper, we apply it to three kinds of classic hash-based data structures, i.e., Bloom filters, CM sketches and d-left hash tables, as case studies, and evaluate their performance with both mathematical analysis and extensive experiments. We make all our code open source on GitHub.
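
The idea of reusing the otherwise discarded high bits can be sketched as follows. This is only an illustration of the general principle under stated assumptions, not the authors' exact construction: the function name `derive_indexes`, the particular shift-and-XOR combination, and the default table size of $2^{20}$ slots are all hypothetical choices.

```python
def derive_indexes(h32: int, num_indexes: int, table_bits: int = 20) -> list[int]:
    """Derive several table indexes from ONE 32-bit hash value.

    The low `table_bits` bits are what a plain modular reduction would keep;
    the remaining high bits would normally be thrown away. Here each extra
    index XORs the low part with a differently shifted copy of the high part.
    (Assumed scheme for illustration only.)
    """
    if not 0 <= h32 < (1 << 32):
        raise ValueError("h32 must be an unsigned 32-bit integer")
    if not 1 <= table_bits < 32:
        raise ValueError("table_bits must be between 1 and 31")
    mask = (1 << table_bits) - 1
    low = h32 & mask            # bits kept by the modular operation
    high = h32 >> table_bits    # bits that would otherwise be discarded
    # One shift and one XOR per index; the mask keeps results in range.
    return [(low ^ (high << i)) & mask for i in range(num_indexes)]
```

In a Bloom filter with $2^{20}$ bits, for example, a single 32-bit hash of the key could be passed in as `h32` and the returned values used as the bit positions. How well such derived indexes behave depends on the quality of the underlying hash, which this sketch does not evaluate.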

  • Research Article
  • Citations: 7
  • 10.1016/j.engappai.2024.109697
A large-scale group decision making model with a clustering algorithm based on a locality sensitive hash function
  • Nov 30, 2024
  • Engineering Applications of Artificial Intelligence
  • Zhangqian Mu + 2 more

  • Conference Article
  • Citations: 15
  • 10.4230/lipics.stacs.2012.25
On Randomness in Hash Functions (Invited Talk).
  • Feb 3, 2012
  • DROPS (Schloss Dagstuhl – Leibniz Center for Informatics)
  • Martin Dietzfelbinger

In the talk, we shall discuss quality measures for hash functions used in data structures and algorithms, and survey positive and negative results. (This talk is not about cryptographic hash functions.) For the analysis of algorithms involving hash functions, it is often convenient to assume the hash functions used behave fully randomly; in some cases there is no analysis known that avoids this assumption. In practice, one needs to get by with weaker hash functions that can be generated by randomized algorithms. A well-studied range of applications concern realizations of dynamic dictionaries (linear probing, chained hashing, dynamic perfect hashing, cuckoo hashing and its generalizations) or Bloom filters and their variants. A particularly successful and useful means of classification are Carter and Wegman's universal or k-wise independent classes, introduced in 1977. A natural and widely used approach to analyzing an algorithm involving hash functions is to show that it works if a sufficiently strong universal class of hash functions is used, and to substitute one of the known constructions of such classes. This invites research into the question of just how much independence in the hash functions is necessary for an algorithm to work. Some recent analyses that gave impossibility results constructed rather artificial classes that would not work; other results pointed out natural, widely used hash classes that would not work in a particular application. Only recently it was shown that under certain assumptions on some entropy present in the set of keys even 2-wise independent hash classes will lead to strong randomness properties in the hash values. The negative results show that these results may not be taken as justification for using weak hash classes indiscriminately, in particular for key sets with structure. When stronger independence properties are needed for a theoretical analysis, one may resort to classic constructions. 
Only in 2003 it was found out how full randomness can be simulated using only linear space overhead (which is optimal). The split-and-share approach can be used to justify the full randomness assumption in some situations in which full randomness is needed for the analysis to go through, like in many applications involving multiple hash functions (e.g., generalized versions of cuckoo hashing with multiple hash functions or larger bucket sizes, load balancing, Bloom filters and variants, or minimal perfect hash function constructions). For practice, efficiency considerations beyond constant factors are important. It is not hard to construct very efficient 2-wise independent classes. Using k-wise independent classes for constant k bigger than 3 has become feasible in practice only by new constructions involving tabulation. This goes together well with the quite new result that linear probing works with 5-independent hash functions. Recent developments suggest that the classification of hash function constructions by their degree of independence alone may not be adequate in some cases. Thus, one may want to analyze the behavior of specific hash classes in specific applications, circumventing the concept of k-wise independence. Several such results were recently achieved concerning hash functions that utilize tabulation. In particular if the analysis of the application involves using randomness properties in graphs and hypergraphs (generalized cuckoo hashing, also in the version with a stash, or load balancing), a hash class combining k-wise independence with tabulation has turned out to be very powerful.
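
To make the remark about efficient 2-wise independent classes concrete, here is a minimal sketch of the classic linear construction in the style of Carter and Wegman. The prime $2^{61}-1$, the function name `make_linear_hash`, and the final reduction modulo the table size are assumptions made for illustration; the reduction makes the output only approximately uniform over the table.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to lie in [0, P)


def make_linear_hash(table_size: int, rng: random.Random):
    """Draw one function h(x) = ((a*x + b) mod P) mod table_size at random.

    With a and b chosen uniformly from [0, P), the values (a*x + b) mod P at
    any two distinct keys are independent and uniform over [0, P).
    """
    if table_size < 1:
        raise ValueError("table_size must be positive")
    a = rng.randrange(P)
    b = rng.randrange(P)

    def h(x: int) -> int:
        if not 0 <= x < P:
            raise ValueError("key outside the assumed universe [0, P)")
        return ((a * x + b) % P) % table_size

    return h
```

Each call to `make_linear_hash` selects a new member of the class, so an algorithm that needs several hash functions can simply draw several times from the same random generator.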

  • Book Chapter
  • Citations: 17
  • 10.1007/978-3-030-32047-8_1
Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search
  • Jan 1, 2019
  • Tobias Christiani

The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution \(\mathcal {H}\) over locality-sensitive hash functions that partition space. For a collection of n points, after preprocessing, the query time is dominated by \(O(n^{\rho } \log n)\) evaluations of hash functions from \(\mathcal {H}\) and \(O(n^{\rho })\) hash table lookups and distance computations where \(\rho \in (0,1)\) is determined by the locality-sensitivity properties of \(\mathcal {H}\). It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to \(O(\log ^2 n)\), leaving the query time to be dominated by \(O(n^{\rho })\) distance computations and \(O(n^{\rho } \log n)\) additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely match the Indyk-Motwani framework. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of additional word-RAM operations to \(O(n^\rho )\).
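
The basic multi-table structure of such a framework can be sketched as below, using position sampling for Hamming distance on equal-length strings as the locality-sensitive family. This is a simplified illustration, not the improved frameworks discussed in the abstract: the class name `HammingLSHIndex`, its parameters, and the choice to sample positions without replacement are assumptions.

```python
import random
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


class HammingLSHIndex:
    """num_tables hash tables, each keyed by k sampled string positions."""

    def __init__(self, length: int, k: int, num_tables: int, seed: int = 0):
        rng = random.Random(seed)
        # One hash function per table: a fixed random choice of k positions.
        self.positions = [rng.sample(range(length), k) for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _key(self, t: int, s: str) -> str:
        return "".join(s[i] for i in self.positions[t])

    def add(self, s: str) -> None:
        for t, table in enumerate(self.tables):
            table[self._key(t, s)].append(s)

    def query(self, q: str, radius: int):
        """Return a stored string within `radius` of q, or None if no
        colliding candidate passes the distance check."""
        seen = set()
        for t, table in enumerate(self.tables):
            for cand in table.get(self._key(t, q), ()):
                if cand not in seen:
                    seen.add(cand)
                    if hamming(cand, q) <= radius:  # distance computation
                        return cand
        return None
```

Strings that differ in few positions agree on a random sample of positions with high probability, so near neighbors tend to collide in at least one table, while every reported result is verified by an explicit distance computation.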

  • Conference Article
  • Citations: 1
  • 10.4230/lipics.icdt.2021.21
Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing
  • Mar 29, 2021
  • DROPS (Schloss Dagstuhl – Leibniz Center for Informatics)
  • Samuel McCauley

Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess $n$ strings of length $d$, to quickly answer queries $q$ of the form: if there is a database string within edit distance $r$ of $q$, return a database string within edit distance $cr$ of $q$. Previous approaches to this problem either rely on very large (superconstant) approximation ratios $c$, or very small search radii $r$. Outside of a narrow parameter range, these solutions are not competitive with trivially searching through all $n$ strings. In this work we give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time $O(d\,3^r n^{1/c})$. The best known practical results require $c \gg r$ to achieve any correctness guarantee; meanwhile, the best known theoretical results are very involved and difficult to implement, and require query time that can be loosely bounded below by $24^r$. Our results significantly broaden the range of parameters for which there exist nontrivial theoretical bounds, while retaining the practicality of a locality-sensitive hash function.

  • Book Chapter
  • Citations: 4
  • 10.1007/978-3-642-40776-5_4
Similarity-Based Resource Retrieval in Multi-agent Systems by Using Locality-Sensitive Hash Functions
  • Jan 1, 2013
  • Malte Aschermann + 1 more

In this paper we address the problem of retrieving similar resources which are distributed over a multi-agent system (MAS). In distributed environments identification of resources is realized by using cryptographic hash functions like SHA-1. The issue with these functions in connection with similarity search is that they distribute their hash values uniformly over the codomain. Therefore such IDs cannot be used to estimate the similarity of resources, unless one enumerates the whole search space and retrieves every resource for comparison. In this paper we present a three-layer architecture and a data model to efficiently locate similar resources in linear time complexity by using locality-sensitive hash functions. We design the data model as an extension to distributed environments (MAS), which only need to provide at least basic resource management capabilities, such as storing and retrieving resources by their ID. We use a benchmark data set to compare our approach with state-of-the-art centralized heuristic approaches and show that, while these approaches provide better search accuracy, our approach can deal with decentralized data and thus, allows us to flexibly adapt to dynamic changes in the underlying MAS by distributing and updating sets of information about similarities over different agents.

  • Book Chapter
  • Citations: 119
  • 10.1007/3-540-55719-9_77
Polynomial hash functions are reliable
  • Jan 1, 1992
  • M Dietzfelbinger + 3 more

Polynomial hash functions are well studied and widely used in various applications. They have gained popularity because of certain performance guarantees they exhibit. It has been shown that even linear hash functions are expected to exhibit such performance. However, quite often we would like the hash functions to be reliable, meaning that they perform well with high probability; for certain important properties, even higher degree polynomials were not known to be reliable. We show that for certain important properties linear hash functions are not reliable. We give an indication that quadratic hash functions might not be reliable. On the positive side, we prove that cubic hash functions are reliable. In a more general setting, we show that a higher degree of the polynomial hash functions translates into higher reliability. We also introduce a new class of hash functions, which makes it possible to reduce the universe size in an efficient and simple manner. The reliability results and the new class of hash functions are used for some fundamental applications: improved and simplified reliable algorithms for perfect hash functions and real-time dictionaries, which use significantly fewer random bits, and a tighter upper bound for the program size of perfect hash functions.

  • Research Article
  • Citations: 33
  • 10.1089/aid.1995.11.625
Nucleotide Sequence and Restriction Fragment Length Polymorphism Analysis of the Long Terminal Repeat of Human T Cell Leukemia Virus Type II
  • May 1, 1995
  • AIDS Research and Human Retroviruses
  • Nobutaka Eiraku + 8 more

Molecular studies have demonstrated the existence of two major subtypes of human T cell leukemia virus type II: HTLV-IIa and HTLV-IIb. In attempts to further classify this family of viruses we have carried out nucleotide sequence and restriction fragment length polymorphism (RFLP) analysis of the long terminal repeat (LTR), a region that has been shown in previous studies to have the greatest intra- and intersubtype genomic divergence. Analysis of the nucleotide sequences suggested the existence of distinct phylogenetic groups in each subtype and, on the basis of predicted differences in restriction endonuclease sites, RFLP analysis allowed the identification of four groups within the IIa subtype (a1-a4) and six within the IIb subtype (b1-b6). Nucleotide sequence analysis also suggested the possible existence of HTLV-II quasispecies. However, this appeared not to be significant, and preliminary studies suggest that these would not be expected to influence the results of RFLP analysis appreciably. The validity of the RFLP method was demonstrated in an analysis of 36 randomly chosen samples from HTLV-II seropositive blood donors from the New York City Blood Center, where it could be shown that all could be successfully classified. Moreover, the RFLP analysis correctly matched the viruses in donors and recipients of contaminated blood in four situations in which HTLV-II was inadvertently transmitted by transfusion. RFLP analysis of the LTR appears to be a rapid and reliable method by which to identify HTLV-II infection. This should prove useful in studies of the epidemiology and the characterization of viruses present both in nonindigenous and indigenous populations.

  • Research Article
  • 10.1137/siread000053000003000545000001
SIGEST
  • Jan 1, 2011
  • SIAM Review
  • The Editors

The SIGEST article in this issue, “Linear Probing with 5-wise Independence” by Anna Pagh, Rasmus Pagh, and Milan Ružić, is about one form of hashing, a method for fast data retrieval. Hashing is used in operating systems and compilers for symbol storage and management of memory pages and buffers. Database systems and routers use hashing to manage their data structures as well. Hashing is best explained with an example, and we will use the most common one. Some of you may remember searching for a telephone number with a phone book. A phone book was a large hard-copy volume with an alphabetical list of people with the phone number of each person listed to the right of the name. Searching for a phone number meant opening the book, finding the right starting letter, looking for the next letter, and finally figuring out which of several identically named people was the one you wanted. In general, you'd eliminate all but one letter on the first pass, and then cut the search space by roughly half with every step after that. The time to find a number is, therefore, approximately logarithmic in the number of names with the same starting letter. Hashing is a better way to find a phone number, especially with a computer. One allocates storage for a hash table and stores names in the table with a hash function h. The hash function maps names to integers, and a name together with the phone number is stored in location $h(name)$ in the table. Looking up the number requires only evaluating $h(name)$ and getting the number. If one can avoid conflicts (not likely), then the work is constant rather than logarithmic. If conflicts are possible, so h(`A. E. Newman') can be the same as h(`L. Trotsky'), then the storage/lookup algorithms and the hash function must have good theoretical properties, if the ideal constant lookup time is to be saved. 
These theoretical properties are expressed in probabilistic terms, and therefore the complexity results describe the expected cost of adding to the table or looking up an entry. In general a hash table maps keys (the names in the example) to locations where values (the phone numbers) are stored. Linear probing is one way to organize the storage and lookup. Linear probing tries to put a value in $h(key)$. If that location is already occupied, then one tries $h(key)+k$ for $k=1, \dots$ until one finds an empty location. The data are stored in the first empty location. It is possible that $h(key)$ may be in a long string of occupied positions (called “pileup” in the paper) and then the performance will be poor. A good hash function can eliminate this problem in the sense that the expected cost per operation is constant. A hash function with uniformly distributed and independent function values would be such a good hash function, but it is very difficult to construct such a function. In many cases, however, a hash function which is random with respect to small sets of keys will suffice. The main result of the paper is that, under some technical assumptions, a hash function which is 5-wise independent gives a constant expected cost per operation. This means that given five keys, the values of h at the keys are independent random variables. The paper also discusses how one can construct such hash functions. Since this paper appeared, others have shown that 4-wise independence is not enough, so the result in the paper is sharp in that sense. The authors have taken pains in their SIGEST paper to make a very technical topic in computer science accessible to the SIREV readership. The introduction gives you enough information to play with hash functions on your own and experience pileup personally.
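
The linear probing scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the class name `LinearProbingTable` is hypothetical, probes wrap around the end of the table, deletion is not supported, and Python's built-in `hash` stands in for $h$ (it is not a 5-wise independent hash function).

```python
class LinearProbingTable:
    """Fixed-capacity hash table that resolves collisions by linear probing."""

    def __init__(self, capacity: int = 8):
        self.slots = [None] * capacity  # each slot holds None or (key, value)
        self.size = 0

    def _probe(self, key):
        """Yield h(key), h(key)+1, h(key)+2, ... wrapping around the table."""
        n = len(self.slots)
        start = hash(key) % n
        for k in range(n):
            yield (start + k) % n

    def put(self, key, value) -> None:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None or slot[0] == key:
                if slot is None:
                    self.size += 1
                self.slots[i] = (key, value)  # first empty (or matching) slot
                return
        raise RuntimeError("table is full")

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:      # an empty slot ends the run: key is absent
                return default
            if slot[0] == key:
                return slot[1]
        return default
```

With a capacity of 8, the integer keys 1 and 9 both start at location 1, so the second one inserted is stored in location 2. Long runs of occupied locations, the "pileup" mentioned above, make both `put` and `get` scan further before they terminate.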

  • Research Article
  • 10.1007/s00453-025-01321-z
ShockHash: Near Optimal-Space Minimal Perfect Hashing Beyond Brute-Force
  • Jan 1, 2025
  • Algorithmica
  • Hans-Peter Lehmann + 2 more

A minimal perfect hash function (MPHF) maps a set $S$ of $n$ keys to the first $n$ integers without collisions. There is a lower bound of $n \log_2 e - \mathcal{O}(\log n) \approx 1.44n$ bits needed to represent an MPHF. This can be reached by a brute-force algorithm that tries $e^n$ hash function seeds in expectation and stores the first seed that leads to an MPHF. The most space-efficient previous algorithms for constructing MPHFs all use such a brute-force approach as a basic building block. In this paper, we introduce ShockHash – Small, heavily overloaded cuckoo hash tables for minimal perfect hashing. ShockHash uses two hash functions $h_0$ and $h_1$, hoping for the existence of a function $f : S \rightarrow \{0,1\}$ such that $x \mapsto h_{f(x)}(x)$ is an MPHF on $S$. It then uses a 1-bit retrieval data structure to store $f$ using $n + o(n)$ bits. In graph terminology, ShockHash generates $n$-edge random graphs until stumbling on a pseudoforest – where each component contains as many edges as nodes. Using cuckoo hashing, ShockHash then derives an MPHF from the pseudoforest in linear time. We show that ShockHash needs to try only about $(e/2)^n \approx 1.359^n$ seeds in expectation. This reduces the space for storing the seed by roughly $n$ bits (maintaining the asymptotically optimal space consumption) and speeds up construction by almost a factor of $2^n$ compared to brute-force. Bipartite ShockHash reduces the expected construction time again to about $1.166^n$ by maintaining a pool of candidate hash functions and checking all possible pairs. Using ShockHash as a building block within the RecSplit framework we obtain ShockHash-RS, which can be constructed up to 3 orders of magnitude faster than competing approaches. ShockHash-RS can build an MPHF for 10 million keys with 1.489 bits per key in about half an hour. When instead using ShockHash after an efficient $k$-perfect hash function, it achieves space usage similar to the best competitors, while being significantly faster to construct and query.
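
The brute-force baseline mentioned in the abstract (not ShockHash itself) can be sketched as a seed search. The helper names `seeded_hash` and `find_mphf_seed`, the use of SHA-256 as the seeded hash, and the `max_seeds` cutoff are assumptions made for illustration; because the number of seeds tried grows exponentially in the number of keys, this is only practical for very small key sets.

```python
import hashlib


def seeded_hash(key: str, seed: int, n: int) -> int:
    """Map key to [0, n) using a hash that depends on the seed."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % n


def find_mphf_seed(keys: list[str], max_seeds: int = 1_000_000) -> int:
    """Try seeds 0, 1, 2, ... and return the first one under which the
    seeded hash is collision-free on keys, i.e. a bijection onto [0, n)."""
    n = len(keys)
    if n == 0 or len(set(keys)) != n:
        raise ValueError("keys must be non-empty and distinct")
    for seed in range(max_seeds):
        if len({seeded_hash(k, seed, n) for k in keys}) == n:
            return seed  # storing this seed is enough to represent the MPHF
    raise RuntimeError("no suitable seed found within max_seeds tries")
```

Once a seed is found, evaluating the MPHF on a key is a single call to `seeded_hash(key, seed, n)`, and the seed is the only thing that has to be stored.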
