Linear Time Solution Research Articles

Let $$\mathcal{D} = \{\mathsf{T}_1,\mathsf{T}_2,\ldots,\mathsf{T}_D\}$$ be a collection of D string documents of n characters in total, that are drawn from an alphabet set $$\varSigma = [\sigma]$$. The top-k document retrieval problem is to preprocess $$\mathcal{D}$$ into a data structure that, given a query $$(P[1\ldots p],k)$$, can return the k documents of $$\mathcal{D}$$ most relevant to the pattern P. The relevance is captured using a predefined ranking function, which depends on the set of occurrences of P in $$\mathsf{T}_d$$. For example, it can be the term frequency (i.e., the number of occurrences of P in $$\mathsf{T}_d$$), or it can be the term proximity (i.e., the distance between the closest pair of occurrences of P in $$\mathsf{T}_d$$), or a pattern-independent importance score of $$\mathsf{T}_d$$ such as PageRank. Linear space and optimal query time solutions already exist for the general top-k document retrieval problem. Compressed and compact space solutions are also known, but only for a few ranking functions such as term frequency and importance. However, space efficient data structures for term proximity based retrieval have been evasive. In this paper we present the first sub-linear space data structure for this relevance function, which uses only $$o(n)$$ bits on top of any compressed suffix array of $$\mathcal{D}$$ and solves queries in $$O((p+k)\,\mathrm{polylog}\,n)$$ time. We also show that scores that consist of a weighted combination of term proximity, term frequency, and document importance, can be handled using twice the space required to represent the text collection.
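To make the two pattern-dependent ranking functions concrete, the following is a minimal brute-force sketch in Python. It is not the succinct data structure from the abstract: it scans every document at query time, so it only illustrates what term frequency and term proximity measure and how a top-k answer is selected. The function names, the tie-breaking by document index, and the choice to skip documents with fewer than two occurrences when ranking by proximity are all assumptions made for illustration.

```python
import heapq


def occurrences(text, pattern):
    """Start positions of all (possibly overlapping) occurrences of pattern."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def term_frequency(positions):
    """Number of occurrences of the pattern in one document."""
    return len(positions)


def term_proximity(positions):
    """Distance between the closest pair of occurrences (inf if fewer than two)."""
    if len(positions) < 2:
        return float("inf")
    # positions are sorted, so the closest pair is always adjacent
    return min(b - a for a, b in zip(positions, positions[1:]))


def top_k_by_frequency(docs, pattern, k):
    """Indices of the k documents with the most occurrences (naive scan)."""
    scored = []
    for d, text in enumerate(docs):
        tf = term_frequency(occurrences(text, pattern))
        if tf > 0:
            scored.append((tf, d))
    # higher frequency first; ties broken by smaller document index (assumption)
    best = heapq.nlargest(k, scored, key=lambda item: (item[0], -item[1]))
    return [d for _, d in best]


def top_k_by_proximity(docs, pattern, k):
    """Indices of the k documents whose closest occurrence pair is nearest."""
    scored = []
    for d, text in enumerate(docs):
        prox = term_proximity(occurrences(text, pattern))
        if prox != float("inf"):
            scored.append((prox, d))
    # smaller distance is more relevant; ties broken by smaller document index
    return [d for _, d in heapq.nsmallest(k, scored)]


docs = ["abracadabra", "banana", "aaaa", "xyz"]
print(top_k_by_frequency(docs, "a", 2))
print(top_k_by_proximity(docs, "a", 2))
```

Each query here costs time proportional to the whole collection, which is exactly the cost that an index answering in $$O((p+k)\,\mathrm{polylog}\,n)$$ time is meant to avoid.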


Histogram indexing, also known as jumbled pattern indexing and permutation indexing, is one of the important current open problems in pattern matching. It was introduced about 6 years ago and has seen active research since. Yet, to date there is no algorithm that can preprocess a text T in time o(|T|^2/polylog|T|) and achieve histogram indexing, even over a binary alphabet, in time independent of the text length. The pattern matching version of this problem has a simple linear-time solution. The block-mass pattern matching problem is a recently introduced problem, motivated by issues in mass spectrometry. It is also an example of a pattern matching problem that has an efficient, almost linear-time solution but whose indexing version is daunting. However, for fixed finite alphabets, there has been progress made. In this paper, a strong connection between the histogram indexing problem and the block-mass pattern indexing problem is shown. The reduction we show between the two problems is amazingly simple. Its value lies in recognizing the connection between these two apparently disparate problems, rather than the complexity of the reduction. In addition, we show that for both these problems, even over unbounded alphabets, there are algorithms that preprocess a text T in time o(|T|^2/polylog|T|) and enable answering indexing queries in time polynomial in the query length. The contributions of this paper are twofold: (i) we introduce the idea of allowing a trade-off between the preprocessing time and query time of various indexing problems that have been stumbling blocks in the literature; (ii) we take the first step in introducing a class of indexing problems that, we believe, cannot be preprocessed in time o(|T|^2/polylog|T|) and enable linear-time query processing.
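The abstract notes that the pattern matching (non-indexing) version of the histogram problem has a simple linear-time solution. One common way to obtain it is a sliding window over character counts, sketched below in Python. This is an illustrative sketch and not code from the paper; the function name `jumbled_matches` and the helper `bump` are hypothetical, and the running time is expected O(|T| + |P|) under the assumption of constant-time hash-table updates.

```python
from collections import Counter


def jumbled_matches(text, pattern):
    """Start positions of windows of text that are permutations of pattern."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []

    # diff[c] = (count of c in current window) - (count of c in pattern)
    diff = Counter()
    nonzero = 0  # number of symbols whose diff is currently nonzero

    def bump(ch, delta):
        nonlocal nonzero
        if diff[ch] == 0:
            nonzero += 1  # this symbol is about to leave the balanced state
        diff[ch] += delta
        if diff[ch] == 0:
            nonzero -= 1  # this symbol is balanced again

    for ch in pattern:
        bump(ch, -1)
    for ch in text[:m]:
        bump(ch, +1)

    hits = [0] if nonzero == 0 else []
    for i in range(m, len(text)):
        bump(text[i], +1)      # symbol entering the window
        bump(text[i - m], -1)  # symbol leaving the window
        if nonzero == 0:
            hits.append(i - m + 1)
    return hits


print(jumbled_matches("abcbacab", "abc"))
```

Every window is checked with a constant number of counter updates, which is why a single query is cheap. The difficulty described in the abstract arises only when the text must be preprocessed once so that many such queries can be answered without rescanning it.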



Articles published on Linear Time Solution

A Linear‐Time Solution for All‐SAT Problem Based on P System

Queueing and glueing for optimal partitioning (functional pearl)

Top-k Term-Proximity in Succinct Space

Solving optimization problems by using networks of evolutionary processors with quantitative filtering

Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems

RASCAL: A Randomized Approach for Coevolutionary Analysis.

Efficient solutions to hard computational problems by P systems with symport/antiport rules and membrane division

Calculating a linear-time solution to the densest-segment problem

Error diffusion on meshes

A POLYNOMIAL-TIME ALGORITHM FOR COMPUTING THE RESILIENCE OF ARRANGEMENTS OF RAY SENSORS

Reverse engineering of compact suffix trees and links: A novel algorithm

On the relationship between histogram indexing and block-mass indexing

A modified variable neighborhood search for the discrete ordered median problem

Unrooted Tree Reconciliation: A Unified Approach

On left and right seeds of a string

Low-Complexity Energy-Efficient Broadcasting in One-Dimensional Wireless Networks

Heuristic Based Task Scheduling in Multiprocessor Systems with Genetic Algorithm by Choosing the Eligible Processor

Parameterized longest previous factor

Maximum Segment Sum, Monadically (distilled tutorial)

Linear Time Solution to Prime Factorization by Tissue P Systems with Cell Division

