Efficient discovery of longest-lasting correlation in sequence databases

Yuhong Li, Man Lung Yiu, Leong Hou U, Zhiguo Gong

doi:10.1007/s00778-016-0432-7

Abstract

The search for similar subsequences is a core module for various analytical tasks in sequence databases. Typically, the similarity computations require users to set a subsequence length. However, there is no robust way to choose the proper length for different application needs. In this study, we examine a new query that returns the longest-lasting highly correlated subsequences in a sequence database, which is particularly helpful for analyses without prior knowledge of the query length. A baseline, yet expensive, solution is to calculate the correlations for every possible subsequence length. To boost performance, we study a space-constrained index that provides a tight correlation bound for subsequences of similar lengths and offsets, using intraobject and interobject grouping techniques. To the best of our knowledge, this is the first index to support a normalized distance metric over subsequences of arbitrary length. In addition, we study the use of a smart cache for disk-resident data (e.g., millions of sequence objects) and a graphics processing unit (GPU)-based parallel processing technique for frequently updated data (e.g., nonindexable streaming sequences) to compute the longest-lasting highly correlated subsequences. Extensive experimental evaluation on both real and synthetic sequence datasets verifies the efficiency and effectiveness of our proposed methods.
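
The baseline described in the abstract (computing correlations for every possible subsequence length) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that correlation means Pearson correlation and that a query subsequence is compared with the database subsequence at the same offset; the function names `pearson` and `longest_lasting_correlation` and the `threshold` and `min_len` parameters are hypothetical and chosen only for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def longest_window_in(query, seq, threshold, shortest):
    """Longest aligned window (length, offset) with correlation >= threshold.

    Only lengths >= `shortest` are tried, from longest to shortest,
    so the first hit is the longest one. Returns None if nothing qualifies.
    """
    n = min(len(query), len(seq))
    for length in range(n, shortest - 1, -1):
        for offset in range(n - length + 1):
            q = query[offset:offset + length]
            s = seq[offset:offset + length]
            if pearson(q, s) >= threshold:
                return length, offset
    return None


def longest_lasting_correlation(query, database, threshold, min_len=2):
    """Brute-force search over all sequences, lengths, and offsets.

    Returns (length, sequence_index, offset) or None. Assumption: windows
    are aligned, i.e. the same offset is used in the query and the sequence.
    """
    best = None
    for seq_id, seq in enumerate(database):
        # Only a strictly longer window can improve on the current best.
        shortest = min_len if best is None else best[0] + 1
        hit = longest_window_in(query, seq, threshold, shortest)
        if hit is not None:
            best = (hit[0], seq_id, hit[1])
    return best


query = [1, 2, 3, 4, 5, 6, 1, 6, 1, 6]
database = [
    [2, 4, 6, 8, 10, 12, 9, 0, 9, 0],  # tracks the query for the first 6 points only
    [6, 5, 4, 3, 2, 1, 6, 1, 6, 1],    # 7 - query, so always negatively correlated
]
result = longest_lasting_correlation(query, database, threshold=0.99)
print(result)
```

Each window correlation is recomputed from scratch here, which is what makes the baseline expensive; per the abstract, the proposed index instead bounds the correlation for groups of subsequences with similar lengths and offsets.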

Journal: The VLDB Journal
Publication Date: Jun 23, 2016
Citations: 8

Similar Papers

Discovering longest-lasting correlation in sequence databases
Yuhong Li ... Zhiguo Gong
Proceedings of the VLDB Endowment | VOL. 6
01 Sep 2013

Stretch Profile: A pruning technique to accelerate DNA sequence search
Nalakkhana Khitmoh ... Sissades Tongsima
Informatics in Medicine Unlocked | VOL. 19
01 Jan 2020

Querying and mining biological databases.
Ambuj K Singh
Omics : a journal of integrative biology | VOL. 7
01 Jan 2003

Accelerating large-scale biological database search on Xeon Phi-based neo-heterogeneous architectures
Haidong Lan ... Bertil Schmidt
01 Nov 2015
