A Framework for Space-Efficient String Kernels

Djamal Belazzougui, Fabio Cunial

doi:10.1007/s00453-017-0286-4

Abstract

String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient or incur large slowdowns. We show that a number of exact string kernels, like the \(k\)-mer kernel, the substring kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in \(O(nd)\) time and in \(o(n)\) bits of space in addition to the input, using just a \(\mathtt {rangeDistinct}\) data structure on the Burrows-Wheeler transform of the input strings that takes \(O(d)\) time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of \(k\), like the \(k\)-mer profile and the \(k\)-th order empirical entropy, and for calibrating the value of \(k\) using the data.
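For orientation, the \(k\)-mer kernel named in the abstract can be illustrated with a naive, hash-based baseline. This is only a sketch and not the space-efficient method of the paper: it stores every \(k\)-mer explicitly instead of using \(\mathtt {rangeDistinct}\) on the Burrows-Wheeler transform. The function names are hypothetical, and the definitions used here (inner product of \(k\)-mer count vectors, plus a cosine-normalized variant) are a common convention assumed for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(s: str, k: int) -> Counter:
    """Count every length-k substring of s, overlapping occurrences included."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_kernel(s: str, t: str, k: int) -> int:
    """Inner product of the k-mer count vectors of s and t (assumed definition)."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    if len(cs) > len(ct):
        cs, ct = ct, cs  # iterate over the smaller table
    # Counter returns 0 for missing keys, so absent k-mers contribute nothing.
    return sum(c * ct[w] for w, c in cs.items())


def normalized_kmer_kernel(s: str, t: str, k: int) -> float:
    """Cosine-normalized variant; returns 0.0 if either string has no k-mers."""
    denom = sqrt(kmer_kernel(s, s, k) * kmer_kernel(t, t, k))
    return kmer_kernel(s, t, k) / denom if denom else 0.0
```

This baseline needs space proportional to the number of distinct \(k\)-mers, which is the kind of overhead the abstract's \(o(n)\)-bit bound is meant to avoid.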

Journal: Algorithmica
Publication Date: Feb 7, 2017
Citations: 13

Similar Papers

A Framework for Space-Efficient String Kernels
Djamal Belazzougui ... Fabio Cunial
01 Jan 2015

Entropy Bounds for Grammar-Based Tree Compressors
Danny Hucke ... Markus Lohrey
IEEE Transactions on Information Theory | VOL. 67
01 Nov 2021

Position-Aware String Kernels with Weighted Shifts and a General Framework to Apply String Kernels to Other Structured Data
Kilho Shin
16 Dec 2007

Space-Efficient Construction of LZ-Index
Diego Arroyuelo ... Gonzalo Navarro
01 Jan 2004
