Sketching and Sublinear Data Structures in Genomics

Guillaume Marçais, Rob Patro, Carl Kingsford, Brad Solomon

doi:10.1146/annurev-biodatasci-072018-021156

Open Access

Journal: Annual Review of Biomedical Data Science
Publication Date: Jul 20, 2019
Citations: 41

Affiliation: Carnegie Mellon University, Stony Brook University, Johns Hopkins University

#Genomic Analysis Methods #Locality-sensitive Hashing

Abstract

Large-scale genomics demands computational methods that scale sublinearly with the growth of data. We review several data structures and sketching techniques that have been used in genomic analysis methods. Specifically, we focus on four key ideas that take different approaches to achieve sublinear space usage and processing time: compressed full-text indices, approximate membership query data structures, locality-sensitive hashing, and minimizers schemes. We describe these techniques at a high level and give several representative applications of each.

Full Text