Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices

Oguz Selvitopi, Giulia Guidi, Saliya Ekanayake, Georgios A Pavlopoulos, Ariful Azad, Aydin Buluc

doi:10.1109/sc41405.2020.00079

Abstract

Identifying similar protein sequences is a core step in many computational biology pipelines, such as detection of homologous protein sequences, generation of protein similarity graphs for downstream analysis, functional annotation, and gene location. The performance and scalability of protein similarity search have proven to be a bottleneck in many bioinformatics pipelines due to the increase in cheap and abundant sequencing data. This work presents PASTIS, a new distributed-memory software package. PASTIS relies on sparse matrix computations to efficiently identify possibly similar proteins. We use distributed sparse matrices for scalability and show that the sparse matrix infrastructure is a great fit for protein similarity search when coupled with a fully distributed dictionary of sequences that allows remote sequence requests to be fulfilled. Our algorithm incorporates the unique bias in amino acid sequence substitution into the search without altering the basic sparse matrix model and, in turn, achieves ideal scaling up to millions of protein sequences.
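To make the sparse matrix idea concrete, here is a minimal single-node sketch of one way candidate pairs could be found. It assumes a formulation the abstract does not spell out: a sparse matrix A whose rows are sequences and whose columns are k-mers, so that the nonzeros of A·Aᵀ count the k-mers each pair of sequences shares. The function names, the k-mer length `k`, and the `min_shared` threshold are illustrative choices, not PASTIS parameters. The sketch ignores amino acid substitution and distribution across nodes.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(sequences, k=3, min_shared=2):
    """Count shared k-mers per sequence pair (upper triangle of A * A^T).

    A is the implicit sparse sequences-by-k-mers matrix. Rather than
    materializing it, its columns are stored as an inverted index.
    This is an illustrative assumption, not the PASTIS implementation.
    """
    # Column view of A: k-mer -> indices of the sequences that contain it.
    columns = defaultdict(list)
    for idx, seq in enumerate(sequences):
        for kmer in kmers(seq, k):
            columns[kmer].append(idx)

    # Each column adds 1 to every pair of rows it touches, which
    # accumulates the off-diagonal entries of A * A^T.
    overlap = defaultdict(int)
    for rows in columns.values():
        for i, j in combinations(sorted(rows), 2):
            overlap[(i, j)] += 1

    # Keep only pairs that share enough k-mers to be worth aligning.
    return {pair: n for pair, n in overlap.items() if n >= min_shared}


if __name__ == "__main__":
    seqs = ["MKVLAAGIV", "MKVLSAGIV", "PQRSTWYHH"]
    print(candidate_pairs(seqs))
```

Only the pairs that survive this filter would then be passed to a pairwise aligner, which is what keeps a many-to-many search from comparing every sequence against every other.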

Similar Papers

Structural Features can be Unconserved in Proteins with Similar Folds: An Analysis of Side-chain to Side-chain Contacts Secondary Structure and Accessibility
Robert B Russell ... Geoffrey J Barton
Journal of Molecular Biology | VOL. 244
01 Dec 1994

Maps: An integrated system for protein sequence annotation using support vector machine
Jung‐Ying Wang ... Hahn‐Ming Lee
Journal of the Chinese Institute of Engineers | VOL. 31
01 Jul 2008

Characterization of Bothrops jararaca coagulation inhibitor (BjI) and presence of similar protein in plasma of other animals
Anita M Tanaka-Azevedo ... Ida S Sano-Martins
Toxicon | VOL. 44
23 Jul 2004

Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words
Hsin-Nan Lin ... Eugene A Permyakov
PLoS ONE | VOL. 6
02 Dec 2011
