Hardware acceleration of genomics data analysis: challenges and opportunities.

Tony Robinson, Priyank Shukla, Jim Harkin

doi:10.1093/bioinformatics/btab017

Open Access

https://doi.org/10.1093/bioinformatics/btab017

Journal: Bioinformatics	Publication Date: May 25, 2021
Citations: 13	License type: CC BY 4.0

Affiliation: University of Ulster, Altnagelvin Area Hospital

Abstract

Summary: The significant decline in the cost of genome sequencing has dramatically changed the typical bioinformatics pipeline for analysing sequencing data. The computational challenge of sequencing, traditionally the primary concern, is now secondary to genomic data analysis. Short read alignment (SRA) is a ubiquitous process within every modern bioinformatics pipeline in the field of genomics and is often regarded as the principal computational bottleneck. Many hardware and software approaches have been proposed to accelerate it. However, previous attempts to increase throughput using many-core processing strategies have enjoyed limited success, mainly because each computational block depends on global memory. The limited scalability and high energy costs of many-core SRA implementations significantly constrain sustained acceleration. The Networks-On-Chip (NoC) hardware interconnect mechanism has advanced the scalability of many-core computing systems and, more recently, has demonstrated potential in SRA implementations by efficiently integrating multiple computational blocks, such as pre-alignment filtering and sequence alignment, while minimizing memory latency and global memory access. This article provides a state-of-the-art review of current hardware acceleration strategies for genomic data analysis, and it establishes the challenges and opportunities of utilizing NoCs as a critical building block in next-generation sequencing (NGS) technologies for advancing the speed of analysis.

Highlights

In the 1990s, the Human Genome Project created the first draft sequence of the entire human genome at an estimated cost of USD 3 billion (Muir et al., 2016; Sboner, 2011).
For simplicity in explaining the alignment process, we focus here on the constant gap penalty model.
Initialization of boundary conditions in the dynamic programming (DP) matrix: for Needleman-Wunsch (NW) global alignment, following the scoring schema expressed in Equations 1, 2 and 3, the gap penalty conditions are imposed on the top row and leftmost column when initializing the first cell of the matrix, using Equations 4 and 5 (Lesk, 2008).
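
The boundary initialization described in the last highlight can be illustrated with a minimal Python sketch. Equations 1 to 5 are not reproduced in this excerpt, so the sketch makes its own assumptions: a fixed penalty `gap` is charged for every gap position, and the `match`, `mismatch` and `gap` values as well as the function names `nw_init` and `nw_score` are illustrative choices, not taken from the article.

```python
def nw_init(n_rows, n_cols, gap=-2):
    """Create an (n_rows + 1) x (n_cols + 1) DP matrix with NW boundaries.

    Assumption: every gap position costs the same fixed penalty `gap`,
    so cell (i, 0) holds i * gap and cell (0, j) holds j * gap.
    """
    matrix = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(1, n_rows + 1):
        matrix[i][0] = i * gap  # leftmost column: prefix of `a` against gaps
    for j in range(1, n_cols + 1):
        matrix[0][j] = j * gap  # top row: prefix of `b` against gaps
    return matrix


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the global alignment score of sequences a and b.

    The scoring values are illustrative defaults, not the article's.
    """
    f = nw_init(len(a), len(b), gap)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(
                f[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                f[i - 1][j] + gap,               # gap inserted in b
                f[i][j - 1] + gap,               # gap inserted in a
            )
    return f[len(a)][len(b)]
```

Every interior cell depends on its upper, left and upper-left neighbours, so the boundary row and column must be filled before the recurrence can start. This dependency pattern is also why hardware implementations typically process the matrix along anti-diagonals.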

Summary

Introduction

In the 1990s, the Human Genome Project created the first draft sequence of the entire human genome at an estimated cost of USD 3 billion (Muir et al., 2016; Sboner, 2011). The significant output of massively parallel next-generation sequencing (NGS) technologies has a compounding effect on many challenges across bioinformatics pipelines (Lightbody et al., 2019). Within genomics, these technologies have shifted the emphasis from sequencing as the principal challenge to efficient methods of accessing, sharing and analysing data (Lightbody et al., 2019; McVicar et al., 2016; Muir et al., 2016). Short read alignment (SRA) is an essential component of the modern bioinformatics pipeline and is one of the most significant computational challenges to date (Lightbody et al., 2019).

Genome sequencing and genomic data analysis

Applications of genomic data

Typical bioinformatics pipeline

Short read alignment

Computational challenges of short read alignment

Alignment challenges

Hardware acceleration

Many-core processing and NoC interconnect

Opportunities in short read alignment acceleration

Alignment computation

Search space reduction

Latency and memory overhead

Advances using Networks-On-Chip

Discussion

Findings

Conclusion

Conflict of Interest


