Abstract

SummaryThe significant decline in the cost of genome sequencing has dramatically changed the typical bioinformatics pipeline for analysing sequencing data. Where traditionally, the computational challenge of sequencing is now secondary to genomic data analysis. Short read alignment (SRA) is a ubiquitous process within every modern bioinformatics pipeline in the field of genomics and is often regarded as the principal computational bottleneck. Many hardware and software approaches have been provided to solve the challenge of acceleration. However, previous attempts to increase throughput using many-core processing strategies have enjoyed limited success, mainly due to a dependence on global memory for each computational block. The limited scalability and high energy costs of many-core SRA implementations pose a significant constraint in maintaining acceleration. The Networks-On-Chip (NoC) hardware interconnect mechanism has advanced the scalability of many-core computing systems and, more recently, has demonstrated potential in SRA implementations by integrating multiple computational blocks such as pre-alignment filtering and sequence alignment efficiently, while minimizing memory latency and global memory access. This article provides a state of the art review on current hardware acceleration strategies for genomic data analysis, and it establishes the challenges and opportunities of utilizing NoCs as a critical building block in next-generation sequencing (NGS) technologies for advancing the speed of analysis.

Highlights

  • In the 1990s, the human genome project created the first draft sequence of the entire human genome at an estimated cost of USD 3 billion (Muir et al, 2016; Sboner, 2011)

  • For simplicity of explanation of the alignment process, we have focused on constant gap penalty model here

  • Initialization of boundary conditions in dynamic programming (DP) matrix: For NW global alignment, following the scoring schema expressed in Equations 1, 2 and 3, the gap penalty conditions are imposed in the top row and leftmost column while initializing the first cell within the matrix, by deploying the Equations 4 and 5 (Lesk, 2008)

Read more

Summary

Introduction

In the 1990s, the human genome project created the first draft sequence of the entire human genome at an estimated cost of USD 3 billion (Muir et al, 2016; Sboner, 2011). The significant output of massively parallel next-generation sequencing (NGS) technologies has a compounding effect on many challenges across the bioinformatics pipelines (Lightbody et al, 2019). Such technologies within the field of genomics have caused a shift in emphasis from sequencing as the principal challenge to efficient methods of accessing, sharing and analysing data (Lightbody et al, 2019; McVicar et al, 2016; Muir et al, 2016). In the field of genomics, short read alignment (SRA) is an essential component within the modern bioinformatics pipeline and is one of the most significant computational challenges to date (Lightbody et al, 2019).

Genome sequencing and genomic data analysis
Applications of genomic data
Typical bioinformatics pipeline
Short read alignment
Computational challenges of short read alignment
Alignment challenges
Hardware acceleration
Many-core processing and NoC interconnect
Opportunities in short read alignment acceleration
Alignment computation
Search space reduction
Latency and memory overhead
Advances using Networks-On-Chip
Discussion
Findings
Conclusion
Conflict of Interest

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.