Sequence Alignment Using Machine Learning-Based Needleman–Wunsch Algorithm

Amr Ezz El-Din Rashed, Hanan M Amer, Hossam El-Din Moustafa, Mervat El-Seddek

doi:10.1109/access.2021.3100408

Open Access

https://doi.org/10.1109/access.2021.3100408

Abstract

Biological pairwise sequence alignment arranges the characters of two biological sequences to identify regions of similarity. This operation has elicited considerable interest due to its significant influence on various critical aspects of life (e.g., identifying mutations in coronaviruses). Sequence alignment over large databases cannot yield results within reasonable time, power, and cost. Although heuristic methods, such as FASTA and the BLAST family, have been demonstrated to perform 40 times faster than dynamic programming (DP)-based techniques (e.g., Needleman-Wunsch), they cannot guarantee an optimal alignment result. This study describes an optimized software platform, based on a lookup table, for a widely used DNA sequence alignment algorithm called the Needleman-Wunsch (NW) algorithm. This global alignment algorithm is the best approach for identifying similar regions between sequences. This study presents a new application of classical machine learning (ML) to global sequence alignment. Customized ML models are used to implement NW global alignment. An accuracy of 99.7% is achieved when using a multilayer perceptron with the ADAM optimizer, and up to 2912 giga cell updates per second (GCUPS) are realized on two real DNA sequences with a length of 4.1 M nucleotides. Our implementation is valid for RNA/DNA sequences. This study aims to parallelize the computation steps involved in the algorithm to accelerate its performance by using ML algorithms. All datasets used in this study are available from https://ieee-dataport.org/documents/dna-sequence-alignment-datasets-based-nw-algorithm.
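To put the reported throughput in perspective, the short sketch below converts between cell updates per second and wall-clock time. It assumes the common definition of GCUPS (matrix cells filled, M × N, divided by run time in seconds and by 10^9) and assumes both sequences are 4.1 M nucleotides long; the function names are illustrative and are not taken from the paper.

```python
def gcups(len_a: int, len_b: int, seconds: float) -> float:
    """Giga cell updates per second for an len_a x len_b alignment matrix.

    Assumes the usual definition: cells filled / (run time * 1e9).
    """
    return (len_a * len_b) / (seconds * 1e9)


def seconds_at(len_a: int, len_b: int, gcups_rate: float) -> float:
    """Run time implied by a given GCUPS rate for an len_a x len_b matrix."""
    return (len_a * len_b) / (gcups_rate * 1e9)


if __name__ == "__main__":
    n = 4_100_000  # assumed length of each sequence (4.1 M nucleotides)
    t = seconds_at(n, n, 2912)  # 2912 GCUPS is the figure quoted in the abstract
    print(f"{n * n:.3e} cells at 2912 GCUPS take about {t:.2f} s")
```

Under these assumptions the full matrix has roughly 1.68 × 10^13 cells, so the quoted rate corresponds to a run time of a few seconds.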

Highlights

Bioinformatics has developed due to the need for understanding the code of life, i.e., deoxyribonucleic acid (DNA).
Bioinformatics is an integration of biology and informatics because it includes the innovation of using computers in the measurement, recovery, control, and appropriation of information related to natural macromolecules, such as DNA, ribonucleic acid (RNA), and proteins.
Proposed algorithm: In the current study, we propose the use of equal-length sequences. The approach can be applied to DNA or RNA sequences because both consist of a four-letter alphabet that represents the four nucleotides (NTs).

Summary

Introduction

Bioinformatics has developed due to the need for understanding the code of life, i.e., deoxyribonucleic acid (DNA). Bioinformatics is an integration of biology and informatics because it includes the innovation of using computers in the measurement, recovery, control, and appropriation of information related to natural macromolecules, such as DNA, RNA, and proteins. Research endeavors in this field include genome assembly, sequence alignment, drug design, gene finding, drug discovery, protein structure alignment, and protein structure prediction [2].

The NW algorithm fills each cell H(i, j) of the score matrix from its three neighbors, where S(Ai, Bj) is the substitution score for aligning characters Ai and Bj and W is the gap penalty:

    Match  ← H(i−1, j−1) + S(Ai, Bj)
    Delete ← H(i−1, j) + W
    Insert ← H(i, j−1) + W
    H(i, j) ← max(Match, Insert, Delete)

This algorithm requires an excessively long running time, O(MN), when aligning two extremely long sequences. It can be applied to problems that consist of overlapping subproblems (e.g., two unequal-length sequences).
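The recurrence above can be sketched in a few lines of Python. This is a minimal illustration of the classical NW matrix fill, not the lookup-table or ML-based implementation the paper describes; the scoring values (match = 1, mismatch = -1, gap W = -2) and the function name are assumptions chosen for the example.

```python
def nw_score_matrix(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Fill the Needleman-Wunsch score matrix H for sequences a and b.

    Scoring parameters are illustrative assumptions, not values from the paper.
    H[i][j] is the best global alignment score of a[:i] against b[:j].
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]

    # First column and first row: aligning a prefix against nothing costs only gaps.
    for i in range(1, m + 1):
        H[i][0] = i * gap
    for j in range(1, n + 1):
        H[0][j] = j * gap

    # Each inner cell depends on its diagonal, upper, and left neighbors.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch  # S(Ai, Bj)
            diag = H[i - 1][j - 1] + s    # Match
            delete = H[i - 1][j] + gap    # Delete
            insert = H[i][j - 1] + gap    # Insert
            H[i][j] = max(diag, insert, delete)
    return H


if __name__ == "__main__":
    H = nw_score_matrix("ACGT", "AGT")
    print(H[-1][-1])  # global alignment score in the bottom-right cell
```

The two nested loops visit all (M+1)(N+1) cells once, which is where the O(MN) running time comes from. Cells on the same anti-diagonal do not depend on each other, which is the usual starting point for parallelizing the fill.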


Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 6	License type: CC BY 4.0
