Abstract

Viral evolution remains to be a main obstacle in the effectiveness of antiviral treatments. The ability to predict this evolution will help in the early detection of drug-resistant strains and will potentially facilitate the design of more efficient antiviral treatments. Various tools has been utilized in genome studies to achieve this goal. One of these tools is machine learning, which facilitates the study of structure-activity relationships, secondary and tertiary structure evolution prediction, and sequence error correction. This work proposes a novel machine learning technique for the prediction of the possible point mutations that appear on alignments of primary RNA sequence structure. It predicts the genotype of each nucleotide in the RNA sequence, and proves that a nucleotide in an RNA sequence changes based on the other nucleotides in the sequence. Neural networks technique is utilized in order to predict new strains, then a rough set theory based algorithm is introduced to extract these point mutation patterns. This algorithm is applied on a number of aligned RNA isolates time-series species of the Newcastle virus. Two different data sets from two sources are used in the validation of these techniques. The results show that the accuracy of this technique in predicting the nucleotides in the new generation is as high as 75 %. The mutation rules are visualized for the analysis of the correlation between different nucleotides in the same RNA sequence.

Highlights

  • The deoxyribonucleic acid (DNA) strands are composed of units of nucleotides

  • This paper presents a proof of concept by applying this technique on a set of successive generations of the Newcastle Disease Virus (NDV) from two different countries, Korea and China [17, 18]

  • 3 Results The analysis of ribonucleic acid (RNA) mutations requires the gathering and preparing a set of aligned RNA sequences that go through different mutations over a long period of time

Read more

Summary

Introduction

The deoxyribonucleic acid (DNA) strands are composed of units of nucleotides. Each nucleotide is composed of a nitrogen-containing nucleobase, which is either guanine (G), adenine (A), thymine (T), or cytosine (C). Most DNA molecules consist of two strands coiled around each other forming a double helix. These DNA strands are used as a template to create the ribonucleic acid (RNA) in a process known as transcription. Unlike DNA, RNA is often found as a single-strand. The sequence of mRNA is what specifies the sequence of amino acids the formed protein. DNA and RNA are the main component of viruses. Some of the viruses are DNA-based, while others are RNA-based such as Newcastle, HIV, and flu [1]. RNA viruses are different than the DNA-based viruses in the sense that they

Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.