Abstract
Similarity analysis of DNA sequences can clarify the homology between sequences and help predict their structure and the relationships between them. Frequent patterns in biological sequences not only explain the genetic characteristics of an organism but also serve as markers for certain biological events. However, most existing biological sequence similarity analysis methods target the entire sequential pattern and ignore missing gene fragments that may induce potential disease. Similarity analysis of sequences containing a missing gene item remains unexplored; consequently, some sequences with missing bases are ignored or not analyzed effectively. This paper therefore presents a new method for DNA sequence similarity analysis. Using this method, we first mined not only positive sequential patterns but also sequential patterns missing some of the base items (collectively referred to as negative sequential patterns). Subsequently, we used these frequent patterns for similarity analysis on a two-dimensional plane. Several experiments were conducted to verify the effectiveness of the algorithm. The experimental results demonstrated that the algorithm can obtain various results through the selection of frequent sequential patterns and that accuracy and time efficiency were improved.
Highlights
In recent years, a large volume of biological sequence data has been generated
Because the DNA sequence corresponds to its time series one to one, the similarity of the DNA
We compared the frequent pattern mining results for the first exon of the β-protein gene of 10 different species, based on our proposed graphical representation
Summary
When a new DNA sequence is obtained, similarity analysis is used to determine whether it is similar to a known sequence. If the two are homologous, this saves the time and effort of re-determining the function of the new sequence. Similarity analysis of biological sequences is by no means a straightforward mechanical comparison. Alignment and classical research methods are the most common. Two factors directly affect the similarity score: the substitution matrix and the gap penalty. The gap penalty compensates for the influence of insertions and deletions on sequence similarity, but no suitable theoretical model exists to describe the gap problem. Gap penalty scores lack a sound theoretical basis and are subjective.
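To illustrate how strongly the gap penalty shapes an alignment-based similarity score, the following is a minimal sketch of a classical global alignment score (Needleman-Wunsch with a linear gap penalty). It is not the method proposed in this paper. The function name `global_alignment_score`, the match/mismatch values standing in for a substitution matrix, and the example sequences are all assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty.

    match/mismatch act as a very simple substitution matrix (assumed values);
    gap is the score added for every inserted or deleted base.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned to a gap (deletion)
                curr[j - 1] + gap,  # b[j-1] aligned to a gap (insertion)
            ))
        prev = curr
    return prev[-1]


# Hypothetical sequences that differ by a single deleted base.
seq_a = "GATTACA"
seq_b = "GATACA"

# The same pair receives a different score for each gap penalty chosen.
for gap in (0, -2, -5):
    print(gap, global_alignment_score(seq_a, seq_b, gap=gap))
```

With these example sequences the score moves from 6 to 4 to 1 as the gap penalty goes from 0 to -2 to -5, although the sequences themselves are unchanged. This is the subjectivity described above: the result depends on a parameter that has no agreed theoretical value.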