Abstract

A phylogenetic tree is essential for understanding evolution. It is usually constructed through multiple sequence alignment, which carries a heavy computational burden and requires sophisticated parameter tuning. Recently, alignment-free methods based on k-mer profiles or common substrings have provided alternative ways to construct phylogenetic trees. However, most of these methods ignore the global similarities between sequences or specific valuable features, e.g., patterns that are frequent across the whole dataset. To improve on this, we propose an alignment-free algorithm based on sequential pattern mining, in which each sequence is converted into a binary representation over the sequential patterns mined among the sequences. The phylogenetic tree is then constructed by clustering a distance matrix calculated from these pattern vectors. To increase accuracy for highly divergent sequences, we take pattern weights into account and filter out redundant sub-patterns. Experiments on both simulated and real data demonstrate that our method outperforms other alignment-free methods, especially for large sequence sets with low similarity.
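The last stage of the pipeline described above, going from binary pattern vectors to a distance matrix and then to a clustered tree, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the distance measure, the weighting scheme, or the clustering method, so a weighted Jaccard distance and UPGMA (average-linkage) clustering are assumed here, and `weighted_jaccard_distance`, `distance_matrix`, and `upgma` are hypothetical helper names.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def weighted_jaccard_distance(u: Sequence[int], v: Sequence[int],
                              weights: Sequence[float]) -> float:
    """1 - (weight of patterns in both sequences) / (weight of patterns in either).
    ASSUMPTION: this weighting scheme is illustrative; uniform weights give plain Jaccard."""
    shared = sum(w for a, b, w in zip(u, v, weights) if a and b)
    either = sum(w for a, b, w in zip(u, v, weights) if a or b)
    return 0.0 if either == 0 else 1.0 - shared / either


def distance_matrix(vectors: List[Sequence[int]],
                    weights: Sequence[float]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances between binary pattern vectors."""
    n = len(vectors)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = weighted_jaccard_distance(vectors[i], vectors[j], weights)
    return d


def upgma(names: List[str], d: List[List[float]]) -> str:
    """Cluster a distance matrix with UPGMA and return a Newick string.
    ASSUMPTION: UPGMA stands in for the unspecified clustering step."""
    # cluster id -> (Newick subtree, number of leaves, height of its root)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)}
    # pairwise cluster distances keyed by (smaller id, larger id)
    dist = {(i, j): d[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), dab = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        ta, na, ha = clusters.pop(a)
        tb, nb, hb = clusters.pop(b)
        h = dab / 2  # new root sits halfway between the two merged clusters
        subtree = f"({ta}:{h - ha:.3f},{tb}:{h - hb:.3f})"
        # distance to the merged cluster is the leaf-count-weighted average
        merged = {
            (c, next_id): (na * dist[(min(a, c), max(a, c))]
                           + nb * dist[(min(b, c), max(b, c))]) / (na + nb)
            for c in clusters
        }
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(merged)
        clusters[next_id] = (subtree, na + nb, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

For instance, four sequences A to D with toy vectors `[1,1,1,0,0]`, `[1,1,0,0,0]`, `[0,0,1,1,1]`, `[0,0,1,1,0]` and uniform weights produce `((A:0.167,B:0.167):0.277,(C:0.167,D:0.167):0.277);`. UPGMA assumes roughly clock-like divergence; a neighbor-joining step could be substituted without changing the distance code.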

Highlights

  • Constructing a phylogenetic tree is one of the fundamental tasks in bioinformatics

  • For large sets of divergent sequences, our study focuses on constructing informative pattern vectors in a feature space and using them to build accurate phylogenetic trees

  • Instead of a k-mer or substring-based alignment-free method, we propose an approach based on sequential pattern mining that identifies patterns shared among sequences and uses them to measure sequence similarity for phylogeny reconstruction
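One way to identify patterns shared among sequences, as the highlights above describe, is level-wise (GSP-style) sequential pattern mining. The sketch below is an illustration under stated assumptions, not the authors' algorithm: it assumes a pattern "occurs" in a sequence when its symbols appear in order with at most `max_gap` residues skipped between consecutive matches, keeps patterns found in at least `min_support` sequences, and treats a pattern as a redundant sub-pattern when a longer mined pattern containing it is found in exactly the same sequences. All function names and parameters (`occurs`, `mine_shared_patterns`, `drop_redundant`, `pattern_vectors`, `min_support`, `max_len`, `max_gap`) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def occurs(pattern: str, seq: str, max_gap: int) -> bool:
    """True if the symbols of pattern appear in seq in order, skipping at most
    max_gap residues between consecutive matches (max_gap=0 means contiguous).
    ASSUMPTION: this gap-constrained occurrence rule is illustrative."""
    ends = {i for i, c in enumerate(seq) if c == pattern[0]}
    for ch in pattern[1:]:
        # positions matching ch that extend some partial match ending in `ends`
        ends = {j for j, c in enumerate(seq)
                if c == ch and any(i < j <= i + max_gap + 1 for i in ends)}
        if not ends:
            return False
    return bool(ends)


def mine_shared_patterns(seqs: List[str], min_support: int, max_len: int,
                         max_gap: int) -> Dict[str, FrozenSet[int]]:
    """Level-wise mining: extend each frequent pattern by one symbol and keep the
    extension if at least min_support sequences contain it. Extending only frequent
    patterns loses nothing, since every prefix of an occurring pattern also occurs."""
    alphabet = sorted(set("".join(seqs)))
    found: Dict[str, FrozenSet[int]] = {}
    level = [""]
    for _ in range(max_len):
        next_level = []
        for prefix in level:
            for ch in alphabet:
                cand = prefix + ch
                support = frozenset(k for k, s in enumerate(seqs)
                                    if occurs(cand, s, max_gap))
                if len(support) >= min_support:
                    found[cand] = support
                    next_level.append(cand)
        level = next_level
    return found


def drop_redundant(patterns: Dict[str, FrozenSet[int]]) -> Dict[str, FrozenSet[int]]:
    """Drop p when a longer mined pattern q contains p contiguously and occurs in
    exactly the same sequences, since p and q would give identical vector columns."""
    return {p: sup for p, sup in patterns.items()
            if not any(len(q) > len(p) and p in q and patterns[q] == sup
                       for q in patterns)}


def pattern_vectors(seqs: List[str], patterns: Dict[str, FrozenSet[int]]
                    ) -> Tuple[List[str], List[List[int]]]:
    """One binary vector per sequence: entry 1 if the sequence contains that pattern."""
    names = sorted(patterns)
    return names, [[1 if k in patterns[p] else 0 for p in names]
                   for k in range(len(seqs))]
```

For example, with sequences `["ACDEFG", "ACDEHG", "WYKLMN", "WYKLMA"]`, `min_support=2`, `max_len=3`, and `max_gap=1`, the pattern `ACD` is kept while its sub-pattern `AC`, found in the same two sequences, is filtered out. Exhaustive extension over the alphabet grows quickly with `max_len`; projection-based miners such as PrefixSpan are the usual choice for realistic data sizes.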

Introduction

Constructing a phylogenetic tree is one of the fundamental tasks in bioinformatics. A tree describes how a protein (gene) family might have evolved. Alignment results depend on various parameters, such as gap-opening and gap-extension penalties, which may affect the final phylogeny [7]. These methods are also affected by the guide trees used during alignment [8]. Although sequences of species from the same origin diverge differently under selection pressure, some fragments remain conserved between species. It is important to analyze these low-similarity sequences.
