Abstract

In recent years, there has been an explosive increase in the amount of bioinformatics data produced, but data are not information. The purpose of bioinformatics research is to obtain information with biological significance from large amounts of data. Multiple sequence alignment is widely used in sequence homology detection, protein secondary and tertiary structure prediction, phylogenetic tree analysis, and other fields. Existing research mainly focuses on the specific steps of the algorithm or on specific problems, and there is a lack of high-level abstract domain algorithm frameworks. As a result, multiple sequence alignment algorithms are complex, redundant, and difficult to understand, and it is not easy for users to select the appropriate algorithm, which may lead to computing errors. Here, through in-depth study and analysis of the heuristic multiple sequence alignment algorithm (HMSAA) domain, a domain-feature model and an interactive model of HMSAA components have been established according to the generative programming method. With the support of the PAR (partition and recur) platform, the HMSAA algorithm component library is formalized and a specific alignment algorithm is assembled, thus improving the reliability of algorithm assembly. This work provides a valuable theoretical reference for the applications of other biological sequence analysis algorithms.

Highlights

  • Since the beginning of the 21st century, with the development of high-throughput sequencing technology, gene sequencing has become much cheaper and more efficient, enabling the development of various genome projects

  • There has been no work considering it as a domain for high-level abstraction to improve the reliability and the productivity of the algorithms, and to reduce the probability of suboptimal solutions, errors of the algorithm, etc

  • Generative programming and the composition of a generative domain model are first briefly presented in the paper, and the feature-oriented domain modeling (FODM) method is described

Read more

Summary

INTRODUCTION

Since the beginning of the 21st century, with the development of high-throughput sequencing technology, gene sequencing has become much cheaper and more efficient, enabling the development of various genome projects. A two-dimensional dynamic programming matrix is extended to the three-dimensional or multi-dimensional case, where the dimension of the matrix reflects the number of sequences to be compared This method is only suitable for multiple sequence alignments with few dimensions, otherwise it will be a great challenge with respect to computer resources. The basic idea of iterative alignment is first to improve the multiple sequence alignment based on an algorithm that can generate alignments, through a series of iterations, until the alignment results no longer improve or have reached the maximum number of iterations. The generative programming method is used to design an abstract generic algorithm component library, after which a specific alignment algorithm for the HMSAA domain is assembled, improving the reliability and reusability of the algorithms. Using a generic abstract programming language, Apla, the domain components are formally implemented and a high abstract component library is built on top of Apla

RELATED METHODOLOGY AND TECHNOLOGY
Penalty component
EXPERIMENTS
SUMMARY AND FUTURE WORK
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.