Abstract
In bioinformatics, protein multiple sequence alignment (MSA) and phylogenetic tree construction are two major problems for which many algorithms have been developed to improve the accuracy of the results. However, finding the best algorithm among those available remains challenging, since the performance of an algorithm is closely related to the characteristics of the input sequences. Moreover, each algorithm has its own parameters, which should be configured according to the input sequences. In this paper, we introduce an expert system that predicts the most suitable algorithm for both MSA and phylogenetic tree construction. To construct the knowledge base, we built datasets whose instances are sets of protein sequences and whose attributes are characteristics of those sequences that significantly influence the quality of the results. Decision trees were induced from these datasets to generate the rules contained in the knowledge base. The inference engine can predict not only the most suitable algorithm but also the most appropriate parameters for that algorithm, for either MSA or phylogenetic tree construction. Experiments show that the system is reliable and allows users to obtain accurate results directly, without the time-consuming task of comparing results from different algorithms.
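The abstract says the knowledge-base rules are generated from induced decision trees. One common way to do this is to turn each root-to-leaf path into an IF-THEN rule and let the inference engine fire the first rule whose conditions all hold. The sketch below illustrates that general idea only. It is not the authors' system: the attribute names (`num_sequences`, `mean_identity`), the thresholds, and the labels `algorithm_A`/`algorithm_B`/`algorithm_C` are all hypothetical.

```python
# Hypothetical decision tree: an internal node is (attribute, threshold, left, right),
# where the left branch is taken when value <= threshold; a leaf is a class label (str).

def extract_rules(tree, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path of the tree."""
    if isinstance(tree, str):  # leaf: the accumulated path becomes one rule
        return [(list(conditions), tree)]
    attr, thr, left, right = tree
    return (extract_rules(left, conditions + ((attr, "<=", thr),))
            + extract_rules(right, conditions + ((attr, ">", thr),)))

def infer(rules, instance):
    """Return the label of the first rule whose conditions all hold, else None."""
    for conds, label in rules:
        if all((instance[a] <= t) if op == "<=" else (instance[a] > t)
               for a, op, t in conds):
            return label
    return None

# Hypothetical tree over two made-up sequence-set attributes.
tree = ("num_sequences", 50,
        ("mean_identity", 0.3, "algorithm_A", "algorithm_B"),
        "algorithm_C")

rules = extract_rules(tree)
print(len(rules))  # 3
print(infer(rules, {"num_sequences": 20, "mean_identity": 0.5}))  # algorithm_B
```

Because the paths of a decision tree partition the attribute space, exactly one extracted rule matches any fully specified instance, so rule order does not affect the prediction here.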