Abstract
Finding the Multiple Longest Common Subsequences (MLCS) is a fundamental problem in many fields, such as bioinformatics, computational genomics, and pattern recognition. Existing MLCS algorithms are not suitable for long and large-scale sequences because of their high time and space consumption. To overcome this problem, this paper proposes a new DAG (directed acyclic graph) model and a Novel Match Point Mapping algorithm (NMPM) based on dominant points. In the DAG, there are no duplicate match points, and each match point is mapped to a unique integer identifier. The DAG can be built efficiently by repeatedly calculating the successors of each match point. Moreover, high-dimensional match points can be removed once they are no longer needed during DAG construction, which saves a great deal of memory. The experimental results show that the new algorithm outperforms other leading algorithms, especially on large-scale MLCS problems.
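To make the terms above concrete, the following is a minimal Python sketch of a generic dominant-point DAG construction. It is not the NMPM algorithm itself, whose details the abstract does not give. The function names (`successors`, `build_dag`, `mlcs_length`), the virtual source point `(-1, ..., -1)`, and the dictionary used to assign integer identifiers are all assumptions made for illustration. The sketch also keeps every match point in memory, so it does not show the memory-saving removal step.

```python
from collections import deque

def successors(seqs, point, alphabet):
    """Non-dominated successor match points of `point` (illustrative helper)."""
    candidates = []
    for ch in alphabet:
        nxt = []
        for s, p in zip(seqs, point):
            i = s.find(ch, p + 1)      # next occurrence of ch strictly after p
            if i < 0:
                break                  # ch does not occur again in this sequence
            nxt.append(i)
        else:
            candidates.append(tuple(nxt))
    # Drop any candidate that another candidate precedes in every coordinate.
    return [q for q in candidates
            if not any(r != q and all(a < b for a, b in zip(r, q))
                       for r in candidates)]

def build_dag(seqs):
    """Build the DAG breadth-first, mapping each match point to an integer id."""
    alphabet = sorted(set.intersection(*(set(s) for s in seqs)))
    source = tuple(-1 for _ in seqs)   # assumed virtual start point
    ids = {source: 0}                  # match point -> unique integer id
    edges = {0: []}                    # id -> list of successor ids
    queue = deque([source])
    while queue:
        point = queue.popleft()
        for q in successors(seqs, point, alphabet):
            if q not in ids:           # each match point is stored only once
                ids[q] = len(ids)
                edges[ids[q]] = []
                queue.append(q)
            edges[ids[point]].append(ids[q])
    return ids, edges

def mlcs_length(edges):
    """Length of the longest path from the source, i.e. the MLCS length."""
    memo = {}
    def depth(u):
        if u not in memo:
            memo[u] = max((1 + depth(v) for v in edges[u]), default=0)
        return memo[u]
    return depth(0)

ids, edges = build_dag(["ABCBDAB", "BDCABA"])
print(mlcs_length(edges))
```

Once the graph is built, the edges refer only to integer identifiers, which suggests why high-dimensional coordinate tuples could be discarded after their successors have been computed. The recursive `mlcs_length` is adequate for short inputs only; long sequences would need an iterative traversal.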