A new 1.375-approximation algorithm for sorting by transpositions

Luiz Augusto G Silva, Noraí Romeu Rocco, Maria Emília M T Walter, Luis Antonio B Kowada

doi:10.1186/s13015-022-00205-z

Open Access

Abstract

Background: Sorting by transpositions (SBT) is a classical problem in genome rearrangements. In 2012, SBT was proven to be $\mathcal{NP}$-hard, and the best known approximation algorithm, with a 1.375 ratio, was proposed in 2006 by Elias and Hartman (the EH algorithm). Their algorithm employs simplification, a technique that transforms an input permutation $\pi$ into a simple permutation $\hat{\pi}$, presumably easier to handle. The permutation $\hat{\pi}$ is obtained by inserting new symbols into $\pi$ in such a way that the lower bound on the transposition distance of $\pi$ is preserved for $\hat{\pi}$. Simplification is guaranteed to preserve the lower bound, not the transposition distance itself. A sequence of operations sorting $\hat{\pi}$ can be mimicked to sort $\pi$.

Results and conclusions: First, using an algebraic approach, we propose a new upper bound for the transposition distance, which holds for all $S_n$. Next, we address a problem identified in the EH algorithm which, depending on how the input permutation is simplified, can cause it to require one transposition more than the 1.375-approximation ratio allows. Motivated by this problem, we propose a new approximation algorithm for SBT that ensures the 1.375-approximation ratio for all $S_n$. We implemented both our algorithm and the EH algorithm. In implementing the EH algorithm, we identified two other issues that needed to be fixed. We tested both algorithms against all permutations of size $n$, $2 \le n \le 12$. The results show that the EH algorithm exceeds the approximation ratio of 1.375 for permutations of size greater than 7. The percentages of distances computed by the implemented algorithms that equal the transposition distance are also compared with those of other algorithms available in the literature. Finally, we investigate the performance of both implementations on longer permutations of maximum length 500. From the experiments, we conclude that the maximum and average distances computed by our algorithm are slightly better than those computed by the EH algorithm, and that the running times of both algorithms are similar, despite the higher time complexity of our algorithm.
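
To make the operation concrete, the following is a minimal Python sketch, not the authors' implementation. A transposition exchanges two adjacent blocks of a permutation, and the exact transposition distance of a small permutation can be found by breadth-first search over all such moves. The function names and the 0-based cut points `i < j < k` are illustrative assumptions, and the search is exponential, so it is only practical for very short permutations.

```python
from collections import deque
from itertools import combinations


def apply_transposition(perm, i, j, k):
    """Exchange the adjacent blocks perm[i:j] and perm[j:k] (0-based, i < j < k)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def transposition_distance(perm):
    """Exact transposition distance by breadth-first search (small n only)."""
    start = tuple(perm)
    target = tuple(sorted(start))
    if start == target:
        return 0
    n = len(start)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        # Every choice of cut points i < j < k defines one transposition.
        for i, j, k in combinations(range(n + 1), 3):
            nxt = apply_transposition(current, i, j, k)
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    # Not reached: the sorted order is always reachable by transpositions.
    raise AssertionError("sorted order was not reached")
```

For example, `transposition_distance((3, 2, 1))` returns 2, since no single exchange of adjacent blocks sorts the reversed permutation of three elements.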

Highlights

It is known from previous research that the genomes of different species may present essentially the same set of genes in their DNA strands, though not in the same order [1, 2], suggesting the occurrence of mutational events that affect large portions of DNA.
In this paper, we first propose a new upper bound for the transposition distance, using an algebraic approach, which holds for all $S_n$.
We show in Appendix 1 that the EH algorithm may require one extra transposition above the 1.375-approximation ratio.

Summary

Results and conclusions

Using an algebraic approach, we propose a new upper bound for the transposition distance, which holds for all $S_n$. In implementing the EH algorithm, we identified two other issues that needed to be fixed. We tested both algorithms against all permutations of size $n$, $2 \le n \le 12$. The percentages of distances computed by the implemented algorithms that equal the transposition distance are compared with those of other algorithms available in the literature. We investigate the performance of both implementations on longer permutations of maximum length 500. We conclude that the maximum and average distances computed by our algorithm are slightly better than those computed by the EH algorithm, and that the running times of both algorithms are similar, despite the higher time complexity of our algorithm.
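
The exhaustive test described above can be sketched in Python as follows. This is an illustrative harness under stated assumptions, not the authors' code: `naive_sort_length` is a hypothetical baseline standing in for an approximation algorithm, and all function names are invented for the example. Exact distances for every permutation of size `n` come from a single breadth-first search from the identity, which is feasible only for small `n`.

```python
from collections import deque
from itertools import combinations


def all_transposition_distances(n):
    """Map every permutation of 1..n to its exact transposition distance.

    One breadth-first search from the identity suffices, because undoing
    a transposition is itself a transposition.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        cur = queue.popleft()
        for i, j, k in combinations(range(n + 1), 3):
            # Exchange the adjacent blocks cur[i:j] and cur[j:k].
            nxt = cur[:i] + cur[j:k] + cur[i:j] + cur[k:]
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def naive_sort_length(perm):
    """Hypothetical baseline: move each misplaced element home, one transposition each."""
    p = list(perm)
    moves = 0
    for pos in range(len(p)):
        src = p.index(pos + 1)
        if src != pos:
            # Exchanging blocks p[pos:src] and p[src:src+1] is one transposition.
            p = p[:pos] + p[src:src + 1] + p[pos:src] + p[src + 1:]
            moves += 1
    return moves


def worst_ratio(n, algorithm):
    """Largest (computed length / exact distance) over all non-identity permutations."""
    dist = all_transposition_distances(n)
    return max(algorithm(p) / d for p, d in dist.items() if d > 0)
```

With these definitions, `worst_ratio(3, naive_sort_length)` is 2.0, because the baseline spends two transpositions on (3, 1, 2) although one suffices. An algorithm that meets the 1.375-approximation guarantee would never exceed 1.375 under the same check.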
