Abstract
Abstract Phylogenetic inference is mainly based on sequence analysis and requires reliable alignments. This can be challenging, especially when sequences are highly divergent. In this context, the use of three-dimensional protein structures is a promising alternative. In a recent study, we introduced an original topological data analysis method based on persistent homology to estimate the evolutionary distances from structures. The method was successfully tested on 518 protein families representing 22,940 predicted structures. However, as anticipated, the reliability of the estimated evolutionary distances was impacted by the quality of the predicted structures and the presence of indels in the proteins. This paper introduces a new topological descriptor, called bio-topological marker (BTM), which provides a more faithful description of the structures, as well as a topological analysis for estimating evolutionary distances from BTMs and a weight-filtering method adapted to protein structures. These new developments significantly improve the estimation of evolutionary distances and phylogenies inferred from structures.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have