Abstract
Statistical resampling methods are widely used for confidence interval placement and as a data perturbation technique for statistical inference and learning. An important assumption of popular resampling methods such as the standard bootstrap is that input observations are identically and independently distributed (i.i.d.). However, within the area of computational biology and bioinformatics, many different factors can contribute to intra-sequence dependence, such as recombination and other evolutionary processes governing sequence evolution. The SEquential RESampling ("SERES") framework was previously proposed to relax the simplifying assumption of i.i.d. input observations. SERES resampling takes the form of random walks on an input of either aligned or unaligned biomolecular sequences. This study introduces the first application of SERES random walks on aligned sequence inputs and is also the first to demonstrate the utility of SERES as a data perturbation technique to yield improved statistical estimates. We focus on the classical problem of recombination-aware local genealogical inference. We show in a simulation study that coupling SERES resampling and re-estimation with recHMM, a hidden Markov model-based method, produces local genealogical inferences with consistent and often large improvements in terms of topological accuracy. We further evaluate method performance using empirical HIV genome sequence datasets.
Highlights
Statistical resampling methods are widely used in science and engineering
We propose the first application of SEquential RESampling (SERES) random walks on aligned sequences, whereas our earlier study focused on SERES random walks on unaligned sequences
This study introduced the first application of SERES random walks on aligned sequences
Summary
Statistical resampling methods are widely used in science and engineering. Among the many applications of resampling methods is calculating confidence intervals for statistical inference and learning [3], [4]. Another important application arises in the context of statistical inference and learning. Alongside model perturbation approaches such as dropout [23], statistical resampling can be seen as a form of data perturbation that can help to improve inference and learning accuracy [2].
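To make the contrast between i.i.d. resampling and sequence-aware resampling concrete, the following is a minimal Python sketch, not the published SERES procedure. It compares a standard bootstrap over alignment columns with a simplified random walk over column indices that keeps neighboring sites adjacent. The function names, the `reverse_prob` parameter, and the rule of reversing direction at random and at the alignment ends are all illustrative assumptions; the source does not specify how SERES walks are constructed.

```python
import random


def bootstrap_columns(n_cols, rng):
    """Standard bootstrap: draw column indices i.i.d. with replacement."""
    return [rng.randrange(n_cols) for _ in range(n_cols)]


def random_walk_columns(n_cols, reverse_prob, rng):
    """Simplified random walk over column indices (illustrative assumption).

    Starts at a random column and moves one column per step, so consecutive
    resampled sites are always neighbors in the original alignment. The walk
    reverses direction with probability `reverse_prob` and is reflected at
    either end of the alignment.
    """
    pos = rng.randrange(n_cols)
    step = rng.choice((-1, 1))
    walk = [pos]
    while len(walk) < n_cols:
        # Random reversal (hypothetical rule, not taken from the source).
        if rng.random() < reverse_prob:
            step = -step
        # Reflect if the next step would leave the alignment.
        if not 0 <= pos + step < n_cols:
            step = -step
        pos += step
        walk.append(pos)
    return walk


def resample_alignment(alignment, indices):
    """Build a resampled replicate by reading columns in the given order."""
    return ["".join(row[i] for i in indices) for row in alignment]


if __name__ == "__main__":
    # Toy alignment: three sequences, eight aligned sites.
    alignment = ["ACGTACGT", "ACGTTCGT", "ACCTACGA"]
    rng = random.Random(0)
    n_cols = len(alignment[0])

    iid_replicate = resample_alignment(alignment, bootstrap_columns(n_cols, rng))
    walk_replicate = resample_alignment(
        alignment, random_walk_columns(n_cols, 0.1, rng)
    )
    print(iid_replicate)
    print(walk_replicate)
```

In the bootstrap replicate, adjacent columns are unrelated draws, which is only appropriate when sites are i.i.d. In the walk replicate, every pair of adjacent columns was also adjacent in the input, so local dependence along the sequence is preserved. In a resample-and-re-estimate workflow, each replicate would be passed to the downstream inference method and the results aggregated.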