Abstract

Advances in sequencing technology have made it possible to produce the large multi-locus datasets required for species tree analyses. One challenge with constructing high-throughput sequencing datasets, however, is that missing information is propagated at different steps in the sequence preparation process. To date, species tree studies have focused on filtering and removing errors that occur at particular loci. Given the way that high-throughput sequencing datasets are constructed, however, large amounts of error or ambiguity may also manifest across individuals. Here we use a novel tree-based multivariate clustering method to identify and remove individuals with low phylogenetic signal in a nuclear sequence capture dataset for the Iochrominae clade (Solanaceae). Our results suggest that the low-quality tips are the result of the library preparation process (e.g., unequal pooling) rather than poor capture due to phylogenetic distance from the reference species. After implementing the clustering approach and removing low-quality tips, we construct an Iochrominae species tree that resolves a number of previously unresolved relationships. We propose this pipeline as a valuable tool for species tree reconstruction with phylogenomic datasets containing variable levels of missing data.
