A composite genome approach to identify phylogenetically informative data from next-generation sequencing.

Rachel S Schwartz, Kelly M Harkins, Reed A Cartwright, Anne C Stone

doi:10.1186/s12859-015-0632-y

Open Access

Abstract

Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.

Results: In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.

Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0632-y) contains supplementary material, which is available to authorized users.

Highlights

Improvements in sequencing technology allow easy acquisition of large datasets; analyzing these data for phylogenetics can be challenging
We demonstrate that Site identification from short read sequences (SISRS) provides high quality phylogenetic datasets across a range of simulated and empirical data
The number of potentially informative sites identified using SISRS increased with increased coverage (Fig. 3)
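
To make the idea of a "potentially informative site" concrete, the following is a minimal sketch of one common definition: a site is parsimony-informative when at least two different bases each occur in at least two taxa. It also shows how a missing-data threshold changes which sites are kept. This is not the SISRS implementation. The function name `informative_sites`, the `max_missing` parameter, and the toy alignment are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Characters treated as missing data (an assumption for this sketch).
MISSING = {"N", "-", "?"}


def informative_sites(alignment, max_missing=0):
    """Return 0-based indices of parsimony-informative columns.

    alignment   : dict mapping taxon name -> aligned sequence (equal lengths)
    max_missing : maximum number of taxa allowed to lack a base call at a site
    """
    if not alignment:
        return []
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")

    sites = []
    for i in range(length):
        column = [alignment[t][i].upper() for t in taxa]
        called = [base for base in column if base not in MISSING]
        # Skip sites where too many taxa have no base call.
        if len(column) - len(called) > max_missing:
            continue
        counts = Counter(called)
        # Informative: at least two bases, each present in two or more taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(i)
    return sites


# Hypothetical toy alignment; the sequences are made up for illustration.
toy = {
    "human":     "ACGTA",
    "chimp":     "ACGTA",
    "gorilla":   "ATGTN",
    "orangutan": "ATGAC",
    "gibbon":    "ATGAC",
}

strict = informative_sites(toy)                  # no missing data allowed
relaxed = informative_sites(toy, max_missing=1)  # one taxon may be missing
print(strict, relaxed)
```

In the toy data, the last column is excluded under the strict setting because one taxon has no base call, and it is included once a single missing taxon is tolerated. This mirrors the trade-off between dataset size and missing data mentioned in the abstract.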

Summary

Introduction

Improvements in sequencing technology allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation. Historically, phylogenetic studies relied on at most tens of loci from the genome to determine evolutionary relationships [1, 2]. These datasets often had insufficient information to provide strong support for all the relationships of interest [3]. Even given a reference genome, homologous loci may not be recoverable for species distantly related to the reference [13].

Journal: BMC Bioinformatics
Publication Date: Jun 11, 2015
Citations: 61
License type: CC BY 4.0
