WGDdetector: a pipeline for detecting whole genome duplication events using the genome or transcriptome annotations

Yongzhi Yang, Yongshuai Sun, Zhiqiang Lu, Ying Li, Qiao Chen

doi:10.1186/s12859-019-2670-3

Abstract

Background: With well-assembled genomes available for a growing number of organisms, identifying the bioinformatic basis of whole genome duplication (WGD) is a growing field of genomics. Most existing software for detecting the footprints of WGDs is restricted to well-assembled genomes. However, the large number of poor-quality genomes and the more accessible transcriptomes have been largely ignored, although in theory they can also contribute to WGD detection with the dS-based method. To address this, we designed WGDdetector, a simple and universal tool for detecting WGDs from either genome or transcriptome annotations of different organisms, based on the widely used dS-based method.

Results: We constructed the WGDdetector pipeline, which integrates all analyses, including gene family construction, dS estimation and phasing, and output of the dS values for each paralog pair, in a single command. We then chose four species (Arabidopsis thaliana, Juglans regia, Populus trichocarpa and Xenopus laevis), representing herbs, woody plants and animals, to test its practicability. Our final results were highly consistent with previous studies for both genome and transcriptome data.

Conclusion: WGDdetector is not only reliable and stable for genome data, but also offers a new way to use transcriptome data to obtain the correct dS distribution for detecting WGDs. The source code is freely available and is implemented for Windows and Linux operating systems.

Highlights

With well-assembled genomes available for a growing number of organisms, identifying the bioinformatic basis of whole genome duplication (WGD) is a growing field of genomics.
Genome and/or transcriptome datasets from four organisms were selected to evaluate the performance of WGDdetector: three plants (Arabidopsis thaliana, Juglans regia and Populus trichocarpa) and one frog (Xenopus laevis) (Table 1 and Additional file 1: Table S1).
After retaining the longest coding sequence (CDS) for each gene and removing CDSs with premature stop codons and those encoding proteins shorter than 50 amino acids (AA), a total of 27,301, 32,436, 39,410 and 41,073 genes satisfied our criteria in A. thaliana, J. regia, P. trichocarpa and X. laevis, respectively.
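
The gene filtering criteria in the last highlight can be illustrated with a short Python sketch. This is not WGDdetector's own code: the function names, the `(gene_id, transcript_id, cds)` record layout, the standard stop codon set and the order of the steps (longest CDS first, then the two removal filters) are all assumptions made for illustration. Only the three criteria and the 50 AA threshold come from the text.

```python
# Illustrative sketch only; names and input layout are hypothetical,
# not taken from the WGDdetector source code.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
MIN_PROTEIN_LEN = 50                 # amino acids, threshold from the text


def split_codons(cds):
    """Split a CDS into complete codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def has_premature_stop(cds):
    """True if an in-frame stop codon occurs before the final codon."""
    return any(codon in STOP_CODONS for codon in split_codons(cds)[:-1])


def protein_length(cds):
    """Number of amino acids encoded, not counting a terminal stop codon."""
    codons = split_codons(cds)
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    return len(codons)


def filter_cds(records):
    """Keep one CDS per gene and apply the two removal criteria.

    records: iterable of (gene_id, transcript_id, cds) tuples.
    Returns a dict mapping gene_id -> (transcript_id, cds).
    """
    longest = {}
    for gene_id, transcript_id, cds in records:
        # Retain the longest CDS for each gene.
        if gene_id not in longest or len(cds) > len(longest[gene_id][1]):
            longest[gene_id] = (transcript_id, cds)
    return {
        gene_id: (transcript_id, cds)
        for gene_id, (transcript_id, cds) in longest.items()
        if not has_premature_stop(cds)
        and protein_length(cds) >= MIN_PROTEIN_LEN
    }
```

Calling `filter_cds` on records parsed from a CDS FASTA file would return one retained CDS per gene; the number of keys in the returned dict corresponds to the per-species gene counts reported above.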

Summary

Introduction

With well-assembled genomes available for a growing number of organisms, identifying the bioinformatic basis of whole genome duplication (WGD) is a growing field of genomics. The large number of poor-quality genomes and the more accessible transcriptomes have been largely ignored, although in theory they can also contribute to WGD detection with the dS-based method. With a growing number of published draft genomes, two other methods, one based on phylogenetics [4, 16] and one on the distribution of synonymous substitutions per synonymous site (dS) between pairs of paralogs, are more suitable [17, 18]. For the former, the WGDs are estimated through the gene count data where the number of gene copies in various gene families across a



Journal: BMC Bioinformatics	Publication Date: Feb 13, 2019
Citations: 26	License type: open-access

Similar Papers

Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus
Eli Rodgers-Melnick ... Gancho T Slavov
Genome Research | VOL. 22
05 Oct 2011

Evolutionary history and functional divergence of the cytochrome P450 gene superfamily between Arabidopsis thaliana and Brassica species uncover effects of whole genome and tandem duplications
Jingyin Yu ... Xiurong Zhang
BMC Genomics | VOL. 18
18 Sep 2017

Phylogenetic placement of whole genome duplications in yeasts through quantitative analysis of hierarchical orthologous groups
Samuel Moix ... Natasha Glover
F1000Research | VOL. 12
12 Apr 2023

Evolutionary Dynamics and Functional Specialization of Plant Paralogs Formed by Whole and Small-Scale Genome Duplications
Lorenzo Carretero-Paulet ... Mario A Fares
Molecular Biology and Evolution | VOL. 29
13 Jul 2012
