Predicted Coding Sequences Research Articles

Background: De novo transcriptome assembly of short transcribed fragments (transfrags) produced from sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrag annotations and propagation of mis-assemblies.

Results: Here, we propose a structured framework that consists of a few steps in a pipeline architecture for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines 1) removal of identical subsequences, 2) error tolerant CDS prediction, 3) identification of coding potential, and 4) a multiple domain architecture annotation that complements BLAST and reduces non-specific domain annotation. We demonstrate that, independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is inferred on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers, in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags compared to the multiple k-mer assemblies. However, Trinity identified a comparable number of predicted coding sequences and gene loci to the Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4 and 6% are lost when orphan transfrags are excluded, and this represents only a tiny fraction of annotation derived from functional transference by sequence similarity. The median length of bona fide transfrags ranged from 1.5 kb (Trinity) to 2 kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33 to 50%, which is also high for domain based annotation. We showed that unselected transfrags were mostly truncated and represent sequences from intronic, untranslated (5′ and 3′) regions and non-coding gene loci.

Conclusions: IFRAT simplifies post-assembly processing, providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organisms.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0492-5) contains supplementary material, which is available to authorized users.
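The first IFRAT step, removal of identical subsequences, can be illustrated with a minimal sketch. The abstract does not describe how IFRAT implements this step, so everything below is an assumption made for illustration: the function names `reverse_complement` and `drop_contained` are hypothetical, a transfrag is treated as redundant when it occurs exactly inside a longer retained transfrag on either strand, and the simple pairwise scan is only practical for small inputs.

```python
from typing import Dict, List, Tuple

# Translation table for complementing DNA bases (N is left as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_contained(transfrags: Dict[str, str]) -> Dict[str, str]:
    """Keep only transfrags that are not exact subsequences of a longer one.

    Hypothetical helper, not IFRAT's actual code. Sequences are compared on
    both strands; exact duplicates are collapsed to a single copy.
    """
    # Longest first, so any potential container is seen before what it contains.
    ordered = sorted(transfrags.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    kept: List[Tuple[str, str]] = []
    for name, seq in ordered:
        s = seq.upper()
        rc = reverse_complement(s)
        # Skip the sequence if it (or its reverse complement) already occurs
        # inside a retained sequence. This scan is quadratic in the input size.
        if any(s in longer or rc in longer for _, longer in kept):
            continue
        kept.append((name, s))
    return dict(kept)


example = {
    "t1": "ATGGCCAAGTTTGA",
    "t2": "GCCAAG",          # contained in t1
    "t3": "CTTGGC",          # reverse complement of t2, so also contained in t1
    "t4": "ATGGCCAAGTTTGA",  # exact duplicate of t1
    "t5": "TTTTTTCCCC",      # unrelated, kept
}
non_redundant = drop_contained(example)
```

In this toy input only `t1` and `t5` survive. A real assembly with hundreds of thousands of transfrags would need an indexed approach rather than this pairwise scan.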

Background: Some Pseudomonas strains function as predominant plant growth-promoting rhizobacteria (PGPR). Within this group, Pseudomonas chlororaphis and Pseudomonas fluorescens are non-pathogenic biocontrol agents, and some Pseudomonas aeruginosa and Pseudomonas stutzeri strains are PGPR. P. chlororaphis GP72 is a plant growth-promoting rhizobacterium with a fully sequenced genome. We conducted a genomic analysis comparing GP72 with three other pseudomonad PGPR: P. fluorescens Pf-5, P. aeruginosa M18, and the nitrogen-fixing strain P. stutzeri A1501. Our aim was to identify the similarities and differences among these strains using a comparative genomic approach to clarify the mechanisms of plant growth-promoting activity.

Results: The genome sizes of GP72, Pf-5, M18, and A1501 ranged from 4.6 to 7.1 Mb, and the number of protein-coding genes varied among the four species. Clusters of Orthologous Groups (COGs) analysis assigned functions to predicted proteins. The COGs distributions were similar among the four species. However, the percentage of genes encoding transposases and their inactivated derivatives (COG L) was 1.33% of the total genes with COGs classifications in A1501, 0.21% in GP72, 0.02% in Pf-5, and 0.11% in M18. A phylogenetic analysis indicated that GP72 and Pf-5 were the most closely related strains, consistent with the genome alignment results. Comparisons of predicted coding sequences (CDSs) between GP72 and Pf-5 revealed 3544 conserved genes. There were fewer conserved genes when GP72 CDSs were compared with those of A1501 and M18. Comparisons among the four Pseudomonas species revealed 603 conserved genes in GP72, illustrating common plant growth-promoting traits shared among these PGPR. Conserved genes were related to catabolism, transport of plant-derived compounds, stress resistance, and rhizosphere colonization. Some strain-specific CDSs were related to different kinds of biocontrol activities or plant growth promotion. The GP72 genome contained the cus operon (related to heavy metal resistance) and a gene cluster involved in type IV pilus biosynthesis, which confers adhesion ability.

Conclusions: Comparative genomic analysis of four representative PGPR revealed some conserved regions, indicating common characteristics (metabolism of plant-derived compounds, heavy metal resistance, and rhizosphere colonization) among these pseudomonad PGPR. Genomic regions specific to each strain provide clues to its lifestyle, ecological adaptation, and physiological role in the rhizosphere.

Articles published on Predicted Coding Sequences

Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms.

Comparative genomic analysis and phenazine production of Pseudomonas chlororaphis, a plant growth-promoting rhizobacterium.

Genome sequence and description of the mosquitocidal and heavy metal tolerant strain Lysinibacillus sphaericus CBAM5.

The identification of novel and differentially expressed apple-tree genes under low-temperature stress using high-throughput Illumina sequencing.

Hepatic patatin-like phospholipase domain-containing protein 3 sequence, single nucleotide polymorphism presence, protein confirmation, and responsiveness to energy balance in dairy cows

Patterns of simple sequence repeats in cultivated blueberries (Vaccinium section Cyanococcus spp.) and their use in revealing genetic diversity and population structure

Gene Content and Diversity of the Loci Encoding Biosynthesis of Capsular Polysaccharides of the 15 Serovar Reference Strains of Haemophilus parasuis

Comparative genomic analysis of four representative plant growth-promoting rhizobacteria in Pseudomonas

MetaSAMS—A novel software platform for taxonomic classification, functional annotation and comparative analysis of metagenome datasets

Characterization and differential expression analysis of complete coding sequences of Vitis vinifera L. sirtuin genes

Genome-Wide Definition of the SigF Regulon in Mycobacterium tuberculosis

Comparative analysis of a plant pseudoautosomal region (PAR) in Silene latifolia with the corresponding S. vulgaris autosome

Genome-wide Association Study Identifies Four Genetic Loci Associated with Thyroid Volume and Goiter Risk

Purification to homogeneity and characterization of nonproteolyzed potato (Solanum tuberosum) tuber hexokinase 1

Complete genome sequence of Crohn's disease-associated adherent-invasive E. coli strain LF82.

Genome Analysis of Moraxella catarrhalis Strain RH4, a Human Respiratory Tract Pathogen

Comparative genomic analysis of 1047 completely sequenced cDNAs from an Arabidopsis-related model halophyte, Thellungiella halophila

Genomic Analysis of an Attenuated Chlamydia abortus Live Vaccine Strain Reveals Defects in Central Metabolism and Surface Proteins

Co-evolution of genomes and plasmids within Chlamydia trachomatis and the emergence in Sweden of a new variant strain.

Identification of SFR6, a key component in cold acclimation acting post‐translationally on CBF function
