Abstract
Background: The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq.
Results: We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. Compared with genome annotation using conventional sequencing-by-synthesis alone, annotation based on PacBio mRNA sequences identified 1609 (9.2%) new genes, extended the length of 3965 (26.7%) genes, and increased the total genomic exon length by 1.9 Mb (12.4%). Non-coding sequence representation (primarily from UTRs, based on dT reverse transcription priming) was particularly improved, increasing fifteen-fold in total length through increases in both the length and number of UTR exons. In addition, the UTR data provided by these CCS allowed for the identification of a novel SL2 splice leader sequence for A. ceylanicum and an increase in the number and proportion of functionally annotated genes. RNA-seq data also confirmed some of the newly annotated genes and gene features.
Conclusion: Overall, PacBio data supported a significant improvement in gene annotation in this genome, and PacBio sequencing is an appealing alternative or complementary technique to other transcript sequencing technologies for genome annotation.
Highlights
The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy
Our ‘original’ A. ceylanicum genome annotation (“ACOrig”, before the inclusion of PacBio data; BioProject ID PRJNA72583, uploaded to GenBank in May 2013) contained 16,026 predicted genes and used 10,591 available A. ceylanicum EST sequences downloaded from the NCBI database
This single-primer technique amplifies only first-strand cDNAs synthesized with both priming sites during the RT reaction, and the amplification products are typically >500 bp, minimizing the representation of smaller cDNA molecules [21]
Summary
The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. We use these attributes to improve the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. Compared to conventional 454/Roche and/or Illumina sequencing platforms, PacBio's much longer reads and the improved accuracy of circular consensus sequences (CCS) are advantageous for sequencing cDNA libraries because i) each library read is from a single transcript molecule, ii) mRNA CCS lengths on PacBio exceed 1 kbp, iii) the longer reads provide a unique opportunity to identify 5′ and 3′ boundaries or untranslated regions (UTRs), and iv) for each gene, … To understand genome organization and genic content, whole-genome shotgun sequences for many organisms, including nematodes, have traditionally been assembled from short-read NGS technologies such as the 454/Roche and/or Illumina platforms [1,2,3].