Long read sequencing to reveal the full complexity of a plant transcriptome by targeting both standard and long workflows

Othman Al-Dossary,Agnelo Furtado,Ardashir Kharabianmasouleh,Bader Alsubaie,Ibrahim Al-Mssallem,Robert J Henry

doi:10.1186/s13007-023-01091-1

Abstract

BackgroundLong read sequencing allows the analysis of full-length transcripts in plants without the challenges of reliable transcriptome assembly. Long read sequencing of transcripts from plant genomes has often utilized sized transcript libraries. However, the value of including libraries of differing sizes has not been established.MethodsA comprehensive transcriptome of the leaves of Jojoba (Simmondsia chinensis) was generated from two different PacBio library preparations: standard workflow (SW) and long workflow (LW).ResultsThe importance of using both transcript groups in the analysis was demonstrated by the high proportion of unique sequences (74.6%) that were not shared between the groups. A total of 37.8% longer transcripts were only detected in the long dataset. The completeness of the combined transcriptome was indicated by the presence of 98.7% of genes predicted in the jojoba male reference genome. The high coverage of the transcriptome was further confirmed by BUSCO analysis showing the presence of 96.9% of the genes from the core viridiplantae_odb10 lineage. The high-quality isoforms post Cd-Hit merged dataset of the two workflows had a total of 167,866 isoforms. Most of the transcript isoforms were protein-coding sequences (71.7%) containing open reading frames (ORFs) ≥ 100 amino acids (aa). Alternative splicing and intron retention were the basis of most transcript diversity when analysed at the whole genome level and by specific analysis of the apetala2 gene families.ConclusionThis suggests the need to specifically target the capture of longer transcripts to provide more comprehensive genome coverage in plant transcriptome analysis and reveal the high level of alternative splicing.

Full Text