Abstract

Sugarcane (Saccharum spp. hybrids) is a leading industrial crop in tropical and subtropical regions worldwide. More recently, sugarcane has been selected as a key feedstock for biofuels due to its rapid growth, high fiber content and favorable energy input/output ratio. Breeding sugarcane varieties whose biomass is suited to efficient conversion to biofuels can be optimized by understanding the genetic control of biomass composition. However, the genetic analysis of these traits is hindered by the crop's genomic complexity and the limited availability of a reference genome. The aims of this project were: the development of a high-throughput profiling method for rapid screening of the key biomass traits in a sugarcane population; the construction of a new full-length transcriptome reference database; and the identification of transcripts associated with sugar and fiber accumulation in sugarcane. For the screening of genotypes, newly developed predictive models employing near-infrared (NIR) spectral analysis, coupled with high-performance liquid chromatography (HPLC), were shown to allow high-throughput profiling of the major components of the fiber and sugar fractions in sugarcane biomass. Contrasting low-fiber and high-fiber genotypes (minimum of ~29% and maximum of 61% of total dry biomass) were identified amongst 331 samples from 186 sugarcane genotypes. The population studied exhibited a wide range of fiber/sugar ratios, from 0.4 (as low as that of a typical commercial sugarcane variety) to 2.2 (similar to that of energy-cane). In addition, the lignin content (the central factor in biomass recalcitrance) ranged from 6 to 14% of the total dry biomass. To aid genotyping, a new sugarcane transcriptome (termed the SUGIT database) was constructed using PacBio full-length isoform sequencing (Iso-Seq) and a cDNA library derived from 22 diverse sugarcane genotypes, covering the key tissues (leaf, internode and root) at different developmental stages (from immature to mature).
Comparative analysis showed that this new SUGIT database included more full-length transcripts, longer predicted transcripts, and a higher average length of the 1,000 largest proteins, compared to a de novo assembly from Illumina RNA-Seq short-read data from the same sample set. The annotation suggested that the majority (~94%) of the SUGIT database was from coding RNAs, while a very small proportion (~2%) could be long non-coding RNAs. About 70-82% of the RNA-Seq reads from different tissues mapped back to the SUGIT database, suggesting that it represented the targeted tissues well, while about 69% of this database aligned with the sorghum genome, confirming the high conservation of orthologs in the genic regions of the two genomes. Applying the SUGIT database to differential expression analysis (false discovery rate (FDR)-corrected p-value < 0.05), 1,649 transcript isoforms were identified as being differentially expressed between the young and mature tissues in the sugarcane plant, while 555 transcript isoforms were differentially expressed between the high- and low-fiber genotype groups. The differentially expressed transcripts included those involved in carbon partitioning between the cell-wall components and sugars, cell function, hormone metabolism, transcription factors, disease/stress resistance, and development. Taken together, the new NIR- and HPLC-based method evaluated in this thesis allowed the rapid profiling of a large number of sugarcane biomass samples. The SUGIT database facilitated the analysis of differential gene expression at the transcriptional level, defined different full-length isoforms, and predicted transcripts that could be used to improve the sugarcane gene models. Finally, the study identified candidate transcript isoforms that regulate the accumulation of the major biomass components (sugars and fiber) at the transcriptional level in sugarcane.
