Abstract

Coffee is an important crop globally. Improving coffee quality is a primary breeding target that would benefit from an understanding of trait inheritance and ancestral genetic information. However, comparatively little information exists on the genetics of quality and on the relationships between wild and cultivated coffee genotypes, especially for arabica coffee (Coffea arabica L.), which accounts for almost 60% of global coffee production. The complex polyploid genome of C. arabica and the limited genomic information available for it have impeded the progress of genetic studies, and arabica coffee is among the major crop species that lack a reference genome. This project aimed to develop genomic resources for arabica coffee and to use them in genetic studies of coffee quality. The relationships between coffee species, and the position of arabica within the tribe Coffeeae, were investigated using complete chloroplast genome sequences to assess the value of these genetic resources for breeding. A draft genome sequence of arabica coffee was developed. Marker-trait associations were studied for caffeine and trigonelline, the two principal compounds known to be related to coffee quality, paving the way for the development of improved varieties with preferred levels of these compounds in the beans.

Chloroplast genomes of 16 coffee species were sequenced and used to construct a phylogeny. The results support distinct Psilanthus and Coffea clades. C. canephora is likely a hybrid with a Psilanthus maternal genome that received much of its nuclear genome from Coffea. The maternal genomes of C. arabica and C. canephora are divergent, which is consistent with the chloroplast genome of arabica being that of its maternal parent, C. eugenioides. Three species containing almost no caffeine were identified: two (C. humblotiana and C. tetragona) are close to C. arabica, and one (P. ebracteolatus) is close to C. canephora. These species could serve as important materials for arabica quality breeding and research.

For the association study, beans were harvested from 232 diverse arabica coffee accessions originating from 27 countries and held in the germplasm collection at CATIE (Tropical Agricultural Research and Higher Education Centre), Costa Rica. Substantial variation between genotypes was observed for bean morphology attributes. Non-volatile compounds, including caffeine and trigonelline, showed a wider range of variation than previously reported. Targeted analysis of 18 volatile compounds in 35 accessions also revealed significant variation. No strong correlation was found between bean morphology and the levels of non-volatile or volatile compounds, implying that it is difficult to select for low or high levels of these compounds on the basis of bean physical characteristics. However, the lack of correlation also indicates that breeding for desirable combinations of traits (i.e. large bean size, low caffeine, high trigonelline and favourable volatiles) is possible.

The genome of K7, the most popular arabica variety in Australia, was sequenced using both Illumina short reads and PacBio long reads. Assembly with a range of tools produced 76,409 scaffolds with a scaffold N50 of 54,544 bp and a total scaffold length of 1,448 Mb. Validation showed high completeness of the assembly: BWA analysis demonstrated that ≥ 98% of the short reads mapped to the genome and ≥ 93% were marked as properly paired, GMAP analysis indicated that ≥ 99% of the CDS and transcriptome sequences mapped to the C. arabica draft genome, and 89% of BUSCOs were present. Annotation of the assembled genome with AUGUSTUS yielded 99,829 gene models. The assembly was used as a reference for association analysis.

An extreme-phenotype genome-wide association study (XP-GWAS) was performed to identify loci affecting the caffeine and trigonelline content of C. arabica beans. Based on biochemical analysis of the germplasm collection, DNA extracted from individuals with extreme phenotypes (high vs. low caffeine, and high vs. low trigonelline) was bulked. Sequencing and mapping against the combined reference genomes of C. canephora (CC) and C. eugenioides (CE) identified 1,351 non-synonymous SNPs that distinguished the low- and high-caffeine bulks. Gene annotation with Blast2GO showed that these SNPs correspond to 908 genes, representing 56 unique KEGG pathways and 49 unique enzymes. KEGG pathway-based analysis identified 40 caffeine-associated SNPs, nine of which were tightly associated with genes encoding enzymes involved in the conversion of substrates (i.e. SAM, xanthine and IMP) that participate in the caffeine biosynthesis pathways. Likewise, 1,060 non-synonymous SNPs distinguished the low- and high-trigonelline bulks. They were associated with 719 genes, representing 61 unique KEGG pathways and 51 unique enzymes. KEGG pathway-based analysis revealed 24 trigonelline-associated SNPs tightly linked to genes encoding enzymes involved in the conversion of substrates (i.e. SAM and L-tryptophan) that participate in the trigonelline biosynthesis pathways. Association analysis using the K7 arabica reference genome identified several additional SNPs linked to genes encoding enzymes in the caffeine and trigonelline synthesis pathways. These SNPs could be useful targets for further functional validation and subsequent application in arabica quality breeding.
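The abstract does not state which statistic was used to decide that a SNP "distinguished" the low and high bulks. Purely as an illustration, one common way to compare two extreme-phenotype bulks is to count reference and alternative allele reads at each SNP in each bulk and test for an allele-frequency difference, for example with Fisher's exact test. The minimal Python sketch below assumes that approach; the function names, read counts and thresholds are hypothetical and are not taken from this study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are bulks (low, high); columns are allele read counts (ref, alt).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def compare_bulks(low_ref: int, low_alt: int, high_ref: int, high_alt: int):
    """Return (alt-allele frequency in high bulk minus low bulk, p-value)."""
    af_low = low_alt / (low_ref + low_alt)
    af_high = high_alt / (high_ref + high_alt)
    p = fisher_exact_two_sided(low_ref, low_alt, high_ref, high_alt)
    return af_high - af_low, p


# Hypothetical read counts per SNP: (low_ref, low_alt, high_ref, high_alt).
snps = {
    "snp_A": (28, 2, 3, 27),
    "snp_B": (15, 14, 16, 13),
}

# Hypothetical thresholds for calling a SNP "bulk-distinguishing".
MAX_P = 0.01
MIN_DELTA_AF = 0.5

for name, counts in snps.items():
    delta, p = compare_bulks(*counts)
    flag = p < MAX_P and abs(delta) >= MIN_DELTA_AF
    print(f"{name}: delta_AF={delta:+.2f} p={p:.2e} distinguishes_bulks={flag}")
```

In a real pipeline, the SNPs passing such a filter would then be checked for non-synonymous effects and annotated (for example with Blast2GO and KEGG, as in the abstract); the frequency-difference and p-value cut-offs here are placeholders only.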

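The scaffold N50 of 54,544 bp reported for the K7 assembly is the length L such that scaffolds of length at least L together contain at least half of the total assembled length (here, half of 1,448 Mb). The following minimal Python sketch computes the scaffold count, total length and N50 from a list of scaffold lengths; the function name and the toy lengths are illustrative and are not drawn from the assembly itself.

```python
def assembly_stats(lengths: list[int]) -> dict[str, int]:
    """Return scaffold count, total length and N50 for a list of scaffold lengths."""
    if not lengths:
        raise ValueError("no scaffolds supplied")
    total = sum(lengths)
    running = 0
    n50 = 0
    # Walk scaffolds from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves for odd totals.
        if 2 * running >= total:
            n50 = length
            break
    return {"count": len(lengths), "total": total, "n50": n50}


# Toy example (not the K7 scaffolds).
print(assembly_stats([2, 3, 4, 5, 6, 7, 8, 9, 10]))
# → {'count': 9, 'total': 54, 'n50': 8}
```

Because N50 depends only on the length distribution, it says nothing about assembly correctness; that is why read-mapping rates, transcript mapping and BUSCO completeness are reported alongside it.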