Chloroplasts are a common feature of plants. Over the course of plant evolution, the chloroplasts of each lineage have shaped their own genomes (plastomes) through structural changes and the transfer of many genes to the nuclear genome. Some plastid genes contain introns, most of which are group II introns. This study aimed to gain genomic and evolutionary insights into plastomes from green algae to flowering plants. Plastomes of 115 species of green algae, bryophytes, pteridophytes (spore-bearing vascular plants), gymnosperms, and angiosperms were retrieved from the NCBI organelle genome database. Plastome structure, gene content, and GC content were analyzed with in-house Python code. Intronic features, including presence or absence, length, and intron phase, were analyzed manually from the annotations in NCBI. The canonical quadripartite structure was retained in most plastomes, except for a few that had lost an inverted repeat (IR). Expansion, reduction, or deletion of the IRs resulted in length variation among the plastomes. The number of protein-coding genes ranged from 40 to 92, with an average of 79.43 ± 5.84 per plastome, and gene losses were apparent in specific lineages. The number of trn genes ranged from 13 to 33, with an average of 21.19 ± 2.42 per plastome. Ribosomal RNA genes (rrn) were located in the IRs and were therefore present in duplicate, except in species that had lost one of the IRs. GC content varied from 24.9 to 51.0%, with an average of 38.21 ± 3.27%, indicating a bias toward high AT content. Plastid introns were present in 18 protein-coding genes, six trn genes, and one rrn gene. Intron losses occurred among orthologous genes in different plant lineages. The plastid introns were long compared with nuclear introns, which might be related to the difference between spliceosomal nuclear introns and self-splicing group II plastid introns.
The trnK-UUU intron contained the maturase-encoding matK gene, except in the chlorophyte algae and monilophyte ferns, in which trnK-UUU was lost but matK was retained. There were many annotation artefacts in intron positions in the NCBI database. In the analysis of intron phases, phase 0 introns were more frequent than phase 1 and phase 2 introns. Phase polymorphism, arising from nucleotide insertion, was observed in the introns of clpP. Plastid trn introns were long compared with archaeal or eukaryotic nuclear tRNA introns. Of the six plastid trn introns, one was located in the D loop and the other five in the anticodon loop. The insertion sites were conserved among archaeal, eukaryotic nuclear, and plastid tRNA genes. The current study updated previous findings on the structural variation, gene content, and GC content of chloroplast genomes from green algae to flowering plants. It also included some novel findings and discussion on plastome introns, including their length variation and phase variation. We also identified and corrected some false intron annotations in protein-coding and tRNA genes in the genome database, corrections that could be confirmed by future chloroplast transcriptome analysis.
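Two of the quantities reported above, GC content and intron phase, follow from simple arithmetic on the sequence and on the exon coordinates. The sketch below is illustrative only and is not the study's in-house code: the function names and the example exon lengths are hypothetical. It assumes the standard convention in which an intron lying between two codons is phase 0, one lying after the first base of a codon is phase 1, and one lying after the second base is phase 2.

```python
from __future__ import annotations


def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def intron_phases(exon_lengths: list[int]) -> list[int]:
    """Return the phase (0, 1, or 2) of each intron in a protein-coding gene.

    exon_lengths holds the coding length of every exon in transcript order,
    counted from the first base of the start codon (an assumption of this
    sketch). The phase of an intron is the number of coding bases that
    precede it, modulo 3.
    """
    phases = []
    coding_so_far = 0
    # Every exon except the last one is followed by an intron.
    for length in exon_lengths[:-1]:
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


if __name__ == "__main__":
    # Hypothetical three-exon gene, not taken from any annotated plastome.
    print(gc_content("ATGCATGCAA"))
    print(intron_phases([71, 292, 228]))
```

Under this definition, inserting a single nucleotide into an upstream exon shifts the phase of every downstream intron by one, which is one way an insertion could produce the kind of phase polymorphism described for clpP.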