Abstract

Plant mitochondrial genomes differ markedly from those of animals: they are large and divergent, ranging in size from hundreds of thousands to a few million bases. Recombination among repetitive regions is thought to produce similar structures that differ slightly, known as "multipartite structures," which contribute to different phenotypes. Although many reference plant mitochondrial genomes represent almost all mitochondrial genes, the full spectrum of their structures remains largely unknown. Long-read sequencing technology is expected to reveal this landscape; however, many studies have aimed to assemble only one representative circular genome, because existing assemblers cannot properly resolve multipartite structures. To elucidate multipartite structures, we leveraged the information in existing reference genomes and classified long reads according to their corresponding structures. We developed a method that combines two classic algorithms, partial order alignment (POA) and the hidden Markov model (HMM), to construct a sensitive read classifier. The method represents a set of reads as a POA graph and analyzes it using the HMM, which lets us calculate the likelihood of a read occurring in a given cluster and yields an iterative clustering algorithm. On synthetic data, the proposed method reliably detected a single variation site in 9,000-bp synthetic long reads with a 15% sequencing-error rate and produced accurate clustering. It was also capable of clustering long reads from six very similar sequences containing only slight differences. On real data, we assembled putative multipartite structures of the mitochondrial genomes of Arabidopsis thaliana from nine accessions sequenced using PacBio Sequel. The results indicate that A. thaliana mitochondrial genomes contain both recurrent and strain-specific structures.
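
As a rough illustration of the likelihood-based iterative clustering described above, the following Python sketch alternates between fitting one model per cluster and reassigning each read to the cluster under which it is most likely. It is not the authors' implementation. In place of the POA graph and HMM, it uses a much simpler per-position base-frequency profile, and it assumes equal-length reads with substitution errors only. All function names and parameters (`build_profile`, `log_likelihood`, `cluster_reads`, `pseudocount`) are hypothetical.

```python
import math

BASES = "ACGT"


def build_profile(reads, length, pseudocount=1.0):
    """Per-position base probabilities for one cluster.

    Simplified stand-in for the POA graph + HMM (an assumption of this sketch).
    An empty cluster yields a uniform profile.
    """
    profile = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for read in reads:
            counts[read[i]] += 1
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in BASES})
    return profile


def log_likelihood(read, profile):
    """Log-probability of a read under a cluster profile."""
    return sum(math.log(profile[i][base]) for i, base in enumerate(read))


def cluster_reads(reads, init_labels, k=2, max_iter=20):
    """Hard-assignment iterative clustering: refit profiles, then reassign reads."""
    length = len(reads[0])
    labels = list(init_labels)
    for _ in range(max_iter):
        # Fit one profile per cluster from its current members.
        profiles = [
            build_profile([r for r, l in zip(reads, labels) if l == c], length)
            for c in range(k)
        ]
        # Move each read to the cluster that gives it the highest likelihood.
        new_labels = [
            max(range(k), key=lambda c: log_likelihood(r, profiles[c]))
            for r in reads
        ]
        if new_labels == labels:  # converged: no read changed cluster
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two hypothetical templates that differ at a single site (index 4).
    reads = ["ACGTACGTAC"] * 5 + ["ACGTTCGTAC"] * 5
    init = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # one read starts in the wrong cluster
    print(cluster_reads(reads, init))
```

A profile of this kind cannot handle the insertions and deletions that dominate long-read errors, which is one reason an alignment graph and an HMM are a more suitable model for real data.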

Highlights

  • We explored two traditional models in sequence analysis, the hidden Markov model and partial order alignment, which enable us to detect a single-base variation among several thousand bases and output accurate clusters while coping with the observation errors associated with long-read sequencing

Introduction

Since Lynn Margulis confirmed with substantial evidence that mitochondria originated from external bacteria [1], the genomes of mitochondria, called mitogenomes, have been extensively investigated. The mitogenomes of plants are known to differ from those of animals. Plant mitogenomes comprise hundreds of thousands to millions of nucleotides [2]. Plant mitochondria also carry a gene responsible for cytoplasmic male sterility, a phenotype in which the individual cannot produce mature pollen [3]. Because of this unique effect on fertility, plant mitogenomes should be studied independently. Since the emergence of Illumina sequencing technology, the mitogenomes of many species (e.g., rice) have been assembled as circular contigs [5]. Many studies have determined the nucleotide sequences of plant mitogenomes, compared the resulting assemblies, and proposed evolutionary scenarios for current patterns [11], [16].
