Abstract
The sequencing, annotation and analysis of complete mitochondrial genomes are important research tools in phylogeny and evolution. Starting from the primary sequence, genes and other features are generally annotated automatically to obtain preliminary annotations in the form of a feature table. Manual curation in a graphical alignment editor is nevertheless necessary to revise these annotations. As a result, the automatically generated feature table is invalidated and must be modified manually before submission to data banks. We developed aln2tbl.py, a Python script that recreates a feature table from a manually refined alignment, in FASTA format, of the genes mapped onto the mitochondrial genome. The feature table is populated with notes and annotations specific to mitochondrial genomes and can be used to create a .sqn file for direct submission to data banks. In summary, our script fills a gap in the available toolbox and, combined with other software, allows the entire process to be automated, from primary sequence to submission of the annotated genome, even when a manual curation step is carried out in a visual sequence editor.
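The core step of recreating a feature table from an edited alignment is converting alignment columns back into ungapped genome coordinates. The sketch below shows one way to do this for a single gene row aligned against the genome row. It is a hypothetical illustration, not the aln2tbl.py source: the function name `aligned_feature_span` and the gap characters `-` and `.` are assumptions.

```python
from __future__ import annotations


def aligned_feature_span(genome_row: str, gene_row: str, gap_chars: str = "-.") -> tuple[int, int]:
    """Hypothetical helper: return 1-based inclusive (start, end) genome
    coordinates covered by a gene row in a genome/gene alignment."""
    if len(genome_row) != len(gene_row):
        raise ValueError("aligned rows must have equal length")
    pos = 0  # ungapped genome position of the current alignment column
    start = end = None
    for g_char, f_char in zip(genome_row, gene_row):
        if g_char not in gap_chars:
            pos += 1  # only genome residues advance the coordinate
        # A column belongs to the gene only if both rows carry a residue there.
        if f_char not in gap_chars and g_char not in gap_chars:
            if start is None:
                start = pos
            end = pos
    if start is None:
        raise ValueError("gene row does not overlap the genome")
    return start, end


# Toy rows: the gap in the genome row is skipped when counting positions.
print(aligned_feature_span("ACGT-ACGTACG", "--GT-ACG----"))
```

Because coordinates are recomputed from the alignment itself, any boundary moved by hand in the sequence editor is reflected automatically in the regenerated table.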
Highlights
The collection of complete mitochondrial genome sequences from a genome project at low coverage is straightforward due to advances in high-throughput sequencing
The script can be run with the following command: aln2tbl.py -f assembly_file.fas -g forward_genes_file.txt -c number_genetic_code > feature_table_file.tbl. This assumes that python3 is the interpreter invoked by /usr/bin/env python3. If this is not the case, or if multiple Python installations are available, the full path to the python3 interpreter can be prepended to the command (e.g. /usr/bin/python3 aln2tbl.py)
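As an illustration of the documented command line, the sketch below parses the same three options with `argparse`. It is a hypothetical reconstruction, not the script's actual code: the destination names, the help texts and the assumption that the genes file lists one forward-strand gene name per line are all illustrative. The value passed to `-c` is assumed to be an NCBI genetic code number (for example 2 for vertebrate or 5 for invertebrate mitochondria).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented -f, -g and -c options."""
    parser = argparse.ArgumentParser(description="Sketch of an aln2tbl.py-style command line")
    parser.add_argument("-f", dest="assembly", required=True,
                        help="FASTA alignment of the genome and its mapped genes")
    parser.add_argument("-g", dest="forward_genes", required=True,
                        help="text file naming the genes on the forward strand (assumed format)")
    parser.add_argument("-c", dest="genetic_code", type=int, required=True,
                        help="NCBI genetic code number, e.g. 2 or 5")
    return parser


def parse_forward_genes(lines) -> set:
    """Assumed format: one gene name per line; blank lines are ignored."""
    return {line.strip() for line in lines if line.strip()}


args = build_parser().parse_args(["-f", "assembly_file.fas", "-g", "forward_genes_file.txt", "-c", "5"])
print(args.assembly, args.forward_genes, args.genetic_code)
```

In practice the genes file would be read with `with open(args.forward_genes) as handle: parse_forward_genes(handle)`, and the resulting table written to standard output so that it can be redirected to a .tbl file as in the command above.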
Manual editing in a visual sequence editor invalidates the automatically produced feature table
Summary
The collection of complete mitochondrial genome sequences from a genome project at low coverage is straightforward due to advances in high-throughput sequencing. Several bioinformatic pipelines have been developed to automatically annotate the 37 genes generally encoded in animal mitogenomes, with MITOS (Bernt et al 2013) and MITOS2 (Donath et al 2019) being popular options. The MITOS pipeline combines BLAST searches based on sequence identity and hidden Markov models (HMMs) to annotate protein-coding and ribosomal RNA genes with covariance models based on cloverleaf-like secondary structures and other relaxed models to annotate tRNA genes. The feature table used by DDBJ/ENA/GenBank (Feature Table Definition, Version 10.9, November 2019) includes all gene annotations as a five-column, tab-delimited table of feature locations and qualifiers.
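To make the five-column layout concrete, the sketch below writes features in the tab-delimited form read by NCBI's tbl2asn/table2asn tools: a `>Feature` header line, then start, stop and feature key on one line, followed by qualifier name and value in the fourth and fifth columns. The `Feature` class, the sequence identifier and the coordinates in the example are hypothetical; reverse-strand features are written with start greater than stop, and `transl_table` carries the genetic code for protein-coding genes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    start: int  # 1-based, inclusive, start <= end
    end: int    # 1-based, inclusive
    key: str    # feature key, e.g. "gene", "CDS", "tRNA", "rRNA"
    forward: bool = True
    qualifiers: list[tuple[str, str]] = field(default_factory=list)


def format_feature_table(seq_id: str, features: list[Feature]) -> str:
    """Render features as a five-column, tab-delimited feature table."""
    lines = [f">Feature {seq_id}"]
    for feat in features:
        # Reverse-strand features are written with start > stop.
        first, second = (feat.start, feat.end) if feat.forward else (feat.end, feat.start)
        lines.append(f"{first}\t{second}\t{feat.key}")
        for qualifier, value in feat.qualifiers:
            # Columns 1-3 are empty on qualifier lines.
            lines.append(f"\t\t\t{qualifier}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical features on a hypothetical sequence "contig1".
table = format_feature_table("contig1", [
    Feature(10, 80, "tRNA", forward=False, qualifiers=[("product", "tRNA-Gln")]),
    Feature(100, 1056, "CDS", qualifiers=[("product", "NADH dehydrogenase subunit 1"),
                                          ("transl_table", "5")]),
])
print(table, end="")
```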