AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data.

Guilherme Augusto Maia, Edmundo Carlos Grisard, Vilmar Benetti Filho, Glauber Wagner, Tatiany Aparecida Teixeira Soratto, Renato Simões Moreira, Eric Kazuo Kawagoe

doi:10.3389/fgene.2022.1020100

Abstract

Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Because a variety of sequencing platforms generate increasing amounts of data, manual annotation is no longer feasible. Thus, an integrated, automated pipeline that uses experimental data to validate in silico predictions of gene function is of utmost relevance. Here, we present a computational workflow named AnnotaPipeline that integrates distinct software and data types in a proteogenomic approach to annotate and validate predicted features in genomic sequences. Starting from FASTA files of (i) nucleotide or (ii) protein sequences, or from (iii) structural annotation files (GFF3), users can also supply RNA-seq data in FASTQ format and MS/MS data in mzXML or similar formats; the pipeline uses both transcriptomic and proteomic information to corroborate annotations and validate gene predictions, providing transcription and expression evidence for functional annotation. Reannotation of the available Arabidopsis thaliana, Caenorhabditis elegans, Candida albicans, Trypanosoma cruzi, and Trypanosoma rangeli genomes was performed using AnnotaPipeline, resulting in a higher proportion of annotated proteins and a lower proportion of hypothetical proteins than in the publicly available annotations for these organisms. AnnotaPipeline is a Unix-based pipeline developed in Python and is available at https://github.com/bioinformatics-ufsc/AnnotaPipeline.

Journal: Frontiers in Genetics
Publication Date: Nov 22, 2022
Citations: 1
License type: CC BY 4.0
