Abstract

The interconversion of sequences that constitute the genome and the proteome is becoming increasingly important as large amounts of DNA sequence data are generated. Once DNA segments have been mapped to the genome, one fundamentally important task is to find the amino acid sequences encoded within a list of genomic regions. Conversely, given a set of protein segments, an important task is to find the genomic loci that encode a list of protein regions. Performing these tasks region by region is extremely laborious when a large number of regions are being studied. We have therefore implemented an R package, geno2proteo, which performs both mapping tasks and the subsequent sequence retrieval in batch. To make the tool more accessible, we have also created a web interface to the R package, which allows users to perform the mapping tasks through the web service at http://sharrocksresources.manchester.ac.uk/tofigaps.
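At its core, the protein-to-genome task is coordinate arithmetic over a transcript's coding exons. The Python sketch below is not the geno2proteo API (geno2proteo is an R package, and its function names are not given here). It assumes a hypothetical minimal `Transcript` model holding a strand and 1-based, inclusive CDS exon coordinates, and shows how a residue range can be converted to one or more genomic segments, including across exon boundaries and on the minus strand. The hypothetical `batch_protein_to_genome` wrapper illustrates processing a whole list of regions at once.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int, str]  # (chrom, start, end, strand), 1-based inclusive


@dataclass
class Transcript:
    """Hypothetical minimal transcript model (not geno2proteo's own data structure)."""
    chrom: str
    strand: str                        # '+' or '-'
    cds_exons: List[Tuple[int, int]]   # coding exon parts, 1-based inclusive (start, end)

    def ordered_exons(self) -> List[Tuple[int, int]]:
        # Transcription order: ascending coordinates on '+', descending on '-'.
        return sorted(self.cds_exons, reverse=(self.strand == "-"))


def protein_to_genome(tx: Transcript, aa_start: int, aa_end: int) -> List[Segment]:
    """Return the genomic segments that encode residues aa_start..aa_end (1-based)."""
    # Residue i occupies CDS nucleotides 3*(i-1)+1 .. 3*i.
    nt_start, nt_end = 3 * (aa_start - 1) + 1, 3 * aa_end
    cds_length = sum(end - start + 1 for start, end in tx.cds_exons)
    if not 1 <= aa_start <= aa_end or nt_end > cds_length:
        raise ValueError("protein region lies outside the coding sequence")

    segments: List[Segment] = []
    offset = 0  # CDS nucleotides contained in the exons already visited
    for start, end in tx.ordered_exons():
        lo, hi = offset + 1, offset + (end - start + 1)  # CDS positions covered by this exon
        a, b = max(lo, nt_start), min(hi, nt_end)        # overlap with the queried residues
        if a <= b:
            if tx.strand == "+":
                segments.append((tx.chrom, start + (a - lo), start + (b - lo), "+"))
            else:
                # On '-' the first CDS nucleotide of an exon is its highest coordinate.
                segments.append((tx.chrom, end - (b - lo), end - (a - lo), "-"))
        offset = hi
        if offset >= nt_end:
            break
    return segments


def batch_protein_to_genome(
    transcripts: Dict[str, Transcript], regions: List[Tuple[str, int, int]]
) -> Dict[Tuple[str, int, int], List[Segment]]:
    """Map many (transcript_id, aa_start, aa_end) regions in one call."""
    return {(tid, s, e): protein_to_genome(transcripts[tid], s, e) for tid, s, e in regions}


if __name__ == "__main__":
    txs = {
        "tx1": Transcript("chr1", "+", [(100, 108), (200, 211)]),
        "tx2": Transcript("chr1", "-", [(100, 108), (200, 211)]),
    }
    for key, segs in batch_protein_to_genome(txs, [("tx1", 3, 5), ("tx2", 3, 5)]).items():
        print(key, segs)
```

For `tx1`, residues 3 to 5 map to chr1:106-108 and chr1:200-205 on the plus strand. A protein region that straddles a splice junction therefore yields several genomic segments, which is why the sketch returns a list per region.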

Highlights

  • Complete genome sequences are available for many organisms, including humans, and act as reference datasets for other genome-wide studies

  • The R package geno2proteo presented in this paper implements two-way mapping between genome and proteome: given a genome and its gene annotations, it finds the amino acid sequences encoded by any given genomic regions, and it finds the genomic regions that encode any given protein regions

  • We created geno2proteo, an R package dedicated to retrieving the reference DNA and protein sequences that correspond to any genomic or protein coordinates


Summary

Introduction

Complete genome sequences are available for many organisms, including humans, and act as reference datasets for other genome-wide studies. The protein sequence of a coding region can be found using two web sites, the UCSC genome browser [1] and Ensembl [2]. However, it is not straightforward to obtain the amino acid sequence of an arbitrary genomic coding region from the Ensembl web site or database. The R package geno2proteo presented in this paper implements two-way mapping between genome and proteome: given a genome and its gene annotations, it finds the amino acid sequences encoded by any given genomic regions, and it finds the genomic regions that encode any given protein regions. As a by-product, geno2proteo also provides functions for two further tasks: obtaining the DNA sequences of any genomic regions and the amino acid sequences of any protein regions.
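The genome-to-protein direction can be sketched in the same spirit. The Python below is again an illustration, not geno2proteo's implementation: it assumes a hypothetical `Transcript` model (strand plus 1-based, inclusive CDS exon coordinates) and a plain dictionary from chromosome name to an uppercase A/C/G/T sequence string. It lists the genomic position of every CDS nucleotide in transcription order, finds the codons that overlap the query region, and translates them with the standard genetic code. On the minus strand the collected bases are complemented, which, combined with the descending position order, gives the reverse complement.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Transcript:
    """Hypothetical minimal transcript model (not geno2proteo's own data structure)."""
    chrom: str
    strand: str                        # '+' or '-'
    cds_exons: List[Tuple[int, int]]   # coding exon parts, 1-based inclusive (start, end)


def cds_positions(tx: Transcript) -> List[int]:
    """Genomic coordinate of every CDS nucleotide, in transcription order."""
    positions: List[int] = []
    for start, end in sorted(tx.cds_exons, reverse=(tx.strand == "-")):
        if tx.strand == "+":
            positions.extend(range(start, end + 1))
        else:
            positions.extend(range(end, start - 1, -1))
    return positions


def genome_to_protein(genome: Dict[str, str], tx: Transcript, start: int, end: int) -> str:
    """Translate every codon of `tx` that overlaps the genomic region start..end (1-based)."""
    positions = cds_positions(tx)
    hits = [i for i, p in enumerate(positions) if start <= p <= end]
    if not hits:
        return ""  # region is intronic, intergenic or outside this transcript's CDS
    first_codon, last_codon = hits[0] // 3, hits[-1] // 3
    seq = genome[tx.chrom]  # assumed uppercase A/C/G/T within the CDS
    # Python strings are 0-based, so 1-based coordinate p is seq[p - 1].
    bases = "".join(seq[p - 1] for p in positions[3 * first_codon : 3 * (last_codon + 1)])
    if tx.strand == "-":
        bases = bases.translate(COMPLEMENT)  # descending order + complement = reverse complement
    usable = len(bases) - len(bases) % 3  # ignore a trailing incomplete codon
    return "".join(CODON_TABLE[bases[i : i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    genome = {"chr1": "NNATGGCCNNNNTAANNNNN"}
    tx = Transcript("chr1", "+", [(3, 8), (13, 15)])
    print(genome_to_protein(genome, tx, 1, 20))
    print(genome_to_protein(genome, tx, 7, 13))
```

On this toy genome the first call prints `MA*`, and the second, a region spanning the exon junction, prints `A*`. In this sketch any codon touched by the region is translated in full; how the published package treats partially covered codons is not stated here.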

The R Package geno2proteo
The online tool ToFiGAPS
Application
Findings
Discussion