Abstract

Analysis of sequence similarities is the first and most important method used to infer the function of unknown nucleotide sequences. Homolog searches should be done carefully so that no important hits are lost. With thousands of results from various long-read sequencing projects (e.g., differentially expressed tags, genomic polymorphons or BAC ends), the ability to retrieve interesting (to our goal) similarities by hand from hundreds of Blast results decreases rapidly. Reducing the number of retrieved sequences by applying a more stringent e-value threshold or displaying fewer results could lead to false deductions. Functional genomics, proteomics and metabolomics could answer questions about the role of nucleotide sequences. This creates the need to annotate as many of the homologies as possible with the proper molecular function, biological process and cellular component (as proposed by the widely accepted Gene Ontology Consortium annotations or the MapMan mappings by the Max Planck Institute). To facilitate fast retrieval of interesting Blast homologies and correct deductions about the biological role of sequences in big sequencing projects, the new Perl script BRAGOMAP was written. The program makes use of some BioPerl modules as well as the power of regex text mining in Perl itself. The script makes it possible to find interesting sequence similarities by using keywords and awarding points for each one found. It collects all important information from the GenBank data and puts it in separate columns of a tab-delimited file for further use. If we were interested (for example) in flower differentiation genes, we could use the keywords (flower, ovule, anther, etc.) and/or filter all the homologies isolated from flower tissues at a specific developmental stage. We can also filter results by choosing similarities to interesting genes or protein products.
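The keyword scoring and tab-delimited output described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not BRAGOMAP's Perl code: each Blast hit is assumed to be already parsed into a dictionary with hypothetical keys (`accession`, `description`, `evalue`), one point is awarded per keyword occurrence in the hit description, and the keyword list, field names and helper names (`score_description`, `write_table`) are all illustrative.

```python
import csv
import io
import re

# Hypothetical keyword list for a flower-differentiation study.
KEYWORDS = ["flower", "ovule", "anther"]


def score_description(description, keywords):
    """Award one point per keyword occurrence; return (points, matched keywords)."""
    points = 0
    matched = []
    for kw in keywords:
        # Case-insensitive match at a word start; \w* also accepts forms such as "anthers".
        pattern = r"\b" + re.escape(kw) + r"\w*"
        hits = re.findall(pattern, description, flags=re.IGNORECASE)
        if hits:
            points += len(hits)
            matched.append(kw)
    return points, matched


def write_table(hits, keywords, handle):
    """Write one tab-delimited row per hit (best score first) and return the total points."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "description", "evalue", "points", "keywords"])
    scored = []
    for hit in hits:
        points, matched = score_description(hit["description"], keywords)
        scored.append((points, hit, matched))
    # Highest score first; ties broken by the smaller (better) e-value.
    scored.sort(key=lambda item: (-item[0], item[1]["evalue"]))
    for points, hit, matched in scored:
        writer.writerow([hit["accession"], hit["description"], hit["evalue"],
                         points, ",".join(matched)])
    return sum(points for points, _, _ in scored)


# Made-up hits; accessions and descriptions are placeholders, not real records.
example_hits = [
    {"accession": "ACC_A", "description": "putative ovule development protein", "evalue": 1e-30},
    {"accession": "ACC_B", "description": "Flower-specific protein, expressed in anthers and flowers", "evalue": 1e-12},
    {"accession": "ACC_C", "description": "ribosomal protein L7", "evalue": 1e-50},
]

buffer = io.StringIO()
total = write_table(example_hits, KEYWORDS, buffer)
print(buffer.getvalue(), end="")
print("total points:", total)
```

Sorting by score before e-value keeps the hits most relevant to the chosen keywords at the top of the table, while the e-value is still written out so that stringency can be judged afterwards rather than imposed up front.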
The script also retrieves all standard information from the Blast and GenBank files, such as Description, ACC no., E-value, Similarity positions, Query Length, Percent of Similarity, etc. Automatic GO and MapMan annotations are done by looking up genes, protein products and/or DB references in the appropriate mapping files. Here we present the usefulness of the script in analyzing sequence similarities and mapping annotations for 3855 BAC ends obtained from the HindIII BAC genomic library of cucumber (Cucumis sativus L., line B10).
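The mapping-file lookup can be sketched as a dictionary lookup. This sketch assumes a simplified, hypothetical two-column tab-delimited mapping file (identifier, then annotation term); the real GO and MapMan mapping files have their own, richer formats. The hit fields `gene`, `product` and `db_xrefs` and the identifiers in the example are placeholders.

```python
import csv
import io


def load_mapping(handle):
    """Read 'identifier<TAB>term' rows into {IDENTIFIER: [terms]}, skipping '#' comments."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        key = row[0].strip().upper()  # upper-case keys give case-insensitive lookup
        mapping.setdefault(key, []).append(row[1].strip())
    return mapping


def annotate(hit, mapping):
    """Collect unique terms for the hit's gene, protein product and DB references."""
    candidates = [hit.get("gene"), hit.get("product")] + list(hit.get("db_xrefs", []))
    terms = []
    for ident in candidates:
        if not ident:
            continue
        for term in mapping.get(ident.strip().upper(), []):
            if term not in terms:  # keep first-seen order, drop duplicates
                terms.append(term)
    return terms


# Hypothetical mapping file contents; identifiers are placeholders.
MAPPING_TEXT = "# identifier\tterm\nGENE_A\tGO:0009908\nXREF_C\tGO:0005634\n"

mapping = load_mapping(io.StringIO(MAPPING_TEXT))
hit = {"gene": "gene_a", "product": "unknown protein", "db_xrefs": ["XREF_C"]}
print(annotate(hit, mapping))
```

Taking the union over gene, product and cross-references reflects the "and/or" in the text: a hit is annotated if any of its identifiers appears in the mapping.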

Highlights

  • If we were interested in flower differentiation genes, we could use the keywords and/or filter all homologous sequences isolated from flower tissues at a specific developmental stage

  • Automatic annotations are done by looking up genes, protein products and/or DB references in the appropriate mapping files (Table 3)

  • *** Number of all collected points *** 6
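Filtering homologies by the tissue and developmental stage they were isolated from can be sketched as below. It assumes the GenBank source-feature qualifiers /tissue_type and /dev_stage (standard INSDC qualifiers) have already been extracted into a per-hit dictionary; the parsing step, the helper name `filter_by_source` and the substring matching rule are hypothetical.

```python
def filter_by_source(hits, tissue=None, dev_stage=None):
    """Keep hits whose source qualifiers mention the requested tissue and/or stage.

    Matching is a case-insensitive substring test; a criterion left as None is ignored.
    """
    selected = []
    for hit in hits:
        source = hit.get("source", {})
        if tissue and tissue.lower() not in source.get("tissue_type", "").lower():
            continue
        if dev_stage and dev_stage.lower() not in source.get("dev_stage", "").lower():
            continue
        selected.append(hit)
    return selected


# Made-up hits carrying already-parsed source qualifiers.
example_hits = [
    {"accession": "ACC_1", "source": {"tissue_type": "flower", "dev_stage": "anthesis"}},
    {"accession": "ACC_2", "source": {"tissue_type": "leaf"}},
    {"accession": "ACC_3", "source": {"tissue_type": "Flower buds", "dev_stage": "pre-anthesis"}},
]

flower_hits = filter_by_source(example_hits, tissue="flower")
print([h["accession"] for h in flower_hits])  # prints ['ACC_1', 'ACC_3']
```

A substring test is deliberately loose, since free-text qualifiers such as "Flower buds" vary between submitters; a stricter controlled-vocabulary match would be a reasonable alternative.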


Summary

Introduction

Analysis of sequence similarities is the first and most important method used to infer the function of unknown nucleotide sequences. With thousands of results from various long-read sequencing projects (differentially expressed tags, genomic polymorphons or BAC ends), retrieving interesting (to our goal) similarities by hand from hundreds of thousands of Blast results is practically impossible. Reducing the number of retrieved sequences by applying a more stringent e-value threshold or displaying fewer results would lead to false deductions.
