MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses.

Ranko Gaćeša, Daslav Hranueli, Paul F Long, John Cullum, Jurica Žučko, Janko Diminić, Sólveig K Pétursdóttir, Ólafur H Friðjónsson, Elísabet Eik Guðmundsdóttir, Guðmundur Ó Hreggviðsson, Antonio Starčević

doi:10.17113/ftb.55.02.17.4749

Abstract

The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.
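The 'functional' assembly idea, in which reads are screened with HMM profiles before assembly, can be illustrated with a minimal sketch. This is not the MEGGASENSE implementation. It assumes the HMM search has already been run and its hits are available as `(read_id, profile_id, evalue)` tuples. The function name `bin_reads_by_profile`, the profile labels, and the E-value cut-off of `1e-5` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def bin_reads_by_profile(hits, max_evalue=1e-5):
    """Group read IDs by their best-scoring HMM profile.

    hits: iterable of (read_id, profile_id, evalue) tuples (assumed input format).
    Returns a dict mapping profile_id to a sorted list of read IDs.
    """
    best = {}  # read_id -> (evalue, profile_id) of the best hit seen so far
    for read_id, profile_id, evalue in hits:
        if evalue > max_evalue:
            continue  # discard weak hits before any assembly is attempted
        if read_id not in best or evalue < best[read_id][0]:
            best[read_id] = (evalue, profile_id)

    bins = defaultdict(list)
    for read_id, (_, profile_id) in best.items():
        bins[profile_id].append(read_id)
    return {profile: sorted(reads) for profile, reads in bins.items()}


# Toy hits with made-up identifiers and E-values.
hits = [
    ("read1", "profile_A", 1e-20),
    ("read1", "profile_B", 1e-8),
    ("read2", "profile_A", 1e-12),
    ("read3", "profile_B", 1e-30),
    ("read4", "profile_B", 0.5),
]
bins = bin_reads_by_profile(hits)
```

Each resulting bin holds only the reads that matched one functional profile, so each bin could then be passed to an assembler separately. Reads with no hit below the cut-off (`read4` here) are left out.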

Highlights

Falling costs of next-generation sequencing have made de novo genome and metagenome sequencing widely available
The default functional analysis for databases generated by MEGGASENSE is derived from the KEGG database
The analyses described above are incorporated as a default in metagenome databases generated by MEGGASENSE

Summary

Introduction

Falling costs of next-generation sequencing have made de novo genome and metagenome sequencing widely available. Bioinformatics offers many tools to analyse the sequences, and the identification of protein-coding regions and the assignment of function are the major aims of most projects. Many tools attempt to assign function to such proteins. A general BLAST database such as GenBank (3) consists mainly of uncurated entries, which will often contain misleading data for functional assignment. The SEED database (4) contains collections of protein sequences grouped by function and has been used for BLAST searches to find hits corresponding to in silico translation of the metagenomic sequences. In order to present functional information about the whole genome or metagenome effectively, it is necessary to have a suitable data structure. BLAST searches against the KEGG orthologues are a useful way of assigning function to new sequences.
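As a rough illustration of what a "suitable data structure" might look like in relational form, the sketch below stores sequences and their functional assignments in two linked tables and queries them. The schema, the table and column names, the helper `sequences_for_profile`, and the toy data are assumptions made for this example. They are not the schema used by MEGGASENSE.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout:
# one row per sequence, and one row per (sequence, profile) hit.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sequence (
        seq_id   TEXT PRIMARY KEY,
        residues TEXT NOT NULL
    );
    CREATE TABLE annotation (
        seq_id     TEXT NOT NULL REFERENCES sequence(seq_id),
        profile_id TEXT NOT NULL,
        evalue     REAL NOT NULL
    );
    """
)

# Made-up sequences and hits, for illustration only.
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?)",
    [("prot1", "MKTAYIAK"), ("prot2", "MSLLTEVE"), ("prot3", "MAHHRRGG")],
)
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?, ?)",
    [
        ("prot1", "profile_A", 1e-30),
        ("prot2", "profile_A", 1e-10),
        ("prot2", "profile_B", 1e-50),
        ("prot3", "profile_A", 0.1),
    ],
)


def sequences_for_profile(conn, profile_id, max_evalue=1e-5):
    """Return sequence IDs assigned to a profile, best hit first."""
    rows = conn.execute(
        "SELECT s.seq_id FROM sequence AS s "
        "JOIN annotation AS a ON a.seq_id = s.seq_id "
        "WHERE a.profile_id = ? AND a.evalue <= ? "
        "ORDER BY a.evalue",
        (profile_id, max_evalue),
    ).fetchall()
    return [row[0] for row in rows]


profile_a_members = sequences_for_profile(conn, "profile_A")
```

Keeping annotations in their own table lets one sequence carry several functional assignments, and lets the same query pattern serve both single-genome and metagenome data.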

Journal: Food Technology and Biotechnology	Publication Date: Jan 1, 2017
Citations: 1	License type: cc-by
