Abstract

Genomes and their proteins can be analyzed from different perspectives and with different goals, and the results apply to several areas of research. In bioinformatics and computational biology, it is common to combine multiple programs and databases to extract information from raw data. Choosing the proper program, setting its parameters, filtering the input files, extracting information from output files, and creating scripts to automate tasks can be challenging. This work describes MHOLline, an online scientific workflow that provides a set of modules enabling broad, large-scale analysis of proteins (e.g., whole genomes) in a few hours. Version 1.0 (released in 2010) was wholly reformulated, and the new version 2.0 is available at www.mholline2.lncc.br. This version presents new features and modules, a new interface, code optimization, runtime reduction, improved security, a new results visualization interface, and an automatic and user-friendly refinement page.

Highlights

  • Genome analysis may reveal each protein’s role in an organism, show which proteins play similar or analogous biological functions, and identify proteins intrinsic to a particular species

  • We present the second version of this workflow, with the addition of new software, proprietary tools for protein model analysis and refinement, and an administration environment

  • The interface provides information regarding the protein model and templates (Figure 3), such as the secondary structure of the sequence model predicted with PSIPRED, the secondary structure of the template files assigned by DSSP, and the transmembrane regions found in the sequence model using TMHMM


Summary

INTRODUCTION

Genome analysis may reveal each protein’s role in an organism, show which proteins play similar or analogous biological functions, and identify proteins intrinsic to a particular species.

1.1 What’s New in MHOLline 2.0

The first version of MHOLline (CAPRILES et al., 2010) presented the following modules:

(i) the BLAST program (ALTSCHUL et al., 1990), used to perform the local alignment of the submitted sequences against the Protein Data Bank (PDB) (BERMAN et al., 2000);

(ii) the BATS tool, developed to classify the alignments into four groups (Table 1) based on the E-value and identity from BLAST and on the Length Variation Index (LVI), which is MHOLline’s concept of coverage (LVI ≤ 0.1 is equivalent to coverage ≥ 90%);

(iii) the FILTERS tool, created to sort G2 proteins (proteins that can be modeled by the comparative modeling technique) into seven quality groups (Table 2);

(iv) the ECNGet tool, developed to capture at least one Enzyme Commission (EC) number for each sequence in the G2 group whose reference protein has at least one known enzymatic function;

(v) the MODELLER software (WEBB; SALI, 2016), which constructs the 3D models;

(vi) the PROCHECK software (LASKOWSKI et al., 1993), used to produce Ramachandran plots; and

(vii) the HMMTOP software (TUSNÁDY; SIMON, 2001), used to identify transmembrane regions in proteins.
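The stated equivalence between LVI and coverage (LVI ≤ 0.1 corresponds to coverage ≥ 90%) suggests that LVI behaves as the complement of coverage. The Python sketch below illustrates that reading only. The formula LVI = 1 − coverage, the definition of coverage as aligned length divided by query length, and all function and parameter names are assumptions made for illustration; they are not taken from the MHOLline source code, and the actual BATS thresholds are those given in Table 1.

```python
def coverage(aligned_length: int, query_length: int) -> float:
    """Fraction of the query sequence covered by the alignment (assumed definition)."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if not 0 <= aligned_length <= query_length:
        raise ValueError("aligned_length must be between 0 and query_length")
    return aligned_length / query_length


def lvi(aligned_length: int, query_length: int) -> float:
    """Length Variation Index, assumed here to be 1 - coverage."""
    return 1.0 - coverage(aligned_length, query_length)


def meets_lvi_cutoff(aligned_length: int, query_length: int, max_lvi: float = 0.1) -> bool:
    """True when LVI <= max_lvi, i.e. coverage >= 1 - max_lvi."""
    return lvi(aligned_length, query_length) <= max_lvi


if __name__ == "__main__":
    # A 300-residue query aligned over 270 residues has 90% coverage.
    print(meets_lvi_cutoff(270, 300))
    # The same query aligned over only 200 residues falls below the cutoff.
    print(meets_lvi_cutoff(200, 300))
```

Under these assumptions, an alignment passes the cutoff exactly when at least 90% of the query is covered; the E-value and identity criteria that BATS also applies are not modeled here.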

RESULTS AND DISCUSSION
CONCLUSIONS
PERSPECTIVES
