Abstract

Protein modeling usually produces many candidate structures, so efficient tools for comparing structure models are needed to choose the best three-dimensional (3D) structure. This paper introduces an alternative method for clustering 3D protein models that, instead of grouping the data by attributes derived from structural alignment, uses quality attributes of the models to represent and cluster them. The method stands out by removing the need to define a base model a priori for structural alignments. It can nevertheless present the most representative structure in each cluster, which is useful for docking or molecular dynamics studies. All results were statistically analyzed and compared with decisions made by professionals to validate the proposed algorithm. The experiment simulated a typical comparative protein modeling process for different CATH classifications. The variance levels calculated after dimensionality reduction validate the workflow for different protein chain sizes. All molecular descriptors for the input files are either calculated by MHOLline 2.0, an online scientific workflow for bioinformatics and computational biology studies, freely available at www.mholline2.lncc.br, or prepared by hand using specific programs (e.g., MODELLER, PROCHECK) and adjusting the data to the template specified in this document. The Quality-Model Clustering Tool (QMC) and the data set used in this work are available for download from the Git repository github.com/ruanmedina/Quality-Model-Clustering.
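The idea of clustering models by quality attributes, reducing dimensionality, and reporting one representative per cluster can be illustrated with a minimal sketch. It is not the QMC implementation: the use of PCA and k-means, the three attribute columns, and all function names are assumptions made for illustration, and the input values are synthetic.

```python
import numpy as np

def standardize(X):
    """Scale each quality attribute to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - X.mean(axis=0)) / std

def pca(X, n_components=2):
    """Project X onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

def assign(X, centroids):
    """Label each row of X with the index of its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (assumed here; the source names no algorithm)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids

def representatives(X, labels, centroids):
    """For each cluster, return the index of the model nearest its centroid."""
    reps = {}
    for j, c in enumerate(centroids):
        members = np.flatnonzero(labels == j)
        if members.size:
            nearest = np.linalg.norm(X[members] - c, axis=1).argmin()
            reps[j] = int(members[nearest])
    return reps

# Synthetic quality attributes for 10 hypothetical models. The columns
# (an energy-like score, % residues in favored regions, a normalized
# score) are illustrative and are NOT the descriptor set used by QMC.
rng = np.random.default_rng(42)
group_a = rng.normal([-1.0, 92.0, 0.2], [0.1, 1.0, 0.05], size=(5, 3))
group_b = rng.normal([0.5, 80.0, 0.8], [0.1, 1.0, 0.05], size=(5, 3))
X = np.vstack([group_a, group_b])

Z, explained = pca(standardize(X), n_components=2)
labels, centroids = kmeans(Z, k=2)
reps = representatives(Z, labels, centroids)

print("variance retained:", explained.sum())
print("cluster labels:", labels)
print("representative model per cluster:", reps)
```

Because the models are described only by their quality attributes, no reference structure or structural alignment is needed at any step; the representative of each cluster is simply the model closest to the cluster centroid in the reduced space.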

Highlights

  • The knowledge of the three-dimensional (3D) structure of proteins is essential to study diseases such as parasitoses, viruses, and cancer (LEE; FREDDOLINO; ZHANG, 2017)

  • We have described part of the applicability of the Quality-Model Clustering Tool (QMC) and its flexibility: the user is free to choose the best parameters for each situation, while default values are suggested for general studies

  • Figure 2 shows an overview of a general modeling process for the protein 1F4H, a hydrolase from Escherichia coli


Summary

Introduction

Knowledge of the three-dimensional (3D) structure of proteins is essential for studying diseases such as parasitoses, viral infections, and cancer (LEE; FREDDOLINO; ZHANG, 2017). In many kinds of research, 3D protein structure prediction (PSP) is guided by computational experiments, since experimental structure determination remains costly and time-consuming (VERLI, 2014). Comparative modeling is one computational method for PSP (ESWAR et al., 2006). It constructs 3D models using known structures of macromolecules as templates. These structures are obtained from databases (e.g., the Protein Data Bank, PDB). Comparative modeling depends heavily on the quality of the templates and on the evolutionary relationship between them and the modeled protein (CAPRILES et al., 2010). The process produces a large number of conformations and requires several refinement and validation steps, which makes the final choice of the best models difficult (VERLI, 2014).

