Abstract
Protein-structure comparison (PSC) is an essential component of biomedical research, as it impacts, e.g., drug design, molecular docking, protein folding and structure prediction algorithms, and is also essential to the assessment of these predictions. Each of these applications, as well as many others in which molecular comparison plays an important role, requires a different notion of similarity, which naturally leads to the multicriteria PSC (MC-PSC) problem. Protein (Structure) Comparison, Knowledge, Similarity, and Information (ProCKSI) (www.procksi.org) provides algorithmic solutions for the MC-PSC problem by means of an enhanced structural comparison that relies on the principled application of information fusion to similarity assessments derived from multiple comparison methods. The current MC-PSC implementation works well for moderately sized datasets, but it is time consuming because it provides a public service shared by multiple users. Many of the structural bioinformatics applications mentioned above would benefit from the ability to perform, for a dedicated user, thousands or tens of thousands of comparisons through multiple methods in real time, a capacity beyond our current technology. In this paper, we take a key step in that direction by means of a high-throughput distributed reimplementation of ProCKSI for very large datasets. The core of the proposed framework lies in the design of an innovative distributed algorithm that runs on each compute node in a cluster/grid environment to perform structure comparison of a given subset of input structures using some of the most popular PSC methods [e.g., universal similarity metric (USM), maximum contact map overlap (MaxCMO), fast alignment and search tool (FAST), distance alignment (DaliLite), combinatorial extension (CE), template modeling alignment (TMAlign)]. We follow this with a procedure of distributed consensus building.
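The abstract does not state how comparisons are divided among compute nodes or how the consensus is computed. As a minimal sketch, assuming an all-against-all job, symmetric similarity scores oriented so that higher means more similar, and a plain mean of min-max normalized scores as the fusion rule (not necessarily ProCKSI's actual consensus method), the hypothetical helpers `partition_pairs`, `normalize`, and `consensus` illustrate the two stages: splitting the pairwise work across nodes, then fusing per-method scores into one value per pair.

```python
from itertools import combinations
from statistics import mean

def partition_pairs(n_structures, n_nodes):
    """Split all unordered pairs (i, j), i < j, into n_nodes contiguous chunks.

    Hypothetical helper: chunk sizes differ by at most one pair, so each
    compute node receives a near-equal share of the comparison work.
    """
    pairs = list(combinations(range(n_structures), 2))
    base, extra = divmod(len(pairs), n_nodes)
    chunks, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        chunks.append(pairs[start:start + size])
        start += size
    return chunks

def normalize(scores):
    """Min-max scale a {pair: score} dict to [0, 1] (higher = more similar)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pair: 0.0 for pair in scores}
    return {pair: (s - lo) / (hi - lo) for pair, s in scores.items()}

def consensus(per_method_scores):
    """Fuse per-method scores into one consensus score per pair.

    per_method_scores maps a method name to {pair: score} gathered from all
    nodes. The fusion rule here is a plain mean (an assumption).
    """
    normalized = [normalize(scores) for scores in per_method_scores.values()]
    pairs = normalized[0].keys()
    return {pair: mean(n[pair] for n in normalized) for pair in pairs}

# Usage: 5 structures, 3 nodes -> 10 pairs split as 4, 3, 3.
chunks = partition_pairs(5, 3)
print([len(c) for c in chunks])  # [4, 3, 3]

# Hypothetical gathered scores for three pairs from two methods.
gathered = {
    "MethodA": {(0, 1): 2, (0, 2): 6, (1, 2): 10},
    "MethodB": {(0, 1): 5, (0, 2): 1, (1, 2): 3},
}
print(consensus(gathered))  # {(0, 1): 0.5, (0, 2): 0.25, (1, 2): 0.75}
```

In this sketch, normalization happens only after every node's results are gathered, because min-max scaling needs the global minimum and maximum for each method; a node scaling its own chunk in isolation would put its scores on a different scale from the other nodes.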
Thus, the new algorithms proposed here achieve ProCKSI's similarity assessment quality in a fraction of the time it requires. Our results show that the proposed distributed method can be used efficiently to compare: 1) a particular protein against a very large protein structure dataset (target-against-all comparison), and 2) a particular very large-scale dataset against itself or against another very large-scale dataset (all-against-all comparison). We conclude the paper by enumerating some of the outstanding challenges for real-time MC-PSC.
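To make the two comparison modes concrete, the following sketch enumerates the per-method job list each mode produces. The helper names `target_against_all` and `all_against_all` and the identifiers `s0`, `s1`, ... are hypothetical. It assumes symmetric similarity scores, so a dataset compared against itself needs each unordered pair only once; a method whose score depends on argument order would need both orderings.

```python
from itertools import combinations, product

def target_against_all(target, dataset):
    """Hypothetical helper: one job per dataset structure other than the target."""
    return [(target, s) for s in dataset if s != target]

def all_against_all(dataset_a, dataset_b=None):
    """Hypothetical helper for all-against-all jobs.

    One dataset: every unordered pair once, n(n-1)/2 jobs (assumes the score
    is symmetric). Two datasets: every cross pair, n*m jobs.
    """
    if dataset_b is None:
        return list(combinations(dataset_a, 2))
    return list(product(dataset_a, dataset_b))

ids = [f"s{i}" for i in range(1000)]
print(len(target_against_all("s0", ids)))          # 999
print(len(all_against_all(ids)))                   # 499500
print(len(all_against_all(ids[:10], ids[10:30])))  # 200
```

The target-against-all case grows linearly with dataset size, while the all-against-all case grows quadratically (about half a million comparisons per method for 1,000 structures), which is where distributing the work across compute nodes pays off most.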