Abstract

Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms and intend to observe their spatial and temporal relationships for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data accessing, managing, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because of the missing of a platform to support applications that involve intensive data access and analytical process. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed in the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system and experiments using real MS data and workload show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression.

Highlights

  • Recent advancement in computing and networking technologies has witnessed the rising and flourishing of data-intensive applications that severely challenge the existing data management and computing systems

  • The query processing algorithms were programmed in following three ways: (1) most one-body functions are implemented as stored procedures using PL/pgSQL – a built-in procedural language provided by PostgreSQL; (2) the more complex multi-body functions are programmed as C functions and integrated into the PostgreSQL query interface; and (3) query processing algorithms related to the Time-Parameterized Spatial (TPS) tree are directly implemented as part of the PostgreSQL kernel

  • We developed a unified Information Integration and Informatics framework named Database-centric Molecular Simulations (DCMS)

Read more

Summary

Introduction

Recent advancement in computing and networking technologies has witnessed the rising and flourishing of data-intensive applications that severely challenge the existing data management and computing systems. Data-intensive applications commonly require significant storage space and intensive computing power. The demand of such resources alone, is not the only fundamental challenge of dealing with big data [1,2,3]. Data storage and management systems should support high throughput data access with millions of tweets generated each second [4]. In addition to the low level data cleansing, access, and management issues, user privacy and public policies should be considered (and integrated) in the analytical process for meaningful outcomes

Objectives
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.