Integration of Cassandra and Spark in Computer Aided Drug Design

Nitha V R

doi:10.32628/cseit217112

Abstract

The primary purpose of this paper is to present a feasibility study of using Cassandra and Spark in Computer Aided Drug Design (CADD). Apache Cassandra is a big data management tool that can store very large volumes of data in a variety of formats. A large database can be designed to hold details of all known molecules or compounds. Information about each compound, such as selectivity, solubility, synthetic viability, affinity, adverse reactions, metabolism, and environmental toxicity, can be stored in this database along with the 3D structure of the molecule. The data analytics tool Spark can then be used to mine and manage the data stored in the database efficiently. Developing a new drug by traditional means may take eight to fifteen years; integrating big data into CADD can help identify candidate drugs in minutes rather than years. Spark is written in the Scala programming language, runs on the Java Virtual Machine (JVM), and supports the Scala, Java, and Python programming languages. Cassandra provides connectors for different programming languages, which makes it easy to integrate other molecular modeling tools with Spark. For example, PyMOL, a Python-based molecular modeling tool, can readily be used together with Spark. CADD helps identify new drugs by computational means, thereby eliminating unnecessary costs incurred in the chemical testing of drugs.
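The screening idea described in the abstract can be illustrated with a minimal sketch. It is a plain-Python stand-in, not the paper's implementation: the `Compound` record, its field names and scales, the `screen` function, and the sample values are all hypothetical and chosen only to show how stored compound properties might be filtered down to candidates.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Compound:
    """One row of a hypothetical compounds table (field names and scales are assumed)."""
    name: str
    affinity: float    # assumed: higher means stronger binding to the target
    solubility: float  # assumed: higher means more soluble
    toxicity: float    # assumed: higher means more toxic


def screen(compounds: Iterable[Compound],
           min_affinity: float,
           min_solubility: float,
           max_toxicity: float) -> List[Compound]:
    """Keep compounds that pass all three thresholds, strongest binders first."""
    hits = [c for c in compounds
            if c.affinity >= min_affinity
            and c.solubility >= min_solubility
            and c.toxicity <= max_toxicity]
    return sorted(hits, key=lambda c: c.affinity, reverse=True)


# Made-up values purely for illustration; these are not measured data.
LIBRARY = [
    Compound("cmpd_a", affinity=8.2, solubility=0.6, toxicity=0.1),
    Compound("cmpd_b", affinity=9.1, solubility=0.2, toxicity=0.1),
    Compound("cmpd_c", affinity=7.5, solubility=0.7, toxicity=0.9),
    Compound("cmpd_d", affinity=8.9, solubility=0.5, toxicity=0.3),
]

if __name__ == "__main__":
    for c in screen(LIBRARY, min_affinity=8.0, min_solubility=0.4, max_toxicity=0.5):
        print(c.name, c.affinity)
```

In a Spark and Cassandra deployment, the same predicate would presumably be expressed as a DataFrame filter over rows loaded through the Spark Cassandra Connector (for example via `spark.read.format("org.apache.spark.sql.cassandra")`), so that the filtering is distributed across the cluster instead of running in a single Python process.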

Full Text