Abstract

Identification of genomic biomarkers is an important area of research in the context of drug discovery experiments. These experiments typically consist of several high-dimensional datasets that contain information about a set of drugs (compounds) under development. This type of data structure introduces the challenge of multi-source data integration. High-Performance Computing (HPC) has become an important tool for everyday research tasks. In the context of drug discovery, high-dimensional multi-source data need to be analyzed to identify the biological pathways related to the new set of drugs under development, and HPC techniques are required to process all the information contained in the datasets. Even though R packages for parallel computing are available, they are not optimized for a specific setting and data structure. In this article, we propose a new data analysis framework for using R on a computer cluster. The proposed data analysis workflow is applied to a multi-source high-dimensional drug discovery dataset and compared with a few existing R packages for parallel computing.
