Abstract

High-density oligonucleotide arrays (microarrays) from Affymetrix have been widely used to measure gene expression. Public data repositories, such as the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI), have accumulated large amounts of microarray data. Efficient integrative analysis of these data could yield significant knowledge about biological systems. However, none of the existing microarray preprocessing and quality assessment tools can handle very large microarray datasets with tens of thousands of experiments. The preprocessing and quality assessment of microarray datasets involve both data-intensive and compute-intensive tasks. In this paper, we develop a new set of tools that combines Hadoop (for data-intensive tasks) and General-Purpose Graphics Processing Units (GPGPUs) (for compute-intensive tasks) to efficiently process large microarray datasets. Evaluating our new tools on large microarray datasets with tens of thousands of experiments showed promising, superior performance. We demonstrate that combining Hadoop and GPGPU computation is effective for complex scientific applications that contain both data-intensive and compute-intensive tasks. Our new tool set will make it possible to utilize the valuable large microarray datasets held in public repositories.
