Abstract

Background
Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today.

Results
Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm.

Conclusions
The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.
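The two-layer design maps coarse-grain parallelism onto MPI processes that communicate infrequently across machines, and fine-grain parallelism onto Pthreads that share memory within each process. Below is a minimal sketch of that pattern, not the authors' implementation: the per-gene statistic compute_stat(), the array sizes, and the thread count are illustrative assumptions.

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_GENES   10000
#define NUM_THREADS 4      /* tune so per-process working sets fit in cache */

static double data[NUM_GENES];      /* read-shared within each MPI process */
static double stat_out[NUM_GENES];  /* each thread writes a disjoint slice */

typedef struct { int lo, hi; } range_t;

/* hypothetical stand-in for the per-gene association statistic */
static double compute_stat(int g) { return data[g] * data[g]; }

static void *worker(void *arg)
{
    range_t *r = (range_t *)arg;
    for (int g = r->lo; g < r->hi; g++)
        stat_out[g] = compute_stat(g);   /* disjoint writes: no locking */
    return NULL;
}

int main(int argc, char **argv)
{
    int rank, nprocs, provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (int g = 0; g < NUM_GENES; g++)      /* stand-in input data */
        data[g] = (double)(g + 1);

    /* coarse grain: a contiguous block of genes per MPI process */
    int per_proc = (NUM_GENES + nprocs - 1) / nprocs;
    int lo = rank * per_proc;
    int hi = lo + per_proc;
    if (lo > NUM_GENES) lo = NUM_GENES;
    if (hi > NUM_GENES) hi = NUM_GENES;

    /* fine grain: split the block among threads sharing `data` */
    pthread_t tid[NUM_THREADS];
    range_t   rng[NUM_THREADS];
    int per_thr = (hi - lo + NUM_THREADS - 1) / NUM_THREADS;
    for (int t = 0; t < NUM_THREADS; t++) {
        rng[t].lo = lo + t * per_thr;
        rng[t].hi = rng[t].lo + per_thr;
        if (rng[t].lo > hi) rng[t].lo = hi;
        if (rng[t].hi > hi) rng[t].hi = hi;
        pthread_create(&tid[t], NULL, worker, &rng[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);

    /* infrequent communication: combine per-gene results on rank 0;
       non-owned entries are zero, so a SUM reduction reassembles them */
    double *all = (rank == 0) ? malloc(NUM_GENES * sizeof(double)) : NULL;
    MPI_Reduce(stat_out, all, NUM_GENES, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("stat[0] = %g, stat[%d] = %g\n",
               all[0], NUM_GENES - 1, all[NUM_GENES - 1]);
        free(all);
    }
    MPI_Finalize();
    return 0;
}

Launched as, for example, mpirun -np 10 ./gda with a hostfile spanning five 8-core nodes, this places two processes of four threads each on every machine; how best to make that split is precisely the tuning question the Conclusions address.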

Highlights

  • Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures

  • This two-dimensional array is read-shared within each Message Passing Interface (MPI) process, and its size grows as the product of m and n (a footprint sketch follows this list)

  • Technology advances have been on a fast track in recent years, making it possible to conduct microarray experiments much faster and at much lower cost than in the past
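To make the m-by-n growth concrete, here is a tiny footprint calculation; the dimensions are assumed for illustration, since this summary does not give the study's actual m and n.

#include <stdio.h>

int main(void)
{
    long m = 20000;   /* e.g., number of genes (assumed) */
    long n = 100;     /* e.g., number of samples/permutation columns (assumed) */
    double mb = (double)(m * n * sizeof(double)) / (1024.0 * 1024.0);
    printf("one shared copy: %.1f MB\n", mb);
    /* Keeping one read-shared copy per MPI process, instead of one per
     * thread, keeps the footprint from scaling with thread count. */
    return 0;
}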


Summary

Results

Our evaluation is conducted on a cluster of five machines, each with 16 GBytes of memory and two 3.0 GHz quad-core Intel Xeon 5450 processors, for a total of 40 cores. The four-thread strategy shows slightly better performance, a result of the data sharing in the parallel threaded implementation, as described in the Data Sharing section; variations across different numbers of threads come from differences in load balance at the Pthread and MPI parallelization levels. Once scheduling is made deterministic and ensures appropriate cache sharing, the step function is less pronounced in the multi-threaded runs in Figure 5, owing to their reduced memory bandwidth demands and a smoother load function at the MPI level.
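The cache-fitting principle from the Conclusions can be turned into a simple sizing rule: split each node's work across just enough MPI processes that a process's share of the working set fits in cache, then give each process the node's remaining cores as threads. The helper below is a back-of-the-envelope sketch; the 24 MB working-set and 12 MB cache figures are assumptions for illustration, not measurements from the paper.

#include <stdio.h>

int threads_per_process(double working_set_mb, double cache_mb, int cores_per_node)
{
    /* Splitting a node's work across p MPI processes shrinks each
     * process's working set roughly by a factor of p; pick the smallest
     * p whose share fits in cache, then give each process the remaining
     * cores as threads. */
    for (int p = 1; p <= cores_per_node; p++) {
        if (working_set_mb / p <= cache_mb && cores_per_node % p == 0)
            return cores_per_node / p;
    }
    return 1; /* working set never fits: fall back to one thread per process */
}

int main(void)
{
    /* e.g., an assumed 24 MB per-node working set, 12 MB of cache, and
     * 8 cores per node -> 2 MPI processes of 4 threads each */
    printf("threads per MPI process: %d\n",
           threads_per_process(24.0, 12.0, 8));
    return 0;
}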

