Abstract

Background: Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and taking hours to months to run, benefit from parallel computation. However, acquiring the programming skills needed to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files is arduous.

Results: Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP) or trait, such as SNP characterization statistics or association test statistics. The input data of this group are SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example the summary statistics of genotype quality for each sample. The input data of this group are individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group are pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as linkage disequilibrium characterisation. The input data of this group are pairs of SNPs. We developed the ParallABEL library, which uses the Rmpi library, to parallelize these four types of computations. ParallABEL is not only aimed at GenABEL but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was reduced linearly from approximately eight hours to one hour when ParallABEL employed eight processors.

Conclusions: Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance and to simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.
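The four groups described above differ mainly in what counts as one unit of work: a SNP or trait, an individual, a pair of individuals, or a pair of SNPs. The Python sketch below illustrates that idea only. The helpers `work_units` and `split_evenly` are hypothetical names introduced here, the contiguous chunking scheme is an assumption, and none of this is the ParallABEL, GenABEL, or Rmpi API.

```python
from itertools import combinations


def work_units(kind, snps, individuals):
    """Return the list of independent work units for one of the four groups.

    The `kind` labels are illustrative names, not ParallABEL identifiers.
    """
    if kind == "snps":
        return list(snps)
    if kind == "individuals":
        return list(individuals)
    if kind == "pairs_of_individuals":
        return list(combinations(individuals, 2))
    if kind == "pairs_of_snps":
        return list(combinations(snps, 2))
    raise ValueError(f"unknown kind: {kind!r}")


def split_evenly(items, n_workers):
    """Split items into n_workers contiguous chunks whose sizes differ by at most one."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Toy identifiers, invented for this example.
    snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
    individuals = ["id1", "id2", "id3", "id4"]

    for kind in ("snps", "individuals", "pairs_of_individuals", "pairs_of_snps"):
        units = work_units(kind, snps, individuals)
        sizes = [len(chunk) for chunk in split_evenly(units, 4)]
        print(kind, "units:", len(units), "chunk sizes:", sizes)
```

Each chunk could then be handed to one worker process and the per-chunk results concatenated in chunk order, which corresponds to the partition, distribute, and merge steps mentioned in the Background. Note that the pair-wise groups grow quadratically with the number of SNPs or individuals, so in practice the pairs would likely be generated lazily rather than materialized as a list.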

Highlights

  • Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response

  • GenABEL is a specialized library package for GWA analysis [3] implemented in R, an open source statistics programming language and environment [4,5]

  • We present the development of our ParallABEL library, a new R library for parallelization of GWA studies based on Rmpi


Summary

Results

Hanuman runs Rocks Cluster Distribution version 4.3, which includes the SUN Grid. The computing time for the sequential version of Type2_parall_by_individuals can be very short (e.g. 20 seconds). Type4_parall_by_pairs_of_SNPs was executed with the GenABEL r2fast function and took only 1.4 days on eight processors, indicating that the time saved with ParallABEL scales linearly with the number of nodes. This suggests that the more SNPs are analyzed, the more computing time ParallABEL saves. The time savings increase only slowly once the number of processors exceeds 100. This applies to the relatively small data set analyzed here. Users should choose the number of processors according to the gain in computational throughput.
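The scaling claims can be made concrete with the usual definitions of speed-up (sequential time divided by parallel time) and efficiency (speed-up divided by the number of processors). The sketch below applies them to the identity-by-state figures quoted in the abstract (about eight hours sequentially, about one hour on eight processors). The Amdahl's-law function is included only as a standard textbook model of why gains flatten at high processor counts. It is not the authors' analysis, and the parallel fraction of 0.99 is an assumed value, not a measurement from the paper.

```python
def speedup(t_sequential, t_parallel):
    """Speed-up: how many times faster the parallel run is than the sequential run."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, n_processors):
    """Efficiency: speed-up per processor (1.0 means perfect linear scaling)."""
    return speedup(t_sequential, t_parallel) / n_processors


def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: ideal speed-up when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Identity-by-state example from the abstract: ~8 h sequential, ~1 h on 8 processors.
    print("speed-up:", speedup(8.0, 1.0))
    print("efficiency:", efficiency(8.0, 1.0, 8))

    # ASSUMPTION: 99% of the work is parallelizable (illustrative value only).
    for p in (8, 50, 100, 200, 400):
        print(p, "processors -> modelled speed-up", round(amdahl_speedup(0.99, p), 1))
```

Under this model, doubling the processor count beyond roughly 100 yields progressively smaller gains, which is one plausible reading of the slow increase in time savings reported above. Measuring efficiency on the actual cluster and data set remains the reliable way to choose the number of processors.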

