Abstract
Motivation
Pairwise comparison problems arise in many areas of science. In genomics, datasets are already large and getting larger, so operations that require pairwise comparisons, whether on pairs of SNPs or on pairs of individuals, are extremely computationally challenging. We propose a generic algorithm for pairwise comparison problems that breaks a large problem (of order n^2 comparisons) into multiple smaller ones (each of order n comparisons), allowing for massive parallelization.

Results
We demonstrated that this approach is very efficient for calling identical-by-descent (IBD) segments between all pairs of individuals in the UK Biobank dataset, with a 250-fold saving in time and a 750-fold saving in memory over the standard approach to detecting such segments across the full dataset. This efficiency should extend to other methods of IBD calling and, more generally, to other pairwise comparison tasks in genomics or other areas of science.

Availability and Implementation
Code to generate the data needed for the implementation is available on GitHub at https://github.com/emmanuelsapin.
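
To make the idea of splitting an order-n^2 problem into order-n jobs concrete, here is a minimal sketch of one simple way to do it. It is not necessarily the scheme used by the authors. It assumes the n items are split into contiguous blocks of about sqrt(n) items, and that each job compares one block with itself or with one other block. The names `make_jobs` and `pairs_in_job` are hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations, combinations_with_replacement, product


def make_jobs(n):
    """Split indices 0..n-1 into blocks of size floor(sqrt(n)).

    Returns (jobs, blocks), where each job is a pair (i, j) of block
    indices with i <= j. This blocking scheme is an illustrative
    assumption, not the method described in the paper.
    """
    b = max(1, math.isqrt(n))
    blocks = [range(s, min(s + b, n)) for s in range(0, n, b)]
    jobs = list(combinations_with_replacement(range(len(blocks)), 2))
    return jobs, blocks


def pairs_in_job(job, blocks):
    """List the item pairs that one job is responsible for comparing."""
    i, j = job
    if i == j:
        # Within a single block: every unordered pair once.
        return list(combinations(blocks[i], 2))
    # Across two different blocks: every cross pair once.
    return list(product(blocks[i], blocks[j]))


if __name__ == "__main__":
    n = 10
    jobs, blocks = make_jobs(n)
    sizes = [len(pairs_in_job(job, blocks)) for job in jobs]
    print(f"{len(jobs)} jobs, largest job has {max(sizes)} comparisons")
```

Under these assumptions every unordered pair of items falls into exactly one job, and no job holds more than n comparisons, so the jobs are independent and could be dispatched to separate workers or cluster nodes. Each job also needs only the data for its own two blocks in memory rather than the full dataset.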