Abstract
BackgroundLinkage disequilibrium (LD) analysis is broadly utilized in genetics to understand the evolutionary and demographic history and helps geneticists identify genes associated with interested inherited traits, such as diseases. There are some tools for linkage disequilibrium analysis either in a local or online way; however, there has been no such tool supporting both graphical user interface (GUI) and parallel computing.ResultsWe developed a GUI software called LDkit for LD analysis, which supports parallel computing. The LDkit supports both variant call format (VCF) and PLINK ‘ped + map’ format. At the same time, users could also just analyze a subset of individuals from the whole population. The LDkit reads the data by block and then paralleled the computation process by monitoring the usage of processes. Assessment on the Human 1000 genome data showed that when paralleled with 32 threads, the running time was reduced to less than 6 minutes from ~77 minutes using the chromosome 22 dataset with 1,103,547 SNPs and 2504 individuals.ConclusionsThe software LDkit can be effectively used to calculate and plot LD decay, LD block, and linkage disequilibrium analysis between a site and a given region. Most importantly, both graphical user interface (GUI) and stand-alone packages are available for users’ convenience. LDkit was written in JAVA language under cross-platform support.
Highlights
Linkage disequilibrium (LD) analysis is broadly utilized in genetics to understand the evolutionary and demographic history and helps geneticists identify genes associated with interested inherited traits, such as diseases
Most of the population genetic analyses, such as detection of sites under selection, estimation of divergence time between different populations, positional cloning, and genome wide association studies (GWAS) are based on linkage disequilibrium (LD) theory, which has been investigated over 100 Years [1]
LD decay analysis To assess the performance of LDkit on LD decay analysis, we tested the software using the chromosome 22 dataset from the Human 1000 genome
Summary
We developed a GUI software called LDkit for LD analysis, which supports parallel computing. The LDkit supports both variant call format (VCF) and PLINK ‘ped + map’ format. Users could just analyze a subset of individuals from the whole population. The LDkit reads the data by block and paralleled the computation process by monitoring the usage of processes. Assessment on the Human 1000 genome data showed that when paralleled with 32 threads, the running time was reduced to less than 6 minutes from ~77 minutes using the chromosome 22 dataset with 1,103,547 SNPs and 2504 individuals
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.