Abstract

Genome-wide association studies (GWAS) rely on microarrays or, more recently, mapping of sequencing reads to genotype individuals. The reliance on prior sequencing of a reference genome limits the scope of association studies and precludes mapping associations outside of the reference. We present an alignment-free method for association studies of categorical phenotypes based on counting k-mers in whole-genome sequencing reads, testing for associations directly between k-mers and the trait of interest, and local assembly of the statistically significant k-mers to identify sequence differences. An analysis of the 1000 Genomes data shows that sequences identified by our method largely agree with results obtained using the standard approach. However, unlike standard GWAS, our method identifies associations with structural variations and sites not present in the reference genome. We also demonstrate that population stratification can be inferred from k-mers. Finally, application to an E. coli dataset on ampicillin resistance validates the approach.
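The first two steps named in the abstract, counting k-mers in reads and testing each k-mer against a two-group phenotype, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`canonical`, `count_kmers`, `poisson_lrt_pvalue`) are hypothetical, merging each k-mer with its reverse complement is an assumption, and the Poisson likelihood ratio test is one plausible choice of test that the abstract does not specify. Correction for multiple testing and population stratification, and the local assembly step, are omitted.

```python
import math
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers over an iterable of read strings (no alignment needed)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def poisson_lrt_pvalue(c1, n1, c2, n2):
    """P-value of a likelihood ratio test for one k-mer (assumed test, see lead-in).

    c1, c2: total count of the k-mer in group 1 (e.g. cases) and group 2 (controls).
    n1, n2: total number of k-mers counted in each group, used for normalization.
    Null hypothesis: the k-mer occurs at the same Poisson rate in both groups.
    """
    pooled_rate = (c1 + c2) / (n1 + n2)

    def term(c, n):
        # Contribution c * log(observed / expected), with 0 * log(0) taken as 0.
        return c * math.log(c / (pooled_rate * n)) if c > 0 else 0.0

    stat = 2.0 * (term(c1, n1) + term(c2, n2))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
```

In use, `count_kmers` would be run once per sample, the counts summed within each phenotype group, and `poisson_lrt_pvalue` evaluated for every k-mer. The k-mers that pass a significance threshold would then be the input to assembly.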

Highlights

  • Association mapping refers to the linking of genotypes to phenotypes.

  • In a pairwise comparison of the Toscani in Italia (TSI) and the Yoruba in Ibadan, Nigeria (YRI) populations, we find that sequences identified by our method largely agree with results obtained using a standard genome-wide association study (GWAS) based on variant calling from mapped reads (Figure 2).

  • We found that few of the sequences enriched for in TSI samples, with lengths up to 12 kbp and 2 kbp in comparisons with YRI and Bengali from Bangladesh (BEB), respectively, mapped to the Epstein–Barr virus (EBV) genome, strain B95-8 [GenBank: V01555.2].

Introduction

Association mapping refers to the linking of genotypes to phenotypes. Most often this is done using a genome-wide association study (GWAS) with single nucleotide polymorphisms (SNPs). In recent years, thousands of genome-wide association studies have been performed, and regions associated with traits and diseases have been located. This approach has a number of limitations. Even the human reference genome was shown to be incomplete (Altemose et al., 2014), and association mapping to regions not in the reference is difficult. Structural variations such as insertion-deletions (indels) and copy number variations are usually ignored in these studies.
