Abstract

Bioinformatics tools have gained popularity in biology, but little is known about their validity. We aimed to assess the early contribution of 415 single nucleotide polymorphisms (SNPs), associated in adults with eight cardio-metabolic traits at the genome-wide significance level, in the Family Atherosclerosis Monitoring In earLY Life (FAMILY) birth cohort. We used the popular web-based tool SNAP to assess the availability of the 415 SNPs on the Illumina Cardio-Metabochip genotyped in the FAMILY study participants. We then compared the SNAP output with the Cardio-Metabochip file provided by Illumina, using the chromosome and chromosomal positions of SNPs from the NCBI Human Genome Browser (Genome Reference Consortium Human Build 37). With the HapMap 3 release 2 reference, the SNAP output reported 201 of the 415 SNPs as missing from the Cardio-Metabochip. However, the Cardio-Metabochip file revealed that 152 of these 201 SNPs were in fact present on the array (a false-negative rate of 36.6%, i.e. 152 of the 415 queried SNPs). With the more recent 1000 Genomes Project release, comparing the outputs of SNAP and the Illumina product file gave a false-negative rate of 17.6%. We did not find any ‘false positive’ SNPs (SNPs specified as available on the Cardio-Metabochip by SNAP, but not by the Cardio-Metabochip Illumina file). Cohen’s kappa coefficient, which measures the agreement between the two methods beyond that expected by chance, indicated that the validity of SNAP was fair to moderate depending on the reference used (HapMap 3 or 1000 Genomes). In conclusion, we demonstrate that the SNAP outputs for the Cardio-Metabochip are invalid. This study illustrates the importance of systematically and independently assessing the validity of bioinformatics tools. We propose a series of guidelines to improve practices in the fast-moving field of bioinformatics software implementation.
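The agreement statistics quoted in the abstract can be illustrated with a short sketch. It rebuilds the 2x2 table for the HapMap 3 comparison from the counts given above (415 queried SNPs, 201 reported missing by SNAP, 152 of those present in the Illumina file, no false positives) and computes the false-negative rate and Cohen's kappa. The function and variable names are illustrative assumptions, not the authors' code, and the false-negative rate is taken as a fraction of all queried SNPs because that is the definition that reproduces the 36.6% in the abstract.

```python
# Minimal sketch, assuming the 2x2 table can be derived from the abstract's
# HapMap 3 counts. Names here are hypothetical, not from the paper.

def cohen_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for two binary raters from a 2x2 agreement table."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n
    # Chance agreement from the marginal totals of each rater.
    a_yes = both_yes + a_yes_b_no
    a_no = a_no_b_yes + both_no
    b_yes = both_yes + a_no_b_yes
    b_no = a_yes_b_no + both_no
    expected = (a_yes * b_yes + a_no * b_no) / n**2
    return (observed - expected) / (1 - expected)

n_total = 415          # SNPs queried
snap_missing = 201     # reported as missing by SNAP
false_negatives = 152  # missing per SNAP, present per Illumina file
false_positives = 0    # present per SNAP, absent per Illumina file

# Rater A = SNAP, rater B = Illumina product file; "yes" = SNP on the array.
both_present = n_total - snap_missing - false_positives
both_absent = snap_missing - false_negatives

# False-negative rate as used in the abstract: share of all queried SNPs.
fraction = false_negatives / n_total
kappa = cohen_kappa(both_present, false_positives, false_negatives, both_absent)

print(f"false-negative rate: {100 * fraction:.1f}%")
print(f"Cohen's kappa: {kappa:.2f}")
```

Under these derived counts, kappa comes out at roughly 0.25, which sits in the range usually labelled "fair". The kappa values reported in the paper itself are not given in this excerpt, so this figure should be read as a consistency check rather than as the published result.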

Highlights

  • In the last fifteen years, there has been an explosion of online bioinformatics tools in genomics [1]

  • Using the HapMap 3 reference, we found that 9 of the 37 single nucleotide polymorphisms (SNPs) associated with blood pressure traits were not found using SNAP but were available on the Cardio-Metabochip according to the product file (false-negative rate of 24.3%)

  • Of the 276 lipid-associated SNPs, 91 were not captured by SNAP but were present on the Cardio-Metabochip according to the Illumina product file (false-negative rate of 33%)

Summary

Introduction

In the last fifteen years, there has been an explosion of online bioinformatics tools in genomics [1]. SNAP, a post-GWAS web-based tool, was developed to find single nucleotide polymorphisms (SNPs) or their proxies and to retrieve their annotations in various commercially available genotyping arrays [4]. Additional applications of SNAP include calculating linkage disequilibrium (LD) between SNPs and generating graphical plots of regional associations or LD using data from the HapMap or the 1000 Genomes Project [4]. Since its publication in 2008, SNAP has gained popularity: it has been cited 402 times in total, including 101 times in 2014 (according to Web of Science, S2 Fig.). Despite this growing popularity, little is known about the validity of SNAP outputs.

