Abstract

DNA@Home is a volunteer computing project that aims to use Gibbs sampling to identify and locate DNA control signals in full genome-scale datasets. A fault-tolerant, asynchronous implementation of Gibbs sampling built on the Berkeley Open Infrastructure for Network Computing (BOINC) was used to locate binding sites of the SNAI1 (Snail) and SNAI2 (Slug) transcription factors across the human genome. Genes regulated by Slug but not Snail, and genes regulated by Snail but not Slug, provided two datasets with known motifs. These datasets contained up to 994 DNA sequences; to our knowledge, this is the largest-scale application of Gibbs sampling to binding site discovery. A total of 1,000 parallel sampling walks were used to search for the presence of one, two, or three possible motifs in small, medium, and full-size sets of these sequences. These runs were performed over two months on more than 1,500 volunteered computing hosts and generated over 2.2 terabytes of sampling data. High-performance computing resources were used for post-processing. This paper presents the intra-walk and inter-walk analyses used to determine walk convergence. The results were validated against current biological knowledge of the Snail and Slug promoter regions and point to avenues for further biological study.
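The abstract does not describe the sampler itself. The following Python sketch is a rough illustration only. It implements the classic site-sampling form of Gibbs motif discovery, in the style of the Lawrence et al. site sampler. Each walk holds one candidate binding-site start per sequence and repeatedly resamples one sequence's site, conditioned on a motif profile built from all the other sites.

This is not DNA@Home's BOINC implementation. It makes the following assumptions:

* each sequence contains exactly one motif occurrence of a fixed width;
* the background is a smoothed single-nucleotide model;
* all function names are hypothetical.

The `site_agreement` helper is likewise only a crude stand-in for the paper's intra-walk and inter-walk convergence analyses.

```python
import math
import random

ALPHABET = "ACGT"


def background(seqs):
    """Nucleotide background frequencies over all sequences (add-one smoothed)."""
    counts = {b: 1.0 for b in ALPHABET}
    for seq in seqs:
        for b in seq:
            counts[b] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def profile(seqs, starts, width, skip=None, pseudo=0.5):
    """Per-column base probabilities of the current sites, optionally leaving one sequence out."""
    cols = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    n = 0
    for i, (seq, s) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        n += 1
        for j in range(width):
            cols[j][seq[s + j]] += 1
    total = n + pseudo * len(ALPHABET)  # every column sums to this
    return [{b: col[b] / total for b in ALPHABET} for col in cols]


def motif_score(seqs, starts, width):
    """Log-odds of all sites under the full-data profile versus the background."""
    bg = background(seqs)
    pwm = profile(seqs, starts, width)
    return sum(
        math.log(pwm[j][seq[s + j]] / bg[seq[s + j]])
        for seq, s in zip(seqs, starts)
        for j in range(width)
    )


def gibbs_walk(seqs, width, iterations, rng):
    """One sampling walk: resample each sequence's site given all the others."""
    bg = background(seqs)
    starts = [rng.randrange(len(seq) - width + 1) for seq in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            pwm = profile(seqs, starts, width, skip=i)
            # Weight of each candidate start = motif/background likelihood ratio.
            weights = [
                math.prod(pwm[j][seq[s + j]] / bg[seq[s + j]] for j in range(width))
                for s in range(len(seq) - width + 1)
            ]
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def run_walks(seqs, width, n_walks=10, iterations=50, seed=0):
    """Several walks from different random initial sites; returns all walks and the best-scoring one."""
    rng = random.Random(seed)
    walks = [gibbs_walk(seqs, width, iterations, rng) for _ in range(n_walks)]
    best = max(walks, key=lambda w: motif_score(seqs, w, width))
    return walks, best


def site_agreement(walks):
    """Fraction of sequences on which every walk chose the same site (crude inter-walk check)."""
    agreed = [len(set(sites)) == 1 for sites in zip(*walks)]
    return sum(agreed) / len(agreed)


if __name__ == "__main__":
    # Toy data: a hypothetical motif "CGTGCG" planted at offset 10 in poly-A sequences.
    seqs = ["A" * 10 + "CGTGCG" + "A" * 14 for _ in range(5)]
    walks, best = run_walks(seqs, width=6)
    print(best, site_agreement(walks))
```

Running many walks from different random starting sites and comparing where they end up mirrors, in miniature, the idea of many parallel sampling walks. In a volunteer computing setting each walk could plausibly be its own unit of work, but that mapping is an assumption here and is not taken from the paper.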
