Abstract

Background

Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server.

Methodology/Principal Findings

We introduce Phylo, a human-based computing framework applying “crowd sourcing” techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we have received more than 350,000 solutions submitted by more than 12,000 registered users. Our results show that the submitted solutions contributed to improving the accuracy of up to 70% of the alignment blocks considered.

Conclusions/Significance

We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of “human-brain peta-flops” of computation that are spent every day playing games.
Phylo is available at: http://phylo.cs.mcgill.ca.

Highlights

  • The problem of optimally aligning a set of biological sequences (multiple sequence alignment (MSA)) is one of the most fundamental questions in computational biology, with the first problem formulations and accompanying algorithms dating back to the early 1970s [1]

  • Multiple alignments are at the core of most comparative genomics studies, as they make it possible to study how genetic sequences evolve and to infer the function of different regions based on their evolutionary patterns [2,3], including protein-coding regions [4] and RNA genes [5], as well as regulatory regions [6,7,8]

  • Although the sum-of-pairs score was heavily used in early studies, more phylogenetically aware scoring schemes are preferred [14,15,16,17]
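
To make the sum-of-pairs score mentioned above concrete, here is a minimal Python sketch. It scores every pair of characters within each alignment column and adds the results over all columns. The function names and the match, mismatch, and gap weights are illustrative assumptions; they are not the scoring scheme used by Phylo or by any of the cited studies, and gap-gap pairs are assumed to score zero.

```python
from itertools import combinations


def column_score(column, match=1, mismatch=-1, gap=-2):
    """Sum the pairwise scores of all character pairs in one alignment column.

    The default weights are hypothetical values chosen for illustration.
    """
    total = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue  # assumption: a gap aligned with a gap contributes nothing
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


def sum_of_pairs(alignment, **weights):
    """Return the sum-of-pairs score of an alignment given as equal-length strings."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) iterates over the alignment column by column
    return sum(column_score(col, **weights) for col in zip(*alignment))


if __name__ == "__main__":
    toy_alignment = ["ACG-", "ACGT", "A-GT"]
    print(sum_of_pairs(toy_alignment))
```

Because every pair of sequences is weighted equally regardless of how closely related the species are, this score ignores the phylogeny, which is the limitation that the phylogenetically aware schemes cited above are meant to address.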

Introduction

The problem of optimally aligning a set of biological sequences (multiple sequence alignment (MSA)) is one of the most fundamental questions in computational biology, with the first problem formulations and accompanying algorithms dating back to the early 1970s [1]. Multiple alignments are at the core of most comparative genomics studies, as they make it possible to study how genetic sequences evolve and to infer the function of different regions based on their evolutionary patterns [2,3], including protein-coding regions [4] and RNA genes [5], as well as regulatory regions [6,7,8]. They play a central role in the identification of genomic regions under purifying [9] or diversifying selection [10,11]. In citizen science approaches, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server.
