Abstract

As analysts are expected to process a greater amount of information in a shorter amount of time, creators of big data software are challenged with the need for improved efficiency. Ray, our group's usable, scalable genome assembler, addresses big data problems by using optimal resources and producing one, correct and conservative, timely solution. Only by abstracting the size of the data from both the computers and the humans can the real scientific question, often complex in itself, eventually be solved. To draw a curtain over the specific computational machinery of big data, we developed RayPlatform, a programming framework that allows users to concentrate on their domain-specific problems. RayPlatform is a parallel message-passing software framework that runs on clouds, supercomputers, and desktops alike. Using established technologies such as C++ and MPI (message-passing interface), we handle the genomes of hundreds of species, from viruses to plants, using machines ranging from desktop computers to supercomputers. From this experience, we present insights on making computer time more useful-and user time much more valuable.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call