Data protection, like almost everything else in our lives, is challenged by the advent of ‘big data’. The Economist reports in its 2012 Outlook that the quantity of global digital data expanded from 130 exabytes in 2005 to 1,227 exabytes in 2010, and is predicted to rise to 7,910 exabytes in 2015. An exabyte is a quintillion bytes. If you find that hard to visualize, consider this: someone has calculated that if you loaded an exabyte of data onto DVDs in slimline jewel cases, and then loaded those cases into Boeing 747 aircraft, it would take 13,513 planes to transport one exabyte of data. Using DVDs to move the data collected globally in 2010 would require a fleet of more than 16 million jumbo jets. And exabytes are rapidly becoming passé. The volume of stored information in the world is growing so fast that scientists have had to coin new terms, including ‘zettabyte’ and ‘yottabyte’, to describe the flood of data.

The importance of big data is not just a result of its size or how fast it is growing (about 60 per cent a year), but also of the remarkable array of sources from which the data come. The Internet captures lots of data. Facebook alone has more than 800 million active users, more than half of whom log in every day; they generate more than 900 million web pages and upload more than 250 million photos every day. In 2010, a lifetime ago in Internet time, Google sites were used by more than 1 billion unique visitors every month, who spent a collective 200 billion minutes on its sites. Google-owned YouTube passed 1 trillion video playbacks in 2011. Email, instant messaging, VOIP calls, and other communications generate tens of trillions of recorded messages every year. Credit and debit cards, checks, and other financial activities provide a steady stream of billions of recorded transactions every month.
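As a rough sanity check on the figures quoted above, a back-of-the-envelope calculation is easy to run. The assumptions here are conventional but not from the text: an exabyte is taken as 10^18 bytes, and a single-layer DVD as holding 4.7 GB. The same arithmetic also shows that the 2005-to-2010 figures imply an annual growth rate close to the ‘about 60 per cent a year’ cited.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions (not from the text): 1 exabyte = 10**18 bytes,
# and a single-layer DVD holds 4.7 GB (4.7 * 10**9 bytes).

EXABYTE = 10**18
DVD_CAPACITY = 4.7 * 10**9  # bytes per single-layer DVD

dvds_per_exabyte = EXABYTE / DVD_CAPACITY
print(f"DVDs per exabyte: {dvds_per_exabyte:,.0f}")  # roughly 213 million discs

# Implied compound annual growth rate from 130 EB (2005) to 1,227 EB (2010):
growth = (1227 / 130) ** (1 / 5) - 1
print(f"Implied annual growth: {growth:.0%}")  # about 57%, close to the cited 60%
```

Spread across 13,513 aircraft, those 213 million discs work out to roughly 16,000 per plane, so whatever packing assumptions the original calculation used, the order of magnitude of the fleet sizes quoted is at least plausible.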
And increasingly, sensor networks—video surveillance cameras, embedded computers in automobiles, the more than 5 billion cell phones we carry—record locations, movements, and activities. We can now talk meaningfully about ubiquitous data collection, in which almost everything we do results in data being captured and stored by one or more third parties. It is significant that those data are digital: they can be stored, shared, searched, combined, and duplicated with extraordinary speed and at very little cost. And they are accompanied by metadata—data about when, where, and how the underlying information was generated. Some experts estimate that there may be five times more metadata than the information we are aware of creating, and this metadata can be extraordinarily revealing.

We used to define ‘big data’ as data sets so large that a supercomputer was needed to process them, but another aspect of big data is that analytical capacity has not only soared; it has also become far less expensive and more widely distributed. It is not just that today’s mobile devices have more computing power than the desktop machines of a decade ago, but also that we can now link data and computers virtually, so that huge computational tasks can be undertaken affordably and conveniently. In fact, we are witnessing the movement of more of that computational power, along with the storage of the tidal wave of data we are generating and collecting, into the ‘cloud’. Cloud computing is all the rage, and despite the overuse and misuse of the term, it is increasingly clear that many of the data and resources we used to believe we had to possess locally—in computers, handheld devices, entertainment systems, and business record systems—can now be provided remotely, with greater security and reliability and at lower cost. When thinking about the importance of ‘big data’, it is critical to remember that access to so much data, from so many different sources, and to the computing