Abstract

This workshop is for those of you who, having read about Big Data and seen some of its results in academic studies and the commercial world, would like to get a sense of what actually working with Big Data entails.The workshop will provide an overview of key technologies for the handling and analysis of large scale datasets, including Hadoop/MapReduce, the RHadoop package, other R packages used for large scale analysis, and Big Data handling environments such as Cloudera, Hortonworks, Tessera, and Amazon Web Services. We will also discuss a few of the primary challenges in successfully completing analysis of large scale data, such as integrating and structuring heterogenous data, handling sparse matrices, and devising effective analytical routines using parallel processing and splitting data. Participants will work with a live demonstration environment that provides a realistic introduction to Big Data Analytics using scripts that will run both on a scaled-down demonstration dataset and on truly large scale data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call