Abstract
The creation, analysis, and dissemination of data have become profoundly democratized. Social networks spanning hundreds of millions of users enable instantaneous discussion, debate, and information sharing. Streams of tweets, blogs, photos, and videos identify breaking events faster and in more detail than ever before. Deep, online datasets enable analysis of previously unreachable information. This sea change is the result of a confluence of Information Technology advances, including intensively networked systems, cloud computing, social computing, and pervasive devices and communication. The key challenge is that the massive scale and diversity of this continuous flood of information breaks our existing technologies. State-of-the-art Machine Learning algorithms do not scale to massive data sets. Existing data analytics frameworks cope poorly with incomplete and dirty data and cannot process heterogeneous, multi-format information. Current large-scale processing architectures struggle with the diversity of programming models and job types and do not support the rapid marshalling and unmarshalling of resources to solve specific problems. All of these limitations lead to a Scalability Dilemma: beyond a point, our current systems tend to perform worse as they are given more data, more processing resources, and more people, exactly the opposite of what should happen.

The Berkeley RADLab is a collaborative effort focused on cloud computing, involving nearly a dozen faculty members and postdocs, several dozen students, and fifteen industrial sponsors. The lab is in the final year of a five-year effort to develop the software infrastructure to enable rapid deployment of robust, scalable, data-intensive internet services. In this talk I will give an overview of the RADLab effort and do a deeper dive on several projects, including PIQL, a performance-insightful query language for interactive applications, and SCADS, a self-managing, scalable key-value store. I will also give an overview of a new effort we are starting on next-generation cloud computing architectures (called the "AMPLab", for Algorithms, Machines, and People) focused on large-scale data analytics, machine learning, and hybrid cloud/crowd computing.

In a nutshell, the RADLab approach has been to use Statistical Machine Learning in the service of building large-scale systems. The AMPLab is exploring the other side of this relationship, namely, using large-scale systems to support Statistical Machine Learning and other analysis techniques for data-intensive applications. And given the central role of the cloud in a world of pervasive connectivity, a key part of the research agenda is to support collaborative efforts of huge populations of users connected through cloud resources.