Abstract
The creation, analysis, and dissemination of data have become profoundly democratized. Social networks spanning hundreds of millions of users enable instantaneous discussion, debate, and information sharing. Streams of tweets, blogs, photos, and videos identify breaking events faster and in more detail than ever before. Deep, online datasets enable analysis of previously unreachable information. This sea change is the result of a confluence of Information Technology advances, including intensively networked systems, cloud computing, social computing, and pervasive devices and communication. The key challenge is that the massive scale and diversity of this continuous flood of information breaks our existing technologies. State-of-the-art Machine Learning algorithms do not scale to massive data sets. Existing data analytics frameworks cope poorly with incomplete and dirty data and cannot process heterogeneous, multi-format information. Current large-scale processing architectures struggle with the diversity of programming models and job types and do not support the rapid marshalling and unmarshalling of resources to solve specific problems. All of these limitations lead to a Scalability Dilemma: beyond a point, our current systems tend to perform worse as they are given more data, more processing resources, and more people, exactly the opposite of what should happen.

The Berkeley RADLab is a collaborative effort focused on cloud computing, involving nearly a dozen faculty members and postdocs, several dozen students, and fifteen industrial sponsors. The lab is in the final year of a five-year effort to develop the software infrastructure to enable rapid deployment of robust, scalable, data-intensive internet services. In this talk I will give an overview of the RADLab effort and do a deeper dive on several projects, including PIQL, a performance-insightful query language for interactive applications, and SCADS, a self-managing, scalable key-value store. I will also give an overview of a new effort we are starting on next-generation cloud computing architectures (called the "AMPLab", for Algorithms, Machines, and People) focused on large-scale data analytics, machine learning, and hybrid cloud/crowd computing.

In a nutshell, the RADLab approach has been to use Statistical Machine Learning in the service of building large-scale systems. The AMPLab is exploring the other side of this relationship, namely, using large-scale systems to support Statistical Machine Learning and other analysis techniques for data-intensive applications. And given the central role of the cloud in a world of pervasive connectivity, a key part of the research agenda is to support collaborative efforts of huge populations of users connected through cloud resources.