Abstract

Across different scientific domains, engineering disciplines, and application scenarios, increasingly, users have to deal with large-scale, diverse, feature-rich, and highresolution data sets that allow for data-intensive decision-making. The so-called big data challenge is making a profound transformation in computing. Big data not only refers to data sets that are large in size, but also covers data sets that are complex in structures, high dimensional, distributed, and heterogeneous. An effective frameworkwhen working with big data is through data summaries, such as different sampling methods, histograms, sketches and synopses, low-rank subspace approximation, dimensionality reduction techniques, etc. Instead of operating on complex and large raw data directly, these tools enable the execution of various data analytics tasks through appropriate and carefully constructed summaries, which improve their efficiency and scalability. Though some of these topics have been well studied in the past, the big data phenomena opens doors for interesting new research. These challenges include, but are not limited to, how to quantify the accuracy and efficiency trade-offwhen summarizing big data in massively parallel and distributed environments, how to summarize features in complex heterogeneous data, how to address IO and system issues in a summarization process, how to reduce communication cost when building a summary for a large data set stored in a cluster of commodity machines (such as a key-value store), how to dynamically maintain a summary in an incremental fashion under arbitrary arrivals of new data. As a result, answering the big data challenge through scalable data summarization is becoming of paramount importance.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.