Abstract
The statistical analysis of infrastructure metrics comes with several specific challenges, including the fairly large volume of unstructured metrics from a large set of independent data sources. Hadoop and Spark provide an ideal environment, in particular for the first steps of skimming rapidly through hundreds of TB of low-relevance data to find and extract the much smaller data volume that is relevant for statistical analysis and modelling. This presentation will describe the new Hadoop service at CERN and the use of several of its components for high-throughput data aggregation and ad hoc pattern searches. We will describe the hardware setup used, the service structure with a small set of decoupled clusters, and the first experience with co-hosting different applications and performing software upgrades. We will further detail the common infrastructure used for data extraction and preparation from continuous monitoring and database input sources.
Highlights
The quantitative analysis of computing infrastructure metrics has recently been receiving increasing attention at CERN and other High Energy Physics sites.
The IT department has set up a working group with the goal of understanding the science workflows involving the CERN computing center on a more quantitative level. The scope of this working group includes medium- to long-term metric analysis using statistical and machine learning methods, aiming to go beyond the isolated time series analysis performed in traditional monitoring systems.
Some of the developments reported in this contribution have been performed in collaboration with infrastructure experts from the Bhabha Atomic Research Centre (BARC, Mumbai).