A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment

M Titov, K De, A Klimentov, S Jha, G Záruba

doi:10.1088/1742-6596/1085/4/042028

Abstract

Scientific computing has advanced in the way it deals with massive amounts of data, as production capacities have increased significantly over the last decades. Most large science experiments require vast computing and data storage resources in order to provide results or predictions based on the data obtained. For scientific distributed computing systems with hundreds of petabytes of data and thousands of users, it is important to keep track not just of how data is distributed in the system, but also of individual users’ interests in the distributed data (that is, to reveal implicit interconnections between users and data objects). This, however, requires the collection and use of specific statistics such as correlations between data distribution, the mechanics of data distribution, and, mainly, user preferences. This work focuses on user activities (specifically, data usage) and interests in one such distributed computing system, PanDA (Production ANd Distributed Analysis system). PanDA is a high-performance workload management system originally designed to meet production and analysis requirements for a data-driven workload at the Large Hadron Collider Computing Grid for the ATLAS Experiment hosted at CERN (the European Organization for Nuclear Research). In this work we investigate whether the data collected in the past in PanDA shows any trends indicating that users have mutual interests that persist in subsequent data usage (i.e., data usage patterns), using data mining techniques such as association analysis, sequential pattern mining, and the basics of the recommender system approach. We show that such common interests between users indeed exist and thus could be used to provide recommendations (in terms of collaborative filtering) to help users with their data selection process.
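To make the association analysis mentioned in the abstract more concrete, the following is a minimal sketch that computes support and confidence for pairwise rules of the form "users of dataset a also use dataset b". It is not the authors' implementation: the function name `pair_rules`, the `min_support` threshold, and the toy dataset names are assumptions chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for pairwise rules a -> b.

    Each transaction is the set of datasets one (hypothetical) user accessed.
    support(a, b)     = fraction of transactions containing both a and b
    confidence(a -> b) = count(a and b) / count(a)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # drop infrequent pairs
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Toy usage records (illustrative names, not real PanDA data).
usage = [
    {"data_A", "data_B"},
    {"data_A", "data_B", "data_C"},
    {"data_A", "data_C"},
    {"data_B"},
]
rules = pair_rules(usage, min_support=0.5)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy input every user who accessed `data_C` also accessed `data_A`, so the rule `data_C -> data_A` has confidence 1.0, while the pair (`data_B`, `data_C`) falls below the support threshold and yields no rule.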

Highlights

Recommender System: A recommender system uses a set of machine learning/data mining processes that aim to guide users in a personalized way to interesting or useful items in a large space of possible options [3].
Production and Distributed Analysis system PanDA [1] is a high-performance pilot-based workload management system. This means that workload is assigned based on the feedback from successfully activated and validated pilot jobs, which are lightweight processes that probe the environment and act as “smart wrappers” for the payload.
In PanDA, an independent subsystem manages the delivery of pilot jobs to all worker nodes via a number of well-known cluster and grid scheduling systems (e.g., Condor-G).
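The pilot-based idea described above can be sketched roughly as follows: a pilot first probes its environment, and only a validated pilot requests a payload that fits the resources it found. This is a toy model under stated assumptions, not PanDA's actual API: `Job`, `JobQueue`, `request_job`, `run_pilot`, and the `free_disk_gb` criterion are all hypothetical names introduced for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    min_free_disk_gb: float  # hypothetical resource requirement


class JobQueue:
    """Hypothetical stand-in for a server-side queue of waiting jobs."""

    def __init__(self, jobs):
        self._jobs = deque(jobs)

    def request_job(self, free_disk_gb):
        # Hand out the first waiting job that fits the reported resources.
        for job in list(self._jobs):
            if job.min_free_disk_gb <= free_disk_gb:
                self._jobs.remove(job)
                return job
        return None


def run_pilot(queue, probe, run_payload):
    """Probe the environment, then wrap and run a matching payload."""
    env = probe()
    if not env["valid"]:
        return None  # a pilot that fails validation never receives work
    job = queue.request_job(env["free_disk_gb"])
    if job is None:
        return None  # nothing suitable is waiting
    return job.job_id, run_payload(job)


queue = JobQueue([Job(1, min_free_disk_gb=100.0), Job(2, min_free_disk_gb=10.0)])
healthy_node = lambda: {"valid": True, "free_disk_gb": 50.0}
broken_node = lambda: {"valid": False, "free_disk_gb": 500.0}

print(run_pilot(queue, broken_node, lambda job: "done"))
print(run_pilot(queue, healthy_node, lambda job: "done"))
```

The point of the design is late binding: the job is matched to a worker node only after the pilot has confirmed that the node is usable, so a broken node consumes a pilot rather than a real payload.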

Summary

Recommender System

A recommender system uses a set of machine learning/data mining processes that aim to guide users in a personalized way to interesting or useful items in a large space of possible options [3]. It can be contrasted with two related approaches:

Information Retrieval: assists users to locate data
Information Filtering: filters out irrelevant items from a user’s information
Recommender System: highlights valuable items in a user’s information stream
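As an illustration of how a recommender can surface items from other users' behaviour, here is a minimal sketch of user-based collaborative filtering over implicit feedback (which datasets each user has accessed). Jaccard similarity is an assumption made for this sketch; the source does not state which similarity measure the study uses. The function names and the user and dataset identifiers are likewise hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, usage, top_n=3):
    """Rank datasets the target has not used by the summed similarity of
    the other users who did use them."""
    seen = usage[target]
    scores = {}
    for user, items in usage.items():
        if user == target:
            continue
        sim = jaccard(seen, items)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Toy implicit-feedback data (illustrative, not real PanDA records).
usage = {
    "alice": {"d1", "d2", "d3"},
    "bob": {"d1", "d2", "d4"},
    "carol": {"d2", "d3", "d5"},
    "dave": {"d6"},
}
print(recommend("alice", usage))
```

Here `bob` and `carol` each share two datasets with `alice`, so their unseen datasets `d4` and `d5` are recommended, while `dave`, who shares nothing with her, has no influence.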

Sequential Pattern Mining

Recommendation Simulation

User Activities

Findings

Conclusions


Journal: Journal of Physics: Conference Series
Publication Date: Sep 1, 2018
License type: cc-by
