Abstract

The DDM Tracer monitoring framework traces and monitors ATLAS file operations on the Worldwide LHC Computing Grid. The volume of traces has increased significantly since the framework was put into production in 2009. There are now about 5 million trace messages every day, with peaks near 250 Hz and peak rates continuing to climb, which poses a major challenge for the current structure. Analysis of large datasets through on-demand queries to the relational database management system (RDBMS), i.e. Oracle, can be problematic and can have a significant effect on the database's performance. Consequently, we have investigated new high-availability technologies: messaging infrastructure, specifically ActiveMQ, and key-value stores. Key-value stores are distributed and highly scalable, and their write performance is usually much better than that of an RDBMS, all of which is very useful for the Tracer monitoring framework. Indexes and distributed counters have also been tested to improve query performance and provide almost real-time results. In this paper, the design principles, architecture and main characteristics of the Tracer monitoring framework are described and examples of its usage are presented.
