Abstract

WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area, in partnership with WLCG, is focused on being the primary source of networking information for its partners and constituents. It was established to ensure sites and experiments can better understand and fix networking issues, while providing an analytics platform that aggregates network monitoring data with higher level workload and data transfer services. This has been facilitated by a global network of perfSONAR instances that have been commissioned and are operated in collaboration with the WLCG Network Throughput Working Group. An additional important update is the inclusion of the newly funded NSF project SAND (Service Analytics and Network Diagnosis), which focuses on network analytics. This paper describes the current state of the network measurement and analytics platform and summarises the activities undertaken by the working group and our collaborators. This includes the progress being made in providing higher level analytics, alerting and alarming from the rich set of network metrics we are gathering.

Introduction

The Open Science Grid (OSG) and the Worldwide LHC Computing Grid (WLCG) have supported network monitoring activities since 2012, helping their users and affiliates improve overall network throughput by introducing active monitoring of their networks and providing the ability to test for and identify potential network performance bottlenecks [1, 2]. Two important areas of development were the establishment and operation of a global network of measurement agents, and the development and operation of a comprehensive network monitoring platform, which collects and stores the measurements while making them available for further processing. The WLCG Network Throughput Working Group was established in 2014 to help with some of the underlying tasks, such as overseeing the global network of measurement agents based on perfSONAR [4], establishing baseline measurements and performing low-level debugging activities. This has led to a dedicated network throughput support unit, which has successfully coordinated and resolved complex network performance incidents within LHCOPN and LHCONE [5].

