Abstract

Mechanisms for remote execution of computational tasks enable a distributed system to effectively utilize all available resources. This ability is essential to attaining the objectives of high availability, system reliability, and graceful degradation, and directly contributes to flexibility, adaptability, and incremental growth. As part of a national fabric of Distributed High Throughput Computing (dHTC) services, remote execution is a cornerstone of the Open Science Grid (OSG) Compute Federation. Most of the organizations that harness the computing capacity provided by the OSG also deploy HTCondor pools on resources acquired from the OSG. The HTCondor Compute Entrypoint (CE) facilitates the remote acquisition of resources by all organizations. The HTCondor-CE is the product of the most recent translational cycle in a multidecade translational process. The process is rooted in a partnership between members of the High Energy Physics community and computer scientists that evolved over three decades and involved testing and evaluation with active users and production infrastructures. Through several translational cycles that involved researchers from different organizations and continents, principles, ideas, frameworks, and technologies were translated into a widely adopted software artifact that is responsible for provisioning approximately 9 million core hours per day across 170 endpoints.
