Abstract

The availability of cheap, easy-to-use sync-and-share cloud services has split the scientific storage world into traditional big data management systems and the very attractive sync-and-share services. With the former, the location of data is well understood, while the latter are mostly operated in the Cloud, resulting in a rather complex legal situation.

Besides legal issues, these two worlds have little overlap in user authentication and access protocols. While traditional storage technologies, popular in HEP, are based on X.509, cloud services and sync-and-share software are generally based on username/password authentication or mechanisms like SAML or OpenID Connect. Similarly, the data access models offered by the two differ, with sync-and-share services often using proprietary protocols.

As both approaches are very attractive, dCache.org developed a hybrid system, providing the best of both worlds. To avoid reinventing the wheel, dCache.org decided to embed another open-source project: ownCloud. This offers the required modern access capabilities but does not support the managed data functionality needed for large-capacity data storage.

With this hybrid system, scientists can share files and synchronize their data with laptops or mobile devices as easily as with any other cloud storage service. On top of this, the same data can be accessed via established mechanisms, like GridFTP to serve the Globus Transfer Service or the WLCG FTS3 tool, or the data can be made available to worker nodes or HPC applications via a mounted filesystem. As dCache provides a flexible authentication module, the same user can access their storage via different authentication mechanisms, e.g., X.509 and SAML. Additionally, users can specify the desired quality of service or trigger media transitions as necessary, thus tuning data access latency to the planned access profile. Such features are a natural consequence of using dCache.

We will describe the design of the hybrid dCache/ownCloud system, report on several months of operational experience running it at DESY, and elucidate the future roadmap.
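To picture how one user can reach the same storage through different authentication mechanisms, the following is a minimal sketch of an identity-mapping table. It is not dCache's actual authentication module or configuration; the table, the function name `resolve_user`, and all principals are hypothetical and exist only for illustration.

```python
from typing import Optional

# Hypothetical mapping: (mechanism, principal) -> storage user name.
# Every entry here is invented for illustration.
IDENTITY_MAP = {
    ("x509", "/C=DE/O=Example/CN=Jane Doe"): "jdoe",
    ("saml", "jane.doe@example.org"): "jdoe",
    ("password", "jdoe"): "jdoe",
}


def resolve_user(mechanism: str, principal: str) -> Optional[str]:
    """Return the storage user for a credential, or None if it is unknown."""
    # The mechanism name is matched case-insensitively; the principal is not.
    return IDENTITY_MAP.get((mechanism.lower(), principal))
```

In this sketch an X.509 certificate subject and a SAML identity both resolve to the same storage user, so files written through one access path are owned by the same account when read through another.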

Highlights

  • Change data between syncing and non-syncing storage, like Amazon, provide different QoS with different costs, share data without syncing, third-party transfers between sites, direct access to sync space from compute facilities

  • Integration with scientific data life-cycle; “hot” data can be stored on SSDs, “cold” on cheaper HDDs, “archive” on tape; ... but no sync-and-share facilities

  • Combining these two gives DESY the best of both worlds: dCache is mounted on servers with NFS v4.1/pNFS, running community edition ownCloud
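The data life-cycle tiers named in the highlights ("hot" on SSDs, "cold" on HDDs, "archive" on tape) can be pictured as a simple policy function. This is only a sketch under stated assumptions: the `Tier` enum, the `choose_tier` function, and the day thresholds are hypothetical and are not taken from dCache or from the talk.

```python
from enum import Enum


class Tier(Enum):
    HOT = "ssd"       # lowest latency, highest cost
    COLD = "hdd"      # cheaper disk
    ARCHIVE = "tape"  # cheapest, highest latency


def choose_tier(days_since_last_access: int,
                hot_days: int = 30,
                cold_days: int = 365) -> Tier:
    """Pick a storage tier from the time since last access.

    The thresholds are illustrative assumptions, not site policy.
    """
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access <= hot_days:
        return Tier.HOT
    if days_since_last_access <= cold_days:
        return Tier.COLD
    return Tier.ARCHIVE
```

A media transition in this picture is a change in the tier returned for a file, for example when a dataset that has not been read for more than a year moves from disk to tape.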

Summary

How we solved it at DESY

  • Looked around and chose two open-source projects.

  • Integration with scientific data life-cycle; “hot” data can be stored on SSDs, “cold” on cheaper HDDs, “archive” on tape; ...

  • Our collaborators adopting ownCloud makes it more attractive; ...

  • Combining these two gives DESY the best of both worlds: dCache is mounted on servers with NFS v4.1/pNFS, running community edition ownCloud. Integrated with DESY Kerberos, LDAP and “Registry”.

The DESY Cloud service
Development and future work
Backup slides