Abstract

High-performance computing (HPC) systems generate huge amounts of metadata about different entities such as jobs, users, and files. Existing systems can efficiently record and manage part of these metadata, mainly the POSIX metadata of data files (e.g., file size, name, and permissions mode). But another important set of metadata, referred to as “rich” metadata in this study, which record not only wider range of entities (e.g., running processes and jobs) but also more complex relationships between them, are mostly missing in current HPC systems. Yet such rich metadata are critical for supporting many advanced data management functions such as identifying data sources and parameters behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. To uniformly and efficiently manage the rich metadata generated in HPC systems, We propose to utilize a graph model in this study. We identify the key challenges of implementing such a graph-based HPC rich metadata management system and present GraphMeta, a graph-based rich metadata management system designed and optimized for HPC platforms, to tackle these challenges. Extensive evaluations on both synthetic and real HPC metadata workloads show its advantages in both performance and scalability compared with existing solutions.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.