Abstract

Purpose: The National Cancer Institute’s (NCI’s) Cancer Research Data Commons (CRDC) is a data ecosystem for the cancer research community that provides cloud-based, secure storage and analytic tools for many cancer data types, including genomic, proteomic, imaging, and clinical data. CRDC requires a carefully designed, federated data governance framework to sustain a collaborative data environment in which researchers can access timely, accurate, and relevant cancer data.

Methods: In 2023, CRDC chartered a Data Governance Framework of Committees and Working Groups across all CRDC components to enable sharing of diverse data types; provide secure data access; optimize common infrastructure components and functions; and adhere to the FAIR data principles (Findable, Accessible, Interoperable, and Reusable). We considered many designs for the CRDC Data Governance Framework, including fully centralized and fully decentralized decision-making bodies. A fully centralized data governance framework for a diverse data ecosystem like CRDC would risk top-down decisions that do not meet the unique needs of the CRDC and cancer research community. A fully decentralized data governance framework would risk inconsistencies and compliance challenges. Additional governance design considerations included a federated approach that emphasizes clear roles and responsibilities; prioritizing cross-component policies to ensure consistency and promote operational efficiency; defining measurable performance standards; and keeping data owners, stewards, and users informed, trained, and supported.

Results: We have determined that long-term sustainment of CRDC requires a federated governance approach with broad stakeholder participation through Committees and Working Groups. Currently established CRDC governance committees include the Enterprise Architecture Review Team (EART), Submission Review Team, Data Standards Working Group, and Data Advisory Board consisting of cancer research domain experts, researchers, and skilled cloud engineers and developers. In 2023, the CRDC EART defined data lifecycle stages and identified technical optimization opportunities for submission coordination and indexing. The established governance groups are working to identify optimization opportunities for data quality checks, submission review workflows, study acceptance criteria, data submission reviews, and cross-CRDC emerging data standards requirements.

Conclusions: Sustaining large, complex cancer research data requires a federated governance approach with participation from all relevant stakeholder groups; it cannot rely simply on top-down imposition of policies, standards, and procedures. The newly established CRDC federated governance framework will contribute to improving the long-term sustainability of the CRDC infrastructure and ensuring that CRDC data drive meaningful research outcomes.

Citation Format: Juergen Klenk, Dina Mikdadi, Chelsea Owens, Angela Maggio, Bhavani Singh, Eric Barner, Tanja Davidsen, Erika Kim. Managing large-scale cancer research data programs [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3568.