Abstract
The Health Cyberinfrastructure (CI) Division at the San Diego Supercomputer Center (SDSC) has been deploying secure, compliant, end-to-end solutions to support critical applications including, but not limited to, biomedical research, detection and prevention of medical fraud, enterprise risk management, and medical device data management. These applications rely on technologies ranging from the traditional data warehouse to more recent big data platforms built on Hadoop, Software as a Service (SaaS) capabilities built on container technology, business intelligence and analytics solutions, and IoT/streaming solutions. Operating these applications and their underlying platforms requires organizations to be agile and forward-looking so they can keep pace with ever-changing technological, regulatory, and customer-specific requirements while ensuring data security and privacy. To that end, the SDSC Health CI Division is participating in a multi-institution project with City of Hope, a National Cancer Institute (NCI)-designated Comprehensive Cancer Center, and other universities and health organizations to create a research cyberinfrastructure that includes a secure, cloud-based data management platform. The data management platform consolidates all California Teachers Study (CTS) data, including datasets, code, and accompanying documentation, and operates within Sherlock Cloud, a secure hybrid cloud platform. The platform allows every member of the CTS team to access and use all CTS data and information securely and in real time through a single, integrated environment. Researchers have access to a data warehouse, domain-specific data marts, and an analytics platform built on secure cloud technology. This environment greatly enhances the reliability and accuracy of the data collected and provides a more seamless mechanism to access, annotate, input, and transmit data, thereby improving the accuracy and quality of collaborative analyses.
This approach not only transforms how CTS data is collected, stored, and shared for high-impact research, but also has the potential to reduce the costs of ongoing research while increasing efficiency and security. The SDSC Health CI Division's work over the last decade building innovative data management and compute solutions for health care researchers, together with its more recent collaboration with CTS researchers, has demonstrated a clear need for a distinct type of managed services capability that serves the biomedical community. Specifically, epidemiology cohorts have traditionally relied on decentralized data management and computing. Although this approach has served the community well in the past, it has clear shortcomings going forward in supporting enterprise-level data management, scale, interoperability, and provenance. The SDSC Health CI Division takes a modern approach to information lifecycle management through its Sherlock Cloud platform. The Sherlock Cloud Data Management framework captures users' evolving requirements and provides a centralized, secure, cloud-based managed services capability that supports end-to-end data lifecycle management. This includes data integration capabilities that allow data to be captured, rationalized, homogenized, and managed using best practices and standards, as well as dynamic mappings, transformations, and master data management techniques that follow established governance methodologies, with a specific focus on data quality, metadata management, and governance. The CTS data management and analytics platform implementation can serve as a model for other epidemiology cohorts and studies. The SDSC Health CI Division plans to achieve this by leveraging the investment NCI and CTS have already made in building the secure, cloud-based CTS data management platform.
Using container technology, Sherlock will package the various components of the data management framework and analytical tools into a turnkey offering that can be deployed for other similar studies. This open-source, cloud-platform-agnostic, turnkey solution gives other cohorts a template to build on, letting them leverage the work CTS has done to develop the core, domain-specific data management and analytics capability; what remains for each cohort is study-specific customization. Additionally, APIs built on top of the built-in data model will provide seamless integration with the external Commons established by NCI and NIH. The epidemiology community still has a way to go before it matches communities in other verticals in adopting cutting-edge, enterprise platforms and tools for data management and analytics, but initiatives like CTS can demonstrate that, once designed and developed, these capabilities can be easily packaged and deployed for other studies, providing a framework that the larger research community can leverage.
Citation Format: Sandeep Chandra. California Teachers Study (CTS) Data Management Platform: A model for a repeatable turnkey, end-to-end, cloud-based data management and analytics solution for epidemiology cohorts [abstract]. In: Proceedings of the AACR Special Conference on Modernizing Population Sciences in the Digital Age; 2019 Feb 19-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2020;29(9 Suppl):Abstract nr IA20.