Abstract

The goal of the described work was to create a self-service tool for data exploration and cohort creation, facilitating feasibility assessments for research studies and providing insights into the primary clinical and demographic characteristics of the Moffitt patient population. The Moffitt Cancer Analytics Platform (MCAP) was designed to enable seamless end-to-end data lifecycle management, promote data democratization, and reduce data duplication by feeding diverse, typically siloed local data streams into a central data repository. Eight distinct sources, including the Moffitt electronic medical record, cancer registry, billing systems, and clinical trial management system, feed into an Amazon S3-based data lake. Raw data are cleaned, standardized for research use, and stored in the enterprise data warehouse in Snowflake. To date, over 80 distinct tables and 2,200 data elements are captured, representing over 500,000 patients. To provide transparency into these data assets, Moffitt partnered with phData to create a custom tool for data self-service, MCAP Explore. Built on the Sigma cloud analytic platform, Explore is a highly customized interactive workbook that allows the user to filter the total patient population on a broad range of clinical and demographic characteristics to identify cohorts of interest. Applications range from research study feasibility assessments to operational oversight of the composition of the patient population, including where disparities may exist in disease characteristics, treatment, or outcomes. In addition, the user is able to view a broad range of visualizations summarizing the resulting cohort and deep-dive into the specific records available.
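The cohort filtering described above can be illustrated with a minimal sketch. It is not the Explore implementation (which is built in Sigma on Snowflake); the toy tables, field names, and helper functions below are all hypothetical and assumed only for illustration. The sketch assumes that some attributes are held once per patient while others appear on many records per patient, and it treats each filter as a set of matching patient IDs so that filters can be combined in any order.

```python
# Hypothetical toy data; field names are illustrative, not the MCAP data model.
PATIENTS = {
    1: {"sex": "F", "birth_year": 1950},
    2: {"sex": "M", "birth_year": 1962},
    3: {"sex": "F", "birth_year": 1975},
}

# Many diagnosis records may exist for a single patient.
DIAGNOSES = [
    {"patient_id": 1, "site": "breast", "stage": "II"},
    {"patient_id": 1, "site": "lung", "stage": "I"},
    {"patient_id": 2, "site": "lung", "stage": "III"},
    {"patient_id": 3, "site": "breast", "stage": "IV"},
]


def patient_filter(predicate):
    """Build a filter over attributes stored once per patient."""
    return lambda: {pid for pid, row in PATIENTS.items() if predicate(row)}


def record_filter(records, predicate):
    """Build a filter over per-record data.

    A patient qualifies if at least one of their records satisfies
    the predicate.
    """
    return lambda: {r["patient_id"] for r in records if predicate(r)}


def build_cohort(filters):
    """Intersect the patient-ID sets produced by each filter.

    Set intersection is commutative, so the order of the filters
    does not change the resulting cohort.
    """
    cohort = set(PATIENTS)
    for f in filters:
        cohort &= f()
    return cohort


female = patient_filter(lambda p: p["sex"] == "F")
lung = record_filter(DIAGNOSES, lambda d: d["site"] == "lung")
stage_ii = record_filter(DIAGNOSES, lambda d: d["stage"] == "II")

# Conditions placed in one predicate must hold on the same record.
lung_stage_ii = record_filter(
    DIAGNOSES, lambda d: d["site"] == "lung" and d["stage"] == "II"
)

print(build_cohort([female, lung]))
print(build_cohort([lung, female]))
print(build_cohort([lung, stage_ii]))
print(build_cohort([lung_stage_ii]))
```

In this toy data, applying the lung and stage II filters separately still selects patient 1, whose stage II record is a breast diagnosis. Placing both conditions in a single record-level predicate requires them to hold on the same record, which is one simple way to keep related fields linked.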
The features that distinguish Explore from similar tools include: a) the broad range of data domains covered by the available filters (demographics, diagnoses, appointments, labs/vitals, treatment, study enrollment/consent, patient-reported, biospecimens, molecular); b) the ability to apply any number of filters in an arbitrary order, on data that are both one-to-one and many-to-one with the patient, while maintaining important linkages between record types; c) full customization of the product for our oncology-specific use cases, facilitating transparency into the underlying data model and flexibility for future expansion; d) no requirement for data to leave the Moffitt environment, with de-identification processes that are governed at an institutional level and applied within Snowflake automatically inherited by Explore through its connection to our institutional Active Directory accounts and Snowflake user roles. We will present detailed examples of tool functionality and describe the underlying data model and design decisions required to accommodate a broad range of complex oncology-specific use cases. In addition, we will present early usage metrics and next steps for expansion into new data domains, including imaging.

Citation Format: Rachel Howard, Phillip Reisman, Patricia Lewis, Rodrigo Carvajal, Chandan Challa, Mark Ruesink, Joe McFarren, Katrina Johnson, Mukund Sridhar, Kedar Kulkarni, Dana E. Rollison. MCAP Explore: A self-service data exploration and cohort building tool for oncologists [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2068.