e23316 Background: Real-world data analyses are critical for assessing drug safety and effectiveness, including in patient populations underrepresented in clinical trials. Significant barriers to performing such analyses remain, including (1) incomplete access to electronic health record (EHR) data due to patient privacy concerns, (2) lack of data harmonization within and across EHRs, (3) the need for time- and labor-intensive data curation, and (4) the need for coding or data science expertise given the absence of user-friendly platforms. We present Clinical nSights, a software platform that addresses these challenges to enable researchers and clinicians to perform rapid, robust, and scalable real-world oncology analyses. Methods: Using EHR data from multiple health systems as input, we developed intuitive user interfaces and a scalable, researcher-friendly programming environment for real-world clinical analytics. We used (1) nCognito, a previously described HIPAA-compliant de-identification algorithm, to de-identify EHRs across data modalities; (2) a combination of a de novo knowledge graph, vector and distribution similarity metrics, and manual review to harmonize data types including but not limited to laboratory tests, disease codes, and drugs; and (3) a series of Bidirectional Encoder Representations from Transformers (BERT) models to perform augmented curation of variables of interest (e.g., cancer staging, tumor biomarkers) from unstructured clinical records. We developed a user interface with which users can create and analyze defined patient cohorts. Results: Clinical nSights facilitates intuitive cohort creation via flexible specification of inclusion and exclusion criteria, including options to perform temporal linking between criteria and propensity score matching.
The ability to specify criteria documented in de-identified clinical notes expands the repertoire of features available for cohort creation beyond those captured in structured EHR fields, including but not limited to cancer stage and tumor biomarker status (e.g., HER2 status, microsatellite instability). Data harmonization enables scalable longitudinal analyses of laboratory test data across cancer types, including changes in measurements in the years leading up to and following initial cancer diagnosis. Augmented curation improves the coverage of phenotypes, including immune-related toxicities of checkpoint inhibitors and symptoms experienced by patients prior to a cancer diagnosis. Using the no-code interface, users can quantify and visualize outcomes such as overall survival and hospitalization rates. Conclusions: Clinical nSights addresses important challenges in the collection, curation, and analysis of real-world oncology data. By enabling a broad range of users to rapidly generate and test hypotheses, this platform has the potential to significantly improve care delivery and patient outcomes.
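As an illustration of the kind of outcome quantification described in the Results, overall survival for a cohort is commonly summarized with a Kaplan-Meier estimate. The abstract does not state which estimator Clinical nSights uses, so the following is only a minimal sketch under that assumption; the function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the platform.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (e.g., months since diagnosis)
    events:    1 if death was observed at that time, 0 if the patient was censored

    Hypothetical helper for illustration; not part of Clinical nSights.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # each patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Remove both deaths and censored patients before the next time point.
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy cohort (made-up values): five patients, two of them censored.
    months = [5, 8, 8, 12, 15]
    died = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(months, died):
        print(f"month {t}: estimated survival {s:.2f}")
```

Censored patients contribute to the risk set up to their last follow-up but never lower the curve themselves, which is why the estimate only steps down at observed event times.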