Abstract

<h3>Purpose/Objective(s)</h3> Healthcare data often exist in silos and in unstructured formats that limit interoperability and require tedious manual extraction. Our institution has adopted a flexible and scalable big data platform built on Hadoop that integrates data from Epic/Clarity as well as Aria and allows users to leverage modern data science tools to facilitate access. We hypothesize that open-source tools can be used to build a data analytics and visualization dashboard that will (1) allow non-technical users to explore de-identified clinical data within our institutional big data platform and (2) connect with repositories of molecular data to demonstrate potential methods of integrating clinical and basic science data.
<h3>Materials/Methods</h3> De-identified patient-level radiation oncology data were extracted from the institutional big data platform (Hadoop) with the Python packages pyodbc and pandas. For the purposes of this dashboard, radiation oncology-specific clinical data elements were queried, including the date of first radiation treatment, treatment location, treatment modality (SBRT, external beam, SRS, TBI, LDR/HDR brachytherapy), ICD-10 codes, anatomic treatment site, number of fractions, treatment prescription, and dose per fraction. A Python client connection to the publicly accessible instance of cBioPortal for Cancer Genomics was established using the Bravado library. Data transformation and cleaning were performed in Python using pandas DataFrames. A web-based dashboard to facilitate user-defined visualizations was implemented using the Dash Python library, and interactive visualizations of subsets of the extracted data were generated in real time using the Plotly plotting library.
<h3>Results</h3> We developed a web-based dashboard that gives users without extensive programming expertise the ability to explore de-identified clinical data extracted from Hadoop. As proof of principle, the dashboard was used to visualize the clinical impact of the COVID-19 pandemic on radiation oncology patient volumes, revealing a significant decline in new radiation treatments in April and May of 2020 (-54% and -36%, respectively, compared with 2019) during the initial COVID-19 surge. Furthermore, the dashboard allows users to interact with the cBioPortal for Cancer Genomics repository, which currently houses clinical and molecular data from 301 publicly available studies spanning 869 different cancer types. This interface with cBioPortal illustrates the potential for future integration of clinically meaningful sequencing results with clinical outcomes data.
<h3>Conclusion</h3> We built an interactive web-based dashboard to give general users easy access to de-identified clinical data stored within the institutional big data platform. Additional data sources, including external molecular data, can be connected to the dashboard, allowing for future integration.
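
The year-over-year comparison of new radiation treatments reported in the Results can be illustrated with a minimal pandas sketch. It is not the authors' code: the function name `monthly_yoy_change` is hypothetical, it assumes the comparison is made month by month against the same calendar month of the baseline year, and the dates below are synthetic placeholders rather than study data.

```python
import pandas as pd


def monthly_yoy_change(first_treatment_dates, year, baseline_year):
    """Percent change in new-treatment counts per calendar month.

    Hypothetical helper: compares each month of `year` with the same
    month of `baseline_year`. Months with a zero baseline count yield
    inf or NaN rather than a percentage.
    """
    dates = pd.to_datetime(pd.Series(first_treatment_dates))
    # Count first treatments per (year, month), then pivot years into columns.
    counts = (
        dates.groupby([dates.dt.year, dates.dt.month])
        .size()
        .unstack(level=0, fill_value=0)
    )
    return (counts[year] - counts[baseline_year]) / counts[baseline_year] * 100


# Synthetic first-treatment dates, for illustration only.
synthetic_dates = (
    ["2019-04-01"] * 10 + ["2020-04-01"] * 5
    + ["2019-05-01"] * 4 + ["2020-05-01"] * 3
)

change = monthly_yoy_change(synthetic_dates, year=2020, baseline_year=2019)
print(change)  # indexed by month number (4 = April, 5 = May)
```

In a dashboard such as the one described, a series like `change` could be passed to a Plotly bar chart and recomputed whenever the user changes a filter (for example, treatment modality or site).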
