Abstract

COVID-19 cases are increasing exponentially worldwide; however, the clinical phenotype of the disease remains unclear. Natural language processing (NLP) and machine learning approaches may yield key methods to rapidly identify individuals at high risk of COVID-19 and to understand key symptoms at clinical manifestation and presentation. Data on such symptoms may not be accurately synthesized into patient records owing to the pressing need to treat patients in overburdened health care settings. In this scenario, clinicians may focus on documenting widely reported symptoms that indicate a confirmed diagnosis of COVID-19, at the expense of infrequently reported symptoms. While NLP solutions can play a key role in generating clinical phenotypes of COVID-19, they are constrained by these gaps in data from electronic health records (EHRs). A more comprehensive record of clinic visits is required, and audio recordings may be the answer: a recording of a clinic visit captures patient-reported symptoms more completely than the EHR alone. If done at scale, a combination of data from the EHR and recordings of clinic visits can be used to power NLP and machine learning models, thus rapidly generating a clinical phenotype of COVID-19. We propose a pipeline that extends from audio or video recordings of clinic visits to a model that factors in clinical symptoms and predicts COVID-19 incidence. With vast amounts of available data, we believe that a prediction model can be rapidly developed to promote the accurate screening of individuals at high risk of COVID-19 and to identify patient characteristics that predict a greater risk of more severe infection. If clinical encounters are recorded and our NLP model is adequately refined, benchtop virologic findings would be better informed. While clinic visit recordings are not a panacea for this pandemic, they are a low-cost option with many potential benefits that have only recently begun to be explored.

Highlights

  • COVID-19 cases are exponentially increasing worldwide; clinical COVID-19 phenotypes remain unclear

  • The existing applications of natural language processing (NLP) and machine learning in medical diagnostics are based on a combination of structured and unstructured data recorded by clinicians in patients’ electronic health records (EHRs)

  • Using NLP and machine learning approaches, data on documented signs and symptoms in the EHR are already being used to identify clinical conditions [4]. Such NLP-based efforts are currently being applied to unstructured text data captured in the EHR from telehealth consultations to develop better screening tools for COVID-19 [5]


Summary

Limitations of EHR Data

The considerable degree of symptom heterogeneity reported among patients with COVID-19 can deter the accurate documentation of less frequently reported symptoms in the EHR. Documentation inaccuracies in electronic medical records are not a new phenomenon; an analysis of data from 105 clinics indicated that 90% of clinician notes had at least one error, with 636 documentation errors in total: 181 charted findings that did not take place and 455 findings that were not charted [6]. Data on such symptoms may not be accurately synthesized into patient records owing to the pressing need to treat patients in overburdened health care settings. A comprehensive record of clinic visits is required, and an audio recording may be the solution [9].

Clinical Phenotypes Based on Audio Recordings of Clinic Visits
Findings
Data From Beyond the Clinic