Leveraging Electronic Health Records and Claims Data to Improve HIV and Comorbidity Care Trajectories: A Scoping Review.
Big Data sources, specifically electronic health records (EHR) and insurance claims data, are key in advancing HIV research. This scoping review summarizes recent research using EHR/claims to understand the evolving relationship between HIV and comorbidities. Data sources ranged from individual health system EHR to multi-system integrated datasets. Datasets that linked insurance claims or EHR with external sources (e.g. public health HIV surveillance, social systems) had the richest findings. PLWH who maintained care for HIV and comorbidities, including COVID-19, had similar health outcomes to peers living without HIV. Mental health, substance use disorders, and HPV-related cancers remain prevalent in PLWH. HIV stigma and racial disparities in non-HIV comorbidity care were detected. These findings reinforce evidence of improving general health for PLWH as research and evidence-based treatment progress, and the utility of Big Data for PLWH in public health emergencies like COVID-19. There is a continued need for tailored interventions for co-morbid mental health and some cancers. Linking EHR/claims data to external sources is critical to research and practice innovations in approaching whole-person care on the path to HIV elimination.
- Research Article
13
- 10.1001/jamanetworkopen.2020.2875
- Apr 15, 2020
- JAMA Network Open
Opioid-tolerant only (OTO) medications, such as transmucosal immediate-release fentanyl products and certain extended-release opioid analgesics, require prior opioid tolerance for safe use, as patients without tolerance may be at increased risk of overdose. Studies using insurance claims have found that many patients initiating these medications do not appear to be opioid tolerant. To measure prevalence of opioid tolerance in patients initiating OTO medications and to determine whether linked electronic health record (EHR) data contribute evidence of opioid tolerance not found in insurance claims data. This retrospective cohort study used a national database of deidentified longitudinal health information, including medical and pharmacy claims, insurance enrollment, and EHR data, from January 1, 2007, to December 31, 2016. Data included 131 756 US residents with at least 183 days of continuous enrollment in commercial or Medicare Advantage insurance (including medical and pharmacy benefits) who had received an OTO medication and who had no inpatient stays in the 30 days prior to starting an OTO medication; of these, 20 044 individuals had linked EHR data within the prior 183 days. Data were analyzed from July 1, 2017, to August 31, 2018. Initiating an OTO medication. Prior opioid tolerance demonstrated through pharmacy fills or EHR data on prescriptions written. Among 153 385 OTO use episodes identified, 89 029 (58.0%) occurred among women, 62 900 (41.0%) occurred among patients with Medicare Advantage insurance, 39 394 (25.7%) occurred in the Midwest, 17 366 (11.3%) occurred in the Northeast, 73 316 (47.8%) occurred in the South, and 23 309 (15.2%) occurred in the West. 
Less than half of use episodes (73 117 episodes [47.7%]) involved patients with evidence in claims data of opioid tolerance prior to initiating therapy with an OTO medication, including 31 392 of 101 676 episodes (30.9%) involving transdermal fentanyl, 1561 of 2440 episodes (64.0%) involving transmucosal fentanyl, 36 596 of 43 559 episodes (84.0%) involving extended-release oxycodone, and 3568 of 5710 episodes (62.5%) involving extended-release hydromorphone. Among 20 044 OTO use episodes with linked EHR and claims data, less than 1% of OTO episodes identified in claims had evidence of opioid tolerance in structured EHR data that was not present in claims data (108 episodes [0.5%]). After limiting the sample to OTO episodes identified in claims with a matching OTO prescription within 14 days in the structured EHR data, only 40 of 939 episodes (4.0%) occurred among patients with evidence of tolerance that was not present in claims data. This cohort study found that most patients initiating OTO medications did not have evidence of prior opioid tolerance, suggesting they were at increased risk of opioid-related harms, including fatal overdose. Data from EHRs did not contribute substantial additional evidence of opioid tolerance beyond the data found in prescription claims. Future research is needed to understand the clinical rationale behind these observed prescribing patterns and to quantify the risk of harm to patients associated with potentially inappropriate prescribing.
- Research Article
3
- 10.1111/ppe.12971
- Mar 23, 2023
- Paediatric and perinatal epidemiology
Rigour and reproducibility in perinatal and paediatric epidemiologic research using big data
- Research Article
3
- 10.1093/aje/kwae226
- Jul 16, 2024
- American journal of epidemiology
Electronic health record (EHR) data are seen as an important source for pharmacoepidemiology studies. In the US health-care system, EHR systems often identify only fragments of patients' health information across the care continuum, including primary care, specialist care, hospitalizations, and pharmacy dispensing. This leads to unobservable information in longitudinal evaluations of medication effects, causing unmeasured confounding, misclassification, and truncated follow-up times. A remedy is to link EHR data with longitudinal health insurance claims data, which record all encounters during a defined enrollment period across all care settings. Here we evaluate EHR and claims data sources in 3 aspects relevant to etiological studies of medical products: data continuity, data granularity, and data chronology. Reflecting on the strengths and limitations of EHR and insurance claims data, it becomes obvious that they complement each other. The combination of both will improve the validity of etiological studies and expand the range of questions that can be answered. As the research community transitions towards a future state with access to large-scale combined EHR+claims data, we outline analytical templates to improve the validity and broaden the scope of pharmacoepidemiology studies in the current environment where EHR data are available only for a subset of patients with claims data. This article is part of a Special Collection on Pharmacoepidemiology.
- Research Article
- 10.1016/j.schres.2025.06.024
- Sep 1, 2025
- Schizophrenia research
Construction of extra-large scale screening tools for risks of severe mental illnesses using real world healthcare data.
- Research Article
21
- 10.1097/phh.0b013e31821f2d73
- May 1, 2012
- Journal of Public Health Management and Practice
Public health surveillance systems for acute hepatitis are limited: clinician reporting is insensitive and electronic laboratory reporting is nonspecific. Insurance claims and electronic health records are potential alternative sources. To compare the utility of laboratory data, diagnosis codes, and electronic health record combination data (current and prior viral hepatitis studies, liver function tests, and diagnosis codes) for acute hepatitis A and B surveillance. Retrospective chart review. Massachusetts ambulatory practice serving 350 000 patients per year. All patients seen between 1990 and 2008. Sensitivity and positive predictive value of immunoglobulin M (IgM), International Classification of Disease-Ninth Revision (ICD-9) diagnosis codes, and combination electronic health record data for acute hepatitis A and B. During the study period, there were 111 patients with positive hepatitis A IgMs, 154 with acute hepatitis A ICD-9 codes, and 77 with positive IgM and elevated liver function tests. On review, 79 cases were confirmed. Sensitivity and positive predictive value were 100% and 71% (95% confidence interval, 62%-79%) for IgM, 94% (92%-100%) and 48% (40%-56%) for ICD-9 codes and 97% (92%-100%) and 100% (96%-100%) for combination electronic health record data. There were 14 patients with positive hepatitis B core IgMs, 2564 with acute hepatitis B ICD-9 codes, and 125 with suggestive combinations of electronic health record data. Acute hepatitis B was confirmed in 122 patients. Sensitivity and positive predictive value were 9.4% (5.2%-16%) and 86% (60%-98%) for hepatitis B core IgM, 73% (65%-80%) and 3.6% (2.9%-4.4%) for ICD-9 codes, and 96% (91%-99%) and 98% (94%-99%) for electronic health record data. Laboratory surveillance using IgM tests overestimates the burden of acute hepatitis A and underestimates the burden of acute hepatitis B. Claims data are subject to many false positives. Electronic health record data are both sensitive and predictive. 
Electronic health record-based surveillance systems merit development.
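The hepatitis A figures above can be re-derived from the reported counts. The sketch below is a minimal illustration, not the study's code: the function names are hypothetical, and the true-positive counts are inferred from the abstract (100% sensitivity for IgM implies all 79 confirmed cases were IgM positive; 100% positive predictive value for the combination data implies all 77 flagged patients were confirmed cases).

```python
def sensitivity(true_positives: int, confirmed_cases: int) -> float:
    """Share of confirmed cases that the method flagged."""
    return true_positives / confirmed_cases


def positive_predictive_value(true_positives: int, flagged: int) -> float:
    """Share of flagged patients that were confirmed cases."""
    return true_positives / flagged


CONFIRMED_HEP_A = 79  # confirmed on chart review (from the abstract)

# IgM alone: 111 patients flagged; true positives inferred as 79 (assumption
# derived from the reported 100% sensitivity).
igm_ppv = positive_predictive_value(79, 111)

# Combination EHR data: 77 patients flagged; true positives inferred as 77
# (assumption derived from the reported 100% positive predictive value).
combo_sensitivity = sensitivity(77, CONFIRMED_HEP_A)

print(f"IgM PPV: {igm_ppv:.1%}")
print(f"Combination sensitivity: {combo_sensitivity:.1%}")
```

Rounded to whole percentages, these agree with the 71% and 97% reported for hepatitis A.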
- Research Article
- 10.1161/circ.146.suppl_1.13803
- Nov 8, 2022
- Circulation
Introduction: Pragmatic randomized controlled trials (RCT) often use multiple data sources to examine clinical events, but the relative contribution of data sources (patient-reported, electronic health records (EHR), private/public claims data) to clinical endpoint rates is often not examined. Hypothesis: We hypothesized that claims data would contribute the highest number of events relative to other data sources. Methods: The ADAPTABLE study was an open-label, pragmatic RCT that demonstrated no significant difference in major cardiovascular or bleeding events among patients with CV disease randomized to an aspirin dose of 81 vs. 325 mg daily. We assessed the clinical endpoint rates and contribution of clinical endpoints by data source among patients who had EHR, public and private insurance claims data, and/or patient-reported (portal) data. Results: Of 15,076 patients randomized, there were 1,412 patients with EHR-only data; 8,756 with portal and EHR data; and 4,291 patients with portal, EHR, and claims data (Medicare or private). Patients with EHR-only data were younger (63.7 years) compared to the other groups (65.6-71.1 years); and were more likely Black (10.6%) vs. the other groups (6.3%-9.7%), p<0.001. The table demonstrates trial clinical event rates by data source among patients with available portal, EHR, and claims data. Among patients with available portal, EHR, and claims data, Medicare claims data contributed the most events for the composite endpoint (77.5%) and all-cause death (97.3%), while EHR data contributed the most events for MI (74.3%), stroke (74.3%), and major bleed (73.3%). Conclusions: In a large pragmatic trial, Medicare claims data contributed the most clinical events for the primary composite outcome and all-cause death, when compared with other available data sources. Further work is needed to understand the data source combinations that most effectively provide clinical endpoint data in RCTs.
- Abstract
1
- 10.23889/ijpds.v7i3.2090
- Aug 25, 2022
- International Journal of Population Data Science
Objective: The All of Us Research Program (AoURP) is an ambitious effort to gather health data from one million Americans to accelerate research. We linked Electronic Health Records (EHR) and insurance claims data to characterize the degree to which ancillary datasets can improve data completeness for care received by AoURP participants. Approach: We sought to link EHR data for 400,000 consented AoURP participants with insurance claims data provided by IPM.AI (Swoop Analytics), a commercial analytics company who have insurance claims data for 300M (over 90% of) Americans. We utilized a HIPAA-compliant privacy-preserving record linkage method (tokenization, provided by Datavant) to match patients between datasets. We evaluated match fidelity and the degree of overlap between AoURP EHRs and IPM.AI claims data. We characterized the association of patient and organizational level factors (demographics, healthcare provider organization, reporting site) with match performance. Results: As of submission of this abstract, 41% of AoURP EHRs matched with IPM.AI claims. We compared patient healthcare encounters, diagnosis codes (DX), procedure codes (PX), and national drug codes (NDC) for matched patients by month. The union of AoU and IPM.AI data greatly increased data completeness in matched patients. Only 20% of healthcare encounters were seen by AoURP and IPM.AI concurrently while 25% were unique to AoU EHRs and 55% to IPM.AI claims on a monthly level. The number of diagnosis events compared between AoURP and IPM.AI is roughly equal (AoU +6%) while procedure events are elevated in claims data (23%) and drug counts are greatly elevated in AoURP EHR data (71%). We found that matched patients had more healthcare encounters compared to unmatched patients. Conclusion: To our knowledge this is the first effort to address challenges in AoURP data completeness through complementary data linkage. 
Our results suggest that supplementary data linkage can improve data completeness in a large national research initiative. We identified several patient factors that require further investigation in improving match fidelity.
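The monthly overlap percentages reported above sum to 100%, which suggests they are shares of the union of encounters seen in either source. A minimal sketch of that calculation follows; the function name, the (token, month) key format, and the toy data are all hypothetical and are not taken from the AoURP or IPM.AI pipelines.

```python
def overlap_shares(ehr_keys, claims_keys):
    """Split the union of two key sets into shared and source-only shares."""
    ehr_keys, claims_keys = set(ehr_keys), set(claims_keys)
    union = ehr_keys | claims_keys
    if not union:
        raise ValueError("both sources are empty")
    total = len(union)
    return {
        "both": len(ehr_keys & claims_keys) / total,
        "ehr_only": len(ehr_keys - claims_keys) / total,
        "claims_only": len(claims_keys - ehr_keys) / total,
    }


# Toy data (hypothetical): each key is (patient token, encounter month).
ehr = {(f"tok{i:02d}", "2021-01") for i in range(9)}         # tokens 00-08
claims = {(f"tok{i:02d}", "2021-01") for i in range(5, 20)}  # tokens 05-19

shares = overlap_shares(ehr, claims)
print(shares)
```

The toy sets are sized so that the shares mirror the reported 20% / 25% / 55% split; real encounter matching would also need rules for date tolerance and duplicate visits, which the abstract does not describe.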
- Research Article
2
- 10.1002/pds.5734
- Dec 19, 2023
- Pharmacoepidemiology and drug safety
Observational studies assessing effects of medical products on suicidal behavior often rely on health record data to account for pre-existing risk. We assess whether high-dimensional models predicting suicide risk using data derived from insurance claims and electronic health records (EHRs) are superior to models using data from insurance claims alone. Data from seven large health systems identified outpatient mental health visits by patients aged 11 or older between 1/1/2009 and 9/30/2017. Data for the 5 years prior to each visit identified potential predictors of suicidal behavior typically available from insurance claims (e.g., mental health diagnoses, procedure codes, medication dispensings) and additional potential predictors available from EHRs (self-reported race and ethnicity, responses to Patient Health Questionnaire or PHQ-9 depression questionnaires). Nonfatal self-harm events following each visit were identified from insurance claims data and fatal self-harm events were identified by linkage to state mortality records. Random forest models predicting nonfatal or fatal self-harm over 90 days following each visit were developed in a 70% random sample of visits and validated in a held-out sample of 30%. Performance of models using linked claims and EHR data was compared to models using claims data only. Among 15 845 047 encounters by 1 574 612 patients, 99 098 (0.6%) were followed by a self-harm event within 90 days. Overall classification performance did not differ between the best-fitting model using all data (area under the receiver operating curve or AUC = 0.846, 95% CI 0.839-0.854) and the best-fitting model limited to data available from insurance claims (AUC = 0.846, 95% CI 0.838-0.853). Competing models showed similar classification performance across a range of cut-points and similar calibration performance across a range of risk strata. 
Results were similar when the sample was limited to health systems and time periods where PHQ-9 depression questionnaires were recorded more frequently. Investigators using health record data to account for pre-existing risk in observational studies of suicidal behavior need not limit that research to databases including linked EHR data.
- Research Article
11
- 10.1016/j.jpainsymman.2021.04.012
- Apr 29, 2021
- Journal of pain and symptom management
The Serious Illness Population: Ascertainment via Electronic Health Record or Claims Data
- Research Article
- 10.1200/jco.2023.41.16_suppl.6514
- Jun 1, 2023
- Journal of Clinical Oncology
6514 Background: Clinical RWD derived from EHRs is becoming increasingly important for clinical research, trial design, regulatory decisions etc. These applications require identification of lines of therapy (LoT) which are typically not captured in EHR and must be abstracted from other clinical and medication data. EHR data has significant missingness which can be complemented with other data sources such as medical claims data. In this study, we demonstrate how our proprietary line of therapy algorithms for solid cancers show significant improvements when built using integrated EHR and claims data when compared to EHR data alone. Methods: For this analysis, ConcertAI’s RWD360 dataset integrated with a large administrative open-claims dataset (>90% overlap) for 14 solid cancer indications (Breast, Bladder, Lung, Prostate, Pancreas, Melanoma, Liver, Head & Neck, Renal, Colorectal, Melanoma, Ovarian, Thyroid, Endometrial) was used. The date of advanced/metastatic diagnosis used as the index date for LoTs was derived from the EHR data and medications from both EHR and claims data were used. We ran our LoT algorithms on EHR data with and without claims data and evaluated the impact of integrating claims data on the quantity and quality of LoT output. Results: The inclusion of medication data from claims significantly increased (7-22%) the number of patients for which LoTs could be extracted from the EHR data. Furthermore, we observed increases in number of lines per patient, length of lines and medications per line across cohorts. The distance between index date and 1st line start date was shortened in a subset (2-12%) of patients as a result. In a small fraction of cases, we even observed removal of false lines as some of the lines moved to adjuvant/neoadjuvant setting by filling in missing medication from claims. Overall, 7-39% patients in the LoT cohorts were impacted by addition of claims. Results for a few cancer types are presented in Table 1. 
We also compared the top LoTs derived from the integrated dataset against the standard of care for that cancer and observed very good concordance. Conclusions: Deriving LoTs by integrating data from multiple data sources such as EHR and claims can significantly improve its accuracy. [Table: see text]
- Abstract
- 10.1093/ofid/ofac492.1436
- Dec 15, 2022
- Open Forum Infectious Diseases
Background: Korea has a single health insurance system, and insurance claim information on almost all medical practices in Korean hospitals is collected and processed by the Health Insurance Review and Assessment Service (HIRA). Since prescription information from almost all hospitals is available in National Health Insurance (NHI) claim data, the recently established Korea National Antimicrobial Use Analysis System (KONAS) has been using NHI claim data as its data source. The purpose of this study is to validate the accuracy of NHI claim data. Methods: Data on all antimicrobial agents prescribed in four tertiary-care hospitals in Korea between January 2019 and December 2019 were obtained using NHI claim data extracted by HIRA and data extracted by a common data model based on the electronic health record (EHR) in each hospital. Antibiotics and antifungal agents in Anatomical Therapeutic Chemical classes J01 and J02 were included, while antiviral, antitubercular, antiparasitic, and topical antimicrobial agents were excluded. Antimicrobial consumption was measured as days of therapy (DOT) and standardized per 1,000 patient-days. The ratio of monthly antimicrobial consumption calculated using the NHI claim data to that calculated using the common data model was reported (HIRA/EHR ratio). Results: The monthly HIRA/EHR ratio of broad-spectrum antibiotics predominantly used for hospital-onset infections was 1.08-1.12 and that of broad-spectrum antibiotics predominantly used for community-acquired infections was 1.11-1.21. 
The monthly HIRA/EHR ratios of other antimicrobial classes were as follows: antibacterial agents predominantly used for resistant gram-positive infections 1.15-1.31, narrow-spectrum beta-lactam agents 1.00-1.05, antifungal agents predominantly used for invasive candidiasis 1.00-1.27, and antibacterial agents predominantly used for extensive antibiotic-resistant gram-negative bacteria 0.70-1.09. Conclusion: The monthly antimicrobial consumption calculated using NHI claim data differs from that calculated using EHR data by up to 30%. It would be desirable to establish a system that can analyze and monitor antimicrobial consumption using EHR data in each hospital in Korea in the future. Disclosures: Hyunki Woo, BS, Evidnet Inc.: Employee; Changhui Kim, BS, Evidnet Inc.: Employee.
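The standardization and ratio described in the methods above can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, and the monthly totals are invented numbers chosen only to land inside the reported 1.08-1.12 range, not data from the study.

```python
def dot_per_1000_patient_days(days_of_therapy: float, patient_days: float) -> float:
    """Standardize days of therapy (DOT) to a rate per 1,000 patient-days."""
    if patient_days <= 0:
        raise ValueError("patient_days must be positive")
    return 1000 * days_of_therapy / patient_days


def hira_ehr_ratio(claims_rate: float, ehr_rate: float) -> float:
    """Ratio of claims-derived to EHR-derived consumption for one month."""
    return claims_rate / ehr_rate


# Hypothetical monthly totals for one antimicrobial class in one hospital.
patient_days = 30_000
claims_rate = dot_per_1000_patient_days(3_360, patient_days)  # from claims
ehr_rate = dot_per_1000_patient_days(3_000, patient_days)     # from EHR

ratio = hira_ehr_ratio(claims_rate, ehr_rate)
print(claims_rate, ehr_rate, round(ratio, 2))
```

A ratio above 1 means the claims data show more consumption than the EHR extract for that month; a ratio below 1, as at the low end of the gram-negative class, means the opposite.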
- Front Matter
10
- 10.1016/j.adaj.2015.09.002
- Oct 26, 2015
- The Journal of the American Dental Association
Taking a byte out of big data
- Research Article
6
- 10.3389/fphar.2022.845949
- Apr 4, 2022
- Frontiers in Pharmacology
Objective: To evaluate the continuity and completeness of electronic health record (EHR) data, and the concordance of select clinical outcomes and baseline comorbidities between EHR and linked claims data, from three healthcare delivery systems in Taiwan. Methods: We identified oral hypoglycemic agent (OHA) users from the Integrated Medical Database of National Taiwan University Hospital (NTUH-iMD), which was linked to the National Health Insurance Research Database (NHIRD), from June 2011 to December 2016. A secondary evaluation involved two additional EHR databases. We created consecutive 90-day periods before and after the first recorded OHA prescription and defined patients as having continuous EHR data if there was at least one encounter or prescription in a 90-day interval. EHR data completeness was measured by dividing the number of encounters in the NTUH-iMD by the number of encounters in the NHIRD. We assessed the concordance between EHR and claims data on three clinical outcomes (cardiovascular events, nephropathy-related events, and heart failure admission). We used individual comorbidities that comprised the Charlson comorbidity index to examine the concordance of select baseline comorbidities between EHRs and claims. Results: We identified 39,268 OHA users in the NTUH-iMD. Thirty-one percent (n = 12,296) of these users contributed to the analysis that examined data continuity during the 6-month baseline and 24-month follow-up period; 31% (n = 3,845) of the 12,296 users had continuous data during this 30-month period and EHR data completeness was 52%. The concordance of major cardiovascular events, nephropathy-related events, and heart failure admission was moderate, with the NTUH-iMD capturing 49–55% of the outcome events recorded in the NHIRD. The concordance of comorbidities was considerably different between the NTUH-iMD and NHIRD, with an absolute standardized difference >0.1 for most comorbidities examined. 
Across the three EHR databases studied, 29–55% of the OHA users had continuous records during the 6-month baseline and 24-month follow-up period. Conclusion: EHR data continuity and data completeness may be suboptimal. A thorough evaluation of data continuity and completeness is recommended before conducting clinical and translational research using EHR data in Taiwan.
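The continuity and completeness definitions above lend themselves to a short sketch. The version below is one possible reading, not the authors' implementation: it assumes the 6-month baseline maps to two 90-day intervals and the 24-month follow-up to eight, that the index date falls in the first follow-up interval, and that a patient is continuous only if every interval holds at least one event. All names and dates are hypothetical.

```python
from datetime import date, timedelta


def has_continuous_data(index_date, event_dates,
                        pre_intervals=2, post_intervals=8, width=90):
    """True if every 90-day interval around the index date has an event."""
    covered = set()
    for event in event_dates:
        offset = (event - index_date).days
        interval = offset // width  # floor: -1 covers days -90..-1, 0 covers 0..89
        if -pre_intervals <= interval < post_intervals:
            covered.add(interval)
    return len(covered) == pre_intervals + post_intervals


def completeness(ehr_encounters: int, claims_encounters: int) -> float:
    """EHR encounters divided by claims encounters for the same patients."""
    return ehr_encounters / claims_encounters


index = date(2012, 1, 1)  # hypothetical first OHA prescription date
# One event every 60 days from 150 days before to 690 days after the index.
events = [index + timedelta(days=d) for d in range(-150, 720, 60)]
# Same patient with a gap: no events between day 200 and day 400.
events_gapped = [e for e in events if not 200 <= (e - index).days <= 400]

print(has_continuous_data(index, events))         # every interval covered
print(has_continuous_data(index, events_gapped))  # three intervals empty
print(completeness(52, 100))
```

Counting distinct interval indices rather than events keeps the check insensitive to how many visits fall in a single interval, which matches the "at least one encounter or prescription" wording.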
- Research Article
1
- 10.1101/2024.12.13.24319010
- Dec 16, 2024
- medRxiv : the preprint server for health sciences
Do recent changes in European Society of Human Reproduction and Embryology (ESHRE) clinical guidelines result in more comprehensive diagnosis of women with endometriosis? The latest shift in clinical guidelines results in diagnosis of more women with endometriosis but current ESHRE diagnostic criteria do not capture a sizable percentage of women with the disease. Historically, laparoscopy was the gold standard for diagnosing endometriosis, a complex gynecological condition marked by a heterogeneous set of symptoms that vary widely among women. More recently, changes in clinical guidelines have shifted to incorporate imaging-based approaches such as transvaginal sonography and magnetic resonance imaging. Retrospective, observational cohort study of women aged 15-49 years diagnosed with endometriosis in the United States (US) between January 1, 2013, and December 31, 2023. Data sources include US insurance claims data from the Merative™ MarketScan® Commercial Database (CCAE), Merative™ MarketScan® Multi-State Medicaid Database (MDCD), Optum® de-identified Electronic Health Record dataset (Optum® EHR), and electronic health record (EHR) data from a large academic medical center in New York City (CUIMC EHR). To examine the potential impact of expanding clinical criteria for the diagnosis of endometriosis, we validated and compared five cohort definitions based on different sets of diagnostic guidelines involving combinations of surgical confirmation, diagnostic imaging, guideline-recognized symptoms, and other symptoms commonly reported among women with endometriosis. We performed pairwise comparisons between cohorts and applied Bonferroni corrections to account for multiple comparisons. We identified 491,048 women with a diagnosis of endometriosis across the CCAE, MDCD, Optum EHR, and CUIMC EHR datasets. Each cohort definition demonstrated strong positive predictive value (0.84-0.96), yet only 15-20% of cases were identified by all 5 cohort definitions. 
Women diagnosed with endometriosis based on imaging and symptoms were three years younger, on average, than women with a diagnosis based on surgical confirmation (mean age = 35 years [SD = 9] vs 38 years [SD = 8]; p<0.001). Women in cohorts based only on symptoms were two years younger than those based on surgery (36 years [SD = 8] vs 38 years [SD = 8]; p<0.001). More than one-fourth of cases presented with endometriosis-related symptoms but lacked surgical or imaging-related documentation required by ESHRE guideline criteria. Pain was reported among nearly all women with endometriosis. Abdominal pain and pain in the pelvis were the most prevalent (ranging from 69% to 90% of women in each cohort). Among approximately 2-5% of all endometriosis cases (14,795 total), women presented with pelvic and/or abdominal pain but none of the other symptoms noted in clinical guidelines. Our study has potential biases associated with documentation practices and secondary data use of insurance claims and EHR data. Further, the phenotyping algorithms used rely on clinical codes that do not necessarily capture all ESHRE diagnostic criteria for endometriosis and may not be generalizable to women with atypical presentation of endometriosis. High positive predictive value among all five cohort definitions despite poor overlap among cases identified illustrates both the heterogeneous presentation of the disease and importance of expanding diagnostic criteria. For example, cohorts derived from updated guidelines identified younger patients at time of diagnosis. Women diagnosed based on imaging had higher rates of emergency room visits while patients diagnosed via laparoscopy had a larger number of hospitalizations. 
The substantial number of cases with pelvic and/or abdominal pain but none of the other symptoms noted in clinical guidelines underscores the continued need for improved access to timely and appropriate care, particularly among those with non-classical symptoms, different care-seeking patterns, or lack of available surgical intervention.
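The Bonferroni step mentioned in the methods above is simple arithmetic: with five cohort definitions there are ten pairwise comparisons, and the per-comparison significance threshold is the family-wise level divided by ten. The sketch below assumes a family-wise level of 0.05, which the abstract does not state, and uses placeholder cohort labels.

```python
from itertools import combinations


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / n_comparisons


# Placeholder labels for the five cohort definitions (hypothetical names).
cohorts = ["def1", "def2", "def3", "def4", "def5"]
pairs = list(combinations(cohorts, 2))

ALPHA = 0.05  # assumed family-wise level; not stated in the abstract
threshold = bonferroni_threshold(ALPHA, len(pairs))

print(len(pairs), threshold)
```

Under these assumptions a pairwise difference would need p below roughly 0.005 to count as significant, a bar that the reported p<0.001 age differences would clear.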
- Research Article
12
- 10.1089/pop.2020.0306
- Feb 5, 2021
- Population Health Management
Multiple indices are available to measure medication adherence behaviors. Medication adherence measures, however, have rarely been extracted from electronic health records (EHRs) for population-level risk predictions. This study assessed the value of medication adherence indices in improving predictive models of cost and hospitalization. This study included a 2-year retrospective cohort of patients younger than age 65 years with linked EHR and insurance claims data. Three medication adherence measures were calculated: medication regimen complexity index (MRCI), medication possession ratio (MPR), and prescription fill rate (PFR). The authors examined the effects of adding these measures to 3 predictive models of utilization: a demographics model, a conventional model (Charlson index), and an advanced diagnosis-based model. Models were trained using EHR and claims data. The study population had an overall MRCI, MPR, and PFR of 14.6 ± 17.8, .624 ± .310, and .810 ± .270, respectively. Adding MRCI and MPR to the demographic and the morbidity models using claims data improved forecasting of next-year hospitalization substantially (eg, AUC of the demographic model increased from .605 to .656 using MRCI). Nonetheless, such boosting effects were attenuated for the advanced diagnosis-based models. Although EHR models performed inferior to claims models, adding adherence indices improved EHR model performances at a larger scale (eg, adding MRCI increased AUC by 4.4% for the Charlson model using EHR data compared to 3.8% using claims). This study shows that medication adherence measures can modestly improve EHR- and claims-derived predictive models of cost and hospitalization in non-elderly patients; however, the improvements are minimal for advanced diagnosis-based models.
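Of the three adherence measures named above, the medication possession ratio (MPR) is the simplest to illustrate. The abstract does not give the study's exact formula, so the sketch below uses a commonly used definition as an assumption: total days' supply dispensed during an observation period divided by the number of days in that period, optionally capped at 1. The function name and fill data are hypothetical.

```python
def medication_possession_ratio(days_supplied, period_days, cap=True):
    """Total days' supply dispensed divided by days in the period."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    ratio = sum(days_supplied) / period_days
    # Capping at 1.0 is a common convention when early refills overlap
    # (assumption; the study may have handled oversupply differently).
    return min(ratio, 1.0) if cap else ratio


# Hypothetical patient: eight 30-day fills over a 365-day period.
mpr = medication_possession_ratio([30] * 8, 365)

# Hypothetical oversupplied patient: five 90-day fills in the same period.
mpr_capped = medication_possession_ratio([90] * 5, 365)

print(round(mpr, 3), mpr_capped)
```

A value near 0.66, as for the first patient, would sit close to the cohort's reported mean MPR of .624.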