Abstract
138 Background: Cancer care is multidisciplinary, and relationships between healthcare providers may influence outcomes. Yet understanding how these relationships affect patient care and outcomes has been hindered by the paucity of temporal and multi-relational data on physicians. We demonstrate a method for uniquely identifying providers and linking them longitudinally across multiple databases. Methods: We identified unique, individual healthcare providers in Medicare, Medicaid, and private payer data in North Carolina (NC) from 2003 to 2014. To link providers across their different identifiers (e.g., NPI, UPIN), we obtained five provider datasets: the National Plan and Provider Enumeration System (NPPES), the Medicare Physician Identification and Eligibility Records (MPIER), the NC State Medicaid provider file, private payer provider files, and the NC medical license file. Identification and linking were performed using a novel approach leveraging relational database tools in Oracle and then verified via a second approach applying network analysis in Python. Subset validation was performed using claims from cancer cohorts in NC with continuous enrollment in multiple payers. Results: There was significant variation in data quality, as well as in temporal and geographic overlap, between datasets. Linking across all data resulted in a Cartesian product of over 50 billion combinations. This was overcome by aligning provider identifiers under unique combinations of matching variables (given name, last name, zip code, and specialty). Across all five datasets, approximately 158,000 unique physicians were identified. In subset validation, the NPI and UPIN matches agreed 99-100% with the 2008 Medicare professional claims in a cohort of NC cancer patients. The NPI and private provider ID matches agreed 73-79% of the time for cancer patients with claims aligned in both payers. However, only about 30% of the Medicaid provider IDs were individually attributable and matched. Conclusions: Providers can be uniquely identified and matched across disparate databases, enabling us to measure and contrast provider networks and their variation across payers and systems.
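The linking idea described in the Methods and Results can be illustrated with a minimal Python sketch. It is not the authors' Oracle or Python implementation, which the abstract does not detail. It assumes each provider record carries a source name, an identifier, and the four matching variables; the field names (`source`, `provider_id`, `given_name`, `last_name`, `zip`, `specialty`) and all sample records are hypothetical. Records are first grouped under their matching-variable combination, which avoids comparing every record with every other, and identifiers that share a combination are then merged into connected components with a union-find structure.

```python
from collections import defaultdict


def link_providers(records):
    """Cluster (source, provider_id) pairs that share a matching-variable key.

    Illustrative only: field names are assumptions, not the study's schema.
    """
    parent = {}

    def find(x):
        # Return the representative of x's cluster, compressing paths.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Group identifiers by normalized matching variables instead of
    # forming the full cross product of all records.
    by_key = defaultdict(list)
    for r in records:
        key = (
            r["given_name"].strip().lower(),
            r["last_name"].strip().lower(),
            r["zip"].strip(),
            r["specialty"].strip().lower(),
        )
        node = (r["source"], r["provider_id"])
        find(node)  # register the identifier even if it never matches
        by_key[key].append(node)

    # Identifiers under the same key belong to the same provider.
    for nodes in by_key.values():
        for other in nodes[1:]:
            union(nodes[0], other)

    # Collect connected components: one set of identifiers per provider.
    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


# Fictitious records: one provider seen under three identifiers, one under one.
records = [
    {"source": "NPPES", "provider_id": "NPI-1", "given_name": "Ann",
     "last_name": "Lee", "zip": "27514", "specialty": "Oncology"},
    {"source": "MPIER", "provider_id": "UPIN-A", "given_name": "ann",
     "last_name": "LEE ", "zip": "27514", "specialty": "oncology"},
    {"source": "MPIER", "provider_id": "UPIN-A", "given_name": "Ann",
     "last_name": "Lee", "zip": "27701", "specialty": "Oncology"},
    {"source": "Medicaid", "provider_id": "MCD-9", "given_name": "Ann",
     "last_name": "Lee", "zip": "27701", "specialty": "Oncology"},
    {"source": "NPPES", "provider_id": "NPI-2", "given_name": "Bob",
     "last_name": "Ray", "zip": "27514", "specialty": "Surgery"},
]
clusters = link_providers(records)
```

In this toy data the Medicaid identifier never shares a zip code with the NPI record, yet the two end up in the same cluster because the UPIN appears under both zip codes. That transitive linking is what a network view adds over a single pairwise join. Exact matching on names is a simplification; real provider files would likely need additional cleaning and review of ambiguous matches.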