Abstract

Purpose

Phenotype algorithms are central to performing analyses using observational data. These algorithms translate the clinical idea of a health condition into an executable set of rules that allow data elements to be queried from a database. PheValuator, a software package in the Observational Health Data Sciences and Informatics (OHDSI) tool stack, provides a method to assess the performance characteristics of these algorithms, namely sensitivity, specificity, and positive and negative predictive value (PPV and NPV). It uses machine learning to develop predictive models that determine a probabilistic gold standard of subjects for assessment as cases and non-cases of health conditions. PheValuator was developed to complement, or even replace, the traditional approach to algorithm validation, namely expert assessment of subject records through chart review. Results in our first PheValuator paper suggested a systematic underestimation of the PPV compared with previous results using chart review. In this paper we evaluate modifications made to the method that were designed to improve its performance.

Methods

The major change to PheValuator was to allow all diagnostic conditions, clinical observations, drug prescriptions, and laboratory measurements to be included as predictors in the modeling process, whereas the prior version placed significant restrictions on the included predictors. We also allowed the temporal relationships of the predictors to be included in the model. To evaluate the performance of the new method, we compared the results from the new and original methods against results found in the literature using traditional validation of algorithms for 19 phenotypes. We performed these tests using data from five commercial databases.

Results

In the assessment aggregating all phenotype algorithms, the median difference between the PheValuator estimate and the gold standard estimate for PPV was reduced from −21 (IQR −34, −3) in Version 1.0 to 4 (IQR −3, 15) in Version 2.0. We found a median difference in specificity of 3 (IQR 1, 4.25) for Version 1.0 and 3 (IQR 1, 4) for Version 2.0. The median difference between the two versions of PheValuator and the gold standard for estimates of sensitivity was reduced from −39 (IQR −51, −20) to −16 (IQR −34, −6).

Conclusion

PheValuator 2.0 produces estimates of the performance characteristics of phenotype algorithms that are significantly closer to estimates from traditional validation through chart review than those of Version 1.0. With this tool in researchers' toolkits, methods such as quantitative bias analysis may now be used to improve the reliability and reproducibility of research studies using observational data.
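The abstract describes computing sensitivity, specificity, PPV, and NPV against a probabilistic gold standard rather than a chart-reviewed one. As a minimal sketch of that idea, assuming each subject's model-predicted probability is used as an expected (fractional) case count (the actual PheValuator implementation is the OHDSI R package and may differ in its details), evaluation of a phenotype algorithm could look like the following; all names here are hypothetical.

```python
import numpy as np

def performance_characteristics(p_case, algorithm_flag):
    """Cross-tabulate a phenotype algorithm against a probabilistic
    gold standard, treating each subject's predicted probability as
    an expected (fractional) case count."""
    p_case = np.asarray(p_case, dtype=float)   # model probability of being a case
    flag = np.asarray(algorithm_flag, dtype=bool)  # algorithm labels subject a case

    tp = p_case[flag].sum()            # expected true positives
    fp = (1.0 - p_case[flag]).sum()    # expected false positives
    fn = p_case[~flag].sum()           # expected false negatives
    tn = (1.0 - p_case[~flag]).sum()   # expected true negatives

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }

# Example with simulated subjects (placeholder data, not study results)
rng = np.random.default_rng(42)
p = rng.beta(0.5, 0.5, size=10_000)   # predicted probabilities of being a case
flag = rng.random(10_000) < p         # an algorithm that loosely tracks them
print(performance_characteristics(p, flag))
```

Using expected counts rather than hard probability cutoffs avoids discarding subjects with intermediate probabilities, which is one way a probabilistic gold standard can cover the full population; whether to threshold or weight is a design choice, and the sketch above shows only the weighted variant.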
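The Results summarize agreement between PheValuator and chart review as a median difference with an interquartile range across phenotype algorithms. For readers who want the same summary for their own paired estimates, a short illustration follows; the arrays hold placeholder values, not the study's data.

```python
import numpy as np

# Hypothetical paired PPV estimates (percentage points) for a few
# phenotype algorithms -- placeholders, not the paper's results.
phevaluator_ppv = np.array([62.0, 75.0, 88.0, 70.0, 91.0])
chart_review_ppv = np.array([65.0, 72.0, 80.0, 68.0, 85.0])

diff = phevaluator_ppv - chart_review_ppv   # per-algorithm difference
median = np.median(diff)
q1, q3 = np.percentile(diff, [25, 75])      # interquartile range
print(f"median difference {median:+.1f} (IQR {q1:.1f}, {q3:.1f})")
```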
