Background: Accurate identification of incident venous thromboembolism (VTE) for quality improvement and health services research is challenging. The purpose of this study was to evaluate the performance of a novel incident VTE phenotyping algorithm defined using standard terminologies, requiring three key indicators documented in the electronic health record (EHR): VTE diagnostic code, VTE-related imaging procedure code, and anticoagulant medication code.

Methods: Retrospective chart reviews were conducted to assess the performance of the algorithm using a random sample of phenotype(+) and phenotype(−) diagnostic encounters from primary care practices and acute care sites affiliated with five hospitals across a large integrated care delivery system in Massachusetts. The performance of the algorithm was evaluated by calculating the positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity, using the phenotype(+) and phenotype(−) diagnostic encounters sample and target population data.

Results: Based on gold-standard manual chart review, the algorithm had a PPV of 95.2 % (95 % CI: 93.1–96.8 %), NPV of 97.1 % (95 % CI: 95.3–98.4 %), sensitivity of 91.7 % (95 % CI: 90.8–92.6 %), and specificity of 98.4 % (95 % CI: 98.1–98.6 %). The algorithm systematically misclassified a low number of specific types of encounters, highlighting potential areas for improvement.

Conclusions: This novel phenotyping algorithm offers an accurate approach for identifying incident VTE in general populations using EHR data and standard terminologies, and accurately identifies the specific encounter and date of diagnosis of the incident VTE. This approach can be used for measurement of incident VTE to drive quality improvement, research to expand the evidence, and development of quality metrics and clinical decision support to improve the diagnostic process.
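
The evaluation logic described above can be illustrated with a minimal Python sketch. It makes two assumptions not confirmed by the abstract: that an encounter is phenotype(+) only when all three indicators are present, and that sensitivity and specificity are estimated by projecting the sample PPV and NPV onto the counts of phenotype(+) and phenotype(−) encounters in the target population. The class, field, and function names are hypothetical, and the counts in the usage example are illustrative placeholders, not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encounter:
    # Hypothetical boolean flags; a real pipeline would derive these from
    # coded EHR entries expressed in standard terminologies.
    has_vte_diagnosis_code: bool
    has_vte_imaging_code: bool
    has_anticoagulant_code: bool


def is_phenotype_positive(enc: Encounter) -> bool:
    """Assumed rule: all three indicators must be documented."""
    return (
        enc.has_vte_diagnosis_code
        and enc.has_vte_imaging_code
        and enc.has_anticoagulant_code
    )


def estimate_performance(
    tp_sample: int,
    fp_sample: int,
    tn_sample: int,
    fn_sample: int,
    n_pos_population: int,
    n_neg_population: int,
) -> dict:
    """Estimate PPV, NPV, sensitivity, and specificity.

    PPV and NPV come directly from the chart-reviewed sample. Sensitivity
    and specificity are estimated by scaling those values to the number of
    phenotype(+) and phenotype(-) encounters in the target population
    (an assumed approach for a stratified sample).
    """
    ppv = tp_sample / (tp_sample + fp_sample)
    npv = tn_sample / (tn_sample + fn_sample)

    # Projected confusion-matrix cells for the whole target population.
    est_tp = ppv * n_pos_population
    est_fp = n_pos_population - est_tp
    est_tn = npv * n_neg_population
    est_fn = n_neg_population - est_tn

    return {
        "ppv": ppv,
        "npv": npv,
        "sensitivity": est_tp / (est_tp + est_fn),
        "specificity": est_tn / (est_tn + est_fp),
    }


if __name__ == "__main__":
    # Illustrative placeholder counts only; these are not the study's data.
    metrics = estimate_performance(
        tp_sample=90, fp_sample=10, tn_sample=95, fn_sample=5,
        n_pos_population=1000, n_neg_population=9000,
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Because true negatives usually far outnumber true positives in a general population, even a small false-negative rate in the phenotype(−) sample can lower the projected sensitivity noticeably, which is why the population counts matter for these two metrics.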