Abstract

We developed an open-source R package to implement tree-based scan statistics (TBSS) analyses. TBSS are data mining methods used by the United States Food and Drug Administration and the Centers for Disease Control and Prevention. They simultaneously screen thousands of hierarchically aggregated outcomes to identify unsuspected adverse effects of drugs or vaccines while accounting for multiple comparisons. The general structure of TBSS is highly adaptable and has four essential components: (1) a hierarchical outcome structure, (2) a test statistic computed for each element of the hierarchy, (3) an algorithm that generates data replicates under a null distribution, and (4) observed outcomes at the lowest level of the hierarchy. We encode this general framework in an R package that offers user-friendly functions for the most commonly used TBSS methods. To illustrate the performance of our software, we re-ran two archetypical TBSS analyses that were originally performed with the proprietary, closed-source TreeScan™ software. The first considers the risk of congenital malformations associated with first-trimester exposure to valproate, and the second compares newly prescribed canagliflozin with a dipeptidyl peptidase 4 inhibitor in adults with type 2 diabetes. Our package replicated the results of both original studies. An openly available implementation of TBSS can encourage methodological innovation and foster collaboration. We offer an intuitive R package implementing standard TBSS methods, with accompanying tutorials. Its unified object-oriented implementation allows expert users to extend the framework, introduce new features, or enhance existing ones.
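
To make the four components concrete, the following is a minimal Python sketch of one TBSS analysis. It is not the R package described above and does not reproduce its interface: the toy tree, the counts, and every function name are hypothetical and chosen only for illustration. The sketch assumes an unconditional Bernoulli model in which, under the null hypothesis, each case falls in the exposed group with probability p = 0.5, and it obtains a multiplicity-adjusted p-value by Monte Carlo simulation of the maximum statistic over the tree.

```python
import math
import random

# (1) Hierarchical outcome structure, child -> parent (hypothetical toy tree).
PARENT = {
    "cardiac": "root",
    "neural": "root",
    "asd": "cardiac",
    "vsd": "cardiac",
    "spina_bifida": "neural",
}

# (4) Observed leaf-level outcomes, leaf -> (exposed cases, total cases).
# These counts are made up for illustration.
OBSERVED = {"asd": (3, 6), "vsd": (4, 8), "spina_bifida": (9, 10)}


def aggregate(leaf_counts):
    """Sum leaf counts up the tree so every node has (exposed, total)."""
    totals = {}
    for leaf, (c, n) in leaf_counts.items():
        node = leaf
        while node is not None:
            exposed, total = totals.get(node, (0, 0))
            totals[node] = (exposed + c, total + n)
            node = PARENT.get(node)
    return totals


def bernoulli_llr(c, n, p=0.5):
    """(2) One-sided Bernoulli log-likelihood ratio for an excess of exposed cases."""
    if n == 0 or c / n <= p:
        return 0.0
    llr = c * math.log(c / n) - c * math.log(p)
    if c < n:
        llr += (n - c) * math.log((n - c) / n) - (n - c) * math.log(1 - p)
    return llr


def max_llr(leaf_counts, p=0.5):
    """Return the node with the largest statistic and that statistic."""
    totals = aggregate(leaf_counts)
    best = max(totals, key=lambda node: bernoulli_llr(*totals[node], p))
    return best, bernoulli_llr(*totals[best], p)


def null_replicate(leaf_counts, rng, p=0.5):
    """(3) Null data: leaf totals stay fixed, each case is exposed with probability p."""
    return {
        leaf: (sum(rng.random() < p for _ in range(n)), n)
        for leaf, (_, n) in leaf_counts.items()
    }


def scan(leaf_counts, n_rep=999, p=0.5, seed=1):
    """Monte Carlo p-value for the maximum statistic over all nodes."""
    rng = random.Random(seed)
    node, observed = max_llr(leaf_counts, p)
    exceed = sum(
        max_llr(null_replicate(leaf_counts, rng, p), p)[1] >= observed
        for _ in range(n_rep)
    )
    return node, observed, (exceed + 1) / (n_rep + 1)


if __name__ == "__main__":
    node, statistic, p_value = scan(OBSERVED)
    print(node, round(statistic, 3), p_value)
```

Because the p-value compares the observed maximum with the maximum from each null replicate, it accounts for every node in the tree being tested at once. Replacing `bernoulli_llr` or `null_replicate` with another statistic or null model changes the method without touching the rest of the sketch, which is the kind of adaptability the four-component description refers to.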
