Abstract

Background: A distributed data network approach combined with distributed regression analysis (DRA) can reduce the risk of disclosing sensitive individual and institutional information in multicenter studies. However, software that facilitates large-scale and efficient implementation of DRA is limited.

Objective: This study aimed to assess the precision and operational performance of a DRA application comprising a SAS-based DRA package and a file transfer workflow developed within the open-source distributed networking software PopMedNet in a horizontally partitioned distributed data network.

Methods: We executed the SAS-based DRA package to perform distributed linear, logistic, and Cox proportional hazards regression analysis on a real-world test case with 3 data partners. We used PopMedNet to iteratively and automatically transfer highly summarized information between the data partners and the analysis center. We compared the DRA results with the results from standard SAS procedures executed on the pooled individual-level dataset to evaluate the precision of the SAS-based DRA package. We computed the execution time of each step in the workflow to evaluate the operational performance of the PopMedNet-driven file transfer workflow.

Results: All DRA results were precise (<10^-12), and DRA model fit curves were identical or similar to those obtained from the corresponding pooled individual-level data analyses. All regression models required less than 20 min for full end-to-end execution.

Conclusions: We integrated a SAS-based DRA package with PopMedNet and successfully tested the new capability within an active distributed data network. The study demonstrated the validity and feasibility of using DRA to enable more privacy-protecting analysis in multicenter studies.

Highlights

  • Distributed regression analysis (DRA) is a suite of methods that perform multivariable regression analysis in multicenter studies without the need for pooling individual-level data [1,2].

  • We demonstrate the feasibility of using the SAS-based DRA package and the PopMedNet-driven file transfer workflow to perform DRA in a real-world horizontally partitioned distributed data network (DDN).

  • We considered the integration successful if the DRA parameter estimates, SEs, and model fit statistics agreed with the results from the corresponding pooled individual-level data analyses to a precision of 10^-6.


Summary

Introduction

Background and Significance

Distributed regression analysis (DRA) is a suite of methods that perform multivariable regression analysis in multicenter studies without the need for pooling individual-level data [1,2]. There have been efforts to develop capabilities that coordinate and automate the iterative computation and file transfer process of DRA to make it a more practical analytical option in real-world multicenter studies [4,5,6,7,8,9,10,11]. These efforts have focused primarily on the programming language R and specially designed applications (eg, Java applets) to facilitate semiautomated or fully automated file transfers between the data partners and the analysis center [7,8,9,10,11]. The study demonstrated the validity and feasibility of using DRA to enable more privacy-protecting analysis in multicenter studies.
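To illustrate the iterative exchange of summary-level information described above, the following is a minimal sketch of distributed logistic regression in a horizontally partitioned setting. It is not the SAS-based DRA package or the PopMedNet workflow: the function names (`site_summaries`, `fit_logistic`, `simulate`), the simulated data, and the choice of Newton-Raphson are assumptions made for illustration. Each simulated data partner returns only a score vector and an information matrix, and the analysis center sums them to update the coefficients. The result is then compared with a fit on the pooled individual-level data.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def site_summaries(rows, beta):
    """Hypothetical data-partner step: return only aggregated quantities."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, y in rows:
        eta = sum(b * xi for b, xi in zip(beta, x))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for j in range(p):
            score[j] += x[j] * (y - mu)
            for k in range(p):
                info[j][k] += w * x[j] * x[k]
    return score, info


def fit_logistic(sites, p, tol=1e-10, max_iter=50):
    """Hypothetical analysis-center step: Newton-Raphson on summed summaries."""
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for rows in sites:  # one summary exchange per site per iteration
            s, h = site_summaries(rows, beta)
            score = [a + b for a, b in zip(score, s)]
            info = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(info, h)]
        step = solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta


def simulate(n, seed):
    """Simulated (not real) data: intercept plus two covariates, binary outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [1.0, rng.gauss(0, 1), rng.gauss(0, 1)]
        eta = -0.5 + 0.8 * x[1] - 0.4 * x[2]
        y = 1 if rng.random() < 1 / (1 + math.exp(-eta)) else 0
        rows.append((x, y))
    return rows


sites = [simulate(200, seed) for seed in (1, 2, 3)]  # 3 simulated data partners
pooled = [row for rows in sites for row in rows]

beta_dra = fit_logistic(sites, p=3)        # distributed: summaries only
beta_pooled = fit_logistic([pooled], p=3)  # reference: pooled individual-level data
max_diff = max(abs(a - b) for a, b in zip(beta_dra, beta_pooled))
print("DRA estimates:", beta_dra)
print("max |DRA - pooled|:", max_diff)
```

Because the score and information matrix are sums over observations, summing per-site contributions gives the same quantities as the pooled data, up to floating-point rounding. That is why the two fits are expected to agree closely in this sketch.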
