Distributed Regression Analysis: Utilizing Data from Various Data Partners in a Distributed Manner

Date Posted: Friday, December 14, 2018
Status: Complete
Deliverables
Description

This project developed a stable, feasible approach to secure distributed linear, logistic, and Cox regression analysis within a distributed data network, without requiring participating data partners to share any patient-level datasets. Distributed regression analysis (DRA) lets data partners retain control of their patient-level data while still producing valid regression estimates across the network.
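To make the idea concrete, the following is a minimal sketch of distributed linear regression over horizontally partitioned data. It is an illustration of the general technique only, not the algorithm implemented in the SAS-based DRA application. The function names (`site_summaries`, `combine`, `solve`) and the toy data are hypothetical. Each data partner computes the summary matrices X'X and X'y locally, and the analysis center sums them and solves the normal equations, so no patient-level rows leave a site. Logistic and Cox regression typically need several rounds of such summary exchanges rather than a single pass.

```python
def site_summaries(rows):
    """Run at a data partner: return (X'X, X'y, n) for local rows.

    rows is a list of (covariates, outcome) pairs. Only these aggregate
    summaries are shared; the patient-level rows stay at the site.
    """
    p = len(rows[0][0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        xi = [1.0] + list(x)
        for a in range(p):
            xty[a] += xi[a] * y
            for b in range(p):
                xtx[a][b] += xi[a] * xi[b]
    return xtx, xty, len(rows)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def combine(summaries):
    """Run at the analysis center: sum site summaries, solve for coefficients."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][a][b] for s in summaries) for b in range(p)] for a in range(p)]
    xty = [sum(s[1][a] for s in summaries) for a in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data: two sites whose rows follow y = 1 + 2x exactly.
site_a = [((0.0,), 1.0), ((1.0,), 3.0)]
site_b = [((2.0,), 5.0), ((3.0,), 7.0), ((4.0,), 9.0)]

beta = combine([site_summaries(site_a), site_summaries(site_b)])
pooled = combine([site_summaries(site_a + site_b)])
```

In this sketch `beta` (computed from per-site summaries) matches `pooled` (computed as if all rows were in one place), which is the property that makes the distributed estimates valid for linear regression.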

This page includes the following:

  • Final Report: Final project report detailing methods, results, and conclusions.
  • SAS-based DRA Application: Two SAS packages used to run DRA, one for Data Partners and one for the analysis center. The packages include all algorithms for linear, logistic, and Cox regression.
  • SAS-based DRA Application Documentation: Documentation of the DRA algorithms and the setup of our SAS-based DRA application for execution in a horizontally partitioned distributed data network.
  • SAS-based DRA Application (for testing): Two SAS packages used to test the SAS-based DRA Application, one for Data Partners and one for the analysis center. The packages include all algorithms for linear, logistic, and Cox regression, plus a macro that mimics the actions of data-sharing software for internal testing.
  • Test Data: Zip file of the Boston Housing [1] and Maryland State Prison [2] datasets, and the three partitioned datasets used for distributed linear, logistic, and Cox proportional hazards regression analysis testing with the SAS-based DRA application. The original Boston Housing dataset can be found here and the original Maryland State Prison data can be found here.
  • Linear DRA Sample Report: Report generated by %create_grep_rpt for distributed linear regression analysis with the partitioned Boston Housing dataset.
  • Logistic DRA Sample Report: Report generated by %create_grep_rpt for distributed logistic regression analysis with the partitioned Boston Housing dataset.
  • Cox DRA Sample Report 1: Report generated by %create_cox_grep_rpt for distributed Cox regression analysis with the partitioned Maryland convict dataset.
  • Cox DRA Sample Report 2: Report generated by %create_cox_grep_rpt for distributed Cox regression analysis, stratified by Data Partner site identifier, with the partitioned Maryland convict dataset.

___________________

[1] Harrison D, Rubinfeld DL. Hedonic housing prices and the demand for clean air. Journal of environmental economics and management. 1978;5(1):81-102.

[2] Rossi PH, Henry JP. Seriousness: A measure for all purposes. Handbook of criminal justice evaluation. 1980:489-505.

Workgroup Leader(s)

Darren Toh, ScD; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA

Michael Nguyen, MD; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD

Workgroup Members

Qoua Her, PharmD; Jessica Malenfant, MPH; Yury Vilk, PhD; Jessica Young, PhD; Zilu Zhang, MSc; Sarah Malek, MPPA; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA
