Utilizing Data from Various Data Partners in a Distributed Manner

Project Title: Utilizing Data from Various Data Partners in a Distributed Manner
Date Posted: Tuesday, May 22, 2018
Status: Complete
Deliverables
Description

This project developed a stable, feasible approach to secure distributed linear, logistic, and Cox regression analysis in a distributed data network, without requiring participating data partners to share patient-level datasets. Distributed regression analysis (DRA) lets data partners keep control of their patient-level data while still producing valid regression estimates across the network.
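As a rough illustration of the idea for the linear case, the sketch below assumes horizontally partitioned data and a textbook summary-statistics approach: each data partner reduces its rows to the sums X'X and X'y, and the analysis center adds those sums and solves the normal equations. It is not the code in the DRA SAS packages; the function names (`site_summaries`, `pooled_fit`, `solve`) and the toy data are hypothetical.

```python
# Hypothetical sketch of distributed linear regression from site-level sums.
# Names and data are illustrative; this is not the SAS-based DRA application.

def site_summaries(rows):
    """Run at a data partner: reduce patient-level rows to X'X and X'y.

    Each row is (covariates, outcome); an intercept term is prepended.
    Only these aggregate sums leave the site.
    """
    p = len(rows[0][0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for covariates, y in rows:
        x = [1.0] + list(covariates)
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def pooled_fit(summaries):
    """Run at the analysis center: add the site summaries, then solve."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)]
           for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return solve(xtx, xty)


# Toy data split across three sites; the outcome is exactly 2 + 3 * x.
site_a = [((0.0,), 2.0), ((1.0,), 5.0)]
site_b = [((2.0,), 8.0), ((3.0,), 11.0)]
site_c = [((4.0,), 14.0)]

beta = pooled_fit([site_summaries(s) for s in (site_a, site_b, site_c)])
```

Because the sums are additive across sites, the pooled coefficients match what a single regression on the combined rows would give. Logistic and Cox models have no closed-form solution, so a distributed fit for them would generally need several rounds of exchanging summary statistics rather than one.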

This page includes the following:

  • DRA SAS packages: Two SAS packages used to run DRA, one for Data Partners and one for the analysis center. The packages include all algorithms for linear, logistic, and Cox regression.
  • Linear DRA Sample Report: Report generated by %create_grep_rpt for distributed linear regression analysis with the partitioned Boston Housing dataset
  • Logistic DRA Sample Report: Report generated by %create_grep_rpt for distributed logistic regression analysis with the partitioned Boston Housing dataset
  • Cox DRA Sample Report 1: Report generated by %create_cox_grep_rpt for distributed Cox regression analysis with the partitioned Maryland convict dataset
  • Cox DRA Sample Report 2: Report generated by %create_cox_grep_rpt for distributed Cox regression analysis, stratified by Data Partner site identifier, with the partitioned Maryland convict dataset
  • SAS-based DRA application documentation: Documentation of the DRA algorithms and the setup of our SAS-based DRA application for execution in a horizontally partitioned distributed data network
  • Boston Housing Data [1]: Zip file of the Boston Housing dataset and the three partitioned datasets used for distributed linear and logistic regression analysis testing
    Original data can be found here.
  • Maryland State Prison Data [2]: Zip file of the Maryland State Prison dataset and the three partitioned datasets used for distributed Cox regression analysis testing
    Original data can be found here.

___________________

[1] Harrison D, Rubinfeld DL. Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management. 1978;5(1):81-102.

[2] Rossi PH, Henry JP. Seriousness: A measure for all purposes. Handbook of Criminal Justice Evaluation. 1980:489-505.

Workgroup Leader(s)

Darren Toh ScD; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA

Workgroup Members

Qoua Her PharmD; Jessica Malenfant MPH; Yury Vilk PhD; Jessica Young PhD; Zilu Zhang MSc; Sarah Malek MPPA; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA

Related Links