Distributed Regression Analysis: Utilizing Data from Various Data Partners in a Distributed Manner

    Basic Details
    Date Posted: Friday, December 14, 2018
    Status: Complete
    Description

    This project developed a stable, feasible approach for secure distributed linear, logistic, and Cox regression analysis within a distributed data network, without requiring participating data partners to share any patient-level datasets. Distributed regression analysis (DRA) lets data partners keep control of their patient-level data while still producing valid regression estimates across the network.
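
To illustrate the idea for the linear case, the following is a minimal sketch, not the project's SAS implementation. It assumes a horizontally partitioned network in which each data partner sends only the aggregate matrices X'X and X'y to the analysis center, which sums them and solves the normal equations. The function names (`site_summaries`, `combine`, `solve`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def site_summaries(X: Matrix, y: Vector) -> Tuple[Matrix, Vector]:
    """Run at one data partner: return X'X and X'y, never the rows themselves."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (M[i][n] - tail) / M[i][i]
    return beta


def combine(summaries: List[Tuple[Matrix, Vector]]) -> Vector:
    """Run at the analysis center: add the site summaries, then solve."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return solve(xtx, xty)


# Toy data (hypothetical): two sites, an intercept column plus one covariate.
site_a = ([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0])
site_b = ([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0])

# Distributed fit: only the aggregates leave each site.
distributed = combine([site_summaries(*site_a), site_summaries(*site_b)])

# Pooled fit on the stacked rows, shown only to check the two agree.
pooled = combine([site_summaries(site_a[0] + site_b[0], site_a[1] + site_b[1])])
```

Because X'X and X'y are sums over rows, adding the per-site aggregates gives the same normal equations as stacking all rows, so `distributed` and `pooled` agree up to floating-point error. Logistic and Cox regression have no closed-form solution, so a distributed version would need iterative exchanges of aggregate quantities rather than a single pass.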

    This page includes the following:

    • Final Report: Final project report detailing methods, results, and conclusions.
    • SAS-based DRA Application: Two SAS packages used to run DRA, one for Data Partners and one for the analysis center. The packages include all algorithms for linear, logistic, and Cox regression.
    • SAS-based DRA Application Documentation: Documentation of the DRA algorithms and the setup of our SAS-based DRA application for execution in a horizontally partitioned distributed data network.
    • SAS-based DRA Application (for testing): Two SAS packages used to test the SAS-based DRA Application, one for Data Partners and one for the analysis center. The packages include all algorithms for linear, logistic, and Cox regression, plus a macro that mimics the actions of data-sharing software for internal testing.
    • Test Data: Zip file of the Boston Housing [1] and Maryland State Prison [2] datasets, and the three partitioned datasets used for distributed linear, logistic, and Cox proportional hazards regression analysis testing with the SAS-based DRA application. The original Boston Housing dataset can be found here and the original Maryland State Prison data can be found here.
    • Linear DRA Sample Report: Report generated by %create_grep_rpt for distributed linear regression analysis with the partitioned Boston Housing dataset.
    • Logistic DRA Sample Report: Report generated by %create_grep_rpt for distributed logistic regression analysis with the partitioned Boston Housing dataset.
    • Cox DRA Sample Report 1: Report generated by %create_cox_grep_rpt for distributed Cox regression analysis with the partitioned Maryland State Prison dataset.
    • Cox DRA Sample Report 2: Report generated by %create_cox_grep_rpt for distributed Cox regression analysis, stratified by Data Partner site identifier, with the partitioned Maryland State Prison dataset.

    ___________________

    [1] Harrison D, Rubinfeld DL. Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management. 1978;5(1):81-102.

    [2] Rossi PH, Henry JP. Seriousness: A measure for all purposes. Handbook of Criminal Justice Evaluation. 1980:489-505.

    Workgroup Leader(s)

    Darren Toh, ScD; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA

    Michael Nguyen, MD; Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD

    Workgroup Member(s)

    Qoua Her, PharmD; Jessica Malenfant, MPH; Yury Vilk, PhD; Jessica Young, PhD; Zilu Zhang, MSc; Sarah Malek, MPPA; Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA