Statistical Methods to Harmonize Electronic Health Record Data Across Healthcare Systems: Case Study and Lessons Learned

Details

Basic Details

Date

Monday, March 2, 2026

Type

Publication

Description

Although common data models for electronic health record (EHR) data can facilitate multi-site data organization and querying, the same medical event may still be coded differently between healthcare systems. In this paper, we present statistical methods to identify and mitigate coding discrepancies using summary-level data and demonstrate these methods using data from two FDA Sentinel data partners: Kaiser Permanente Washington and Kaiser Permanente Northwest. We first characterize differences in coding patterns, then compute a code mapping matrix to harmonize data between systems. Our findings reveal significant heterogeneity in coded EHR data, even after adopting a common data model with the same coding system, highlighting the importance of data harmonization before downstream analyses. Our study also demonstrates the effectiveness of the data harmonization approaches, which provide a foundational data quality step to promote semantic interoperability, enhance data integration, and improve the integrity of study conclusions.

Materials

Bioinformatics. 2026 Mar 02. doi.org/10.1093/bioinformatics/btag107

Contributors

Author(s)

Xu Shi, Yuqi Zhai, Xianshi Yu, Xiaoou Li, Brian L. Hazlehurst, Denis B. Nyongesa, Daniel S. Sapp, Brian D. Williamson, David S. Carrell, Luesa Healy, Kara L. Cushing-Haugen, Jenna Wong, Shirley V. Wang, James S. Floyd, Kathleen Shattuck, Samuel McGown, Sarah Alam, José J. Hernández-Muñoz, Jie Li, Yong Ma, Danijela Stojanovic, Sudha R. Raman, Sharon E. Davis, Tianxi Cai, Jennifer C. Nelson, Patrick J. Heagerty

Corresponding Author

Xu Shi; University of Michigan

Email: shixu@umich.edu