Characterization and Comparison of Structured and Unstructured Electronic Health Record Data Mapped to MedDRA for Post-Marketing Surveillance

    Basic Details
    Description

    Medical product safety surveillance efforts, whether using electronic health record (EHR) or claims data, typically rely on structured codes. Utilizing unstructured EHR data, particularly information extracted from clinical text through natural language processing (NLP), enriches the information available for data mining, phenotyping, and surveillance. To assess overlapping and distinct information across structured and unstructured EHR data, we mapped both to a common vocabulary (Medical Dictionary for Regulatory Activities, MedDRA). We assessed the feasibility of implementing such a mapping and explored similarities and differences at multiple levels of the concept hierarchy.

    We randomly sampled 15,000 encounters (5,000 each from ambulatory, emergency, and inpatient encounters). For each encounter, we extracted MedDRA concepts from clinical notes using MetaMap and mapped structured International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnoses to MedDRA. We evaluated corroboration between data sources across the MedDRA hierarchy, as well as the unique information contributed by each source.
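    The per-encounter comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the hierarchy table, the placeholder term labels, and the function names (roll_up, compare_sources) are all hypothetical, and it assumes each source has already been reduced to a set of MedDRA preferred terms (PTs). It also simplifies MedDRA by giving each PT a single parent per level, whereas real MedDRA is multi-axial.

```python
from typing import Dict, Set

# MedDRA levels from most to least specific (lowest level terms omitted).
LEVELS = ("PT", "HLT", "HLGT", "SOC")

# Hypothetical toy hierarchy with placeholder labels, NOT real MedDRA content.
# Assumption: one parent per level; real MedDRA PTs can sit under several SOCs.
TOY_HIERARCHY: Dict[str, Dict[str, str]] = {
    "pt_a": {"PT": "pt_a", "HLT": "hlt_1", "HLGT": "hlgt_1", "SOC": "soc_1"},
    "pt_b": {"PT": "pt_b", "HLT": "hlt_1", "HLGT": "hlgt_1", "SOC": "soc_1"},
    "pt_c": {"PT": "pt_c", "HLT": "hlt_2", "HLGT": "hlgt_2", "SOC": "soc_1"},
    "pt_d": {"PT": "pt_d", "HLT": "hlt_3", "HLGT": "hlgt_3", "SOC": "soc_2"},
}


def roll_up(pts: Set[str], level: str) -> Set[str]:
    """Map a set of preferred terms to their ancestors at the given level."""
    return {TOY_HIERARCHY[pt][level] for pt in pts}


def compare_sources(
    structured_pts: Set[str], nlp_pts: Set[str]
) -> Dict[str, Dict[str, Set[str]]]:
    """For one encounter, report shared and source-specific concepts per level."""
    result: Dict[str, Dict[str, Set[str]]] = {}
    for level in LEVELS:
        structured = roll_up(structured_pts, level)
        nlp = roll_up(nlp_pts, level)
        result[level] = {
            "shared": structured & nlp,           # corroborated by both sources
            "structured_only": structured - nlp,  # unique to coded diagnoses
            "nlp_only": nlp - structured,         # unique to clinical notes
        }
    return result


if __name__ == "__main__":
    # One hypothetical encounter: codes yield pt_a and pt_d, notes yield pt_b and pt_c.
    comparison = compare_sources({"pt_a", "pt_d"}, {"pt_b", "pt_c"})
    for level in LEVELS:
        print(level, comparison[level])
```

    In this toy encounter the two sources share no concepts at the PT level but agree once terms are rolled up to a common parent, which is the kind of level-dependent corroboration the comparison across the hierarchy is meant to surface.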

    Author(s)

    Joshua C. Smith, Sharon E. Davis, Ruth M. Reeves, Robert Winter, Jill Whitaker, Daniel Park, Shirley V. Wang, Massimiliano Russo, Judith C. Maro, José J. Hernández-Muñoz, Yong Ma, Youjin Wang, Jamal T. Jones, Rishi J. Desai, Michael E. Matheny

    Corresponding Author

    Joshua C. Smith; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States.

    Email: joshua.c.smith@vumc.org