Data Quality Review and Characterization

The Sentinel Operations Center (SOC) reviews and characterizes the quality of Sentinel data through a detailed, stepwise process coordinated with each Data Partner. Data quality refers both to the characteristics of the data and to the processes used to measure or improve its quality. Data is considered high quality when it meets pre-specified standards and is fit for its intended use.

Graphic depicting Sentinel's data quality review and characterization process.

1. The data quality review and characterization process begins with the initiation of each new ETL (the practice of Data Partners Extracting, Transforming, and Loading new data into the Sentinel Common Data Model). The SOC first prepares an updated quality review and characterization package designed to assess the Data Partner's new ETL data.

2. Meanwhile, the Data Partner transforms their source data into the Sentinel Common Data Model so it can be included in the Sentinel Distributed Database. 

3. The SOC then distributes the quality review and characterization package to the Data Partners to run on their source data.

4. The Data Partner receives the quality review and characterization package and runs it against their data. The package comprises over 900 distinct data checks. These checks are either automated (programmatic checks that must be resolved for the package to finish running) or semi-automated (programmatic checks that the Data Partner evaluates after the automated checks are complete), and they fall into two types: Level 1 and Level 2 checks. The checks identify errors (systematic transformation issues) or anomalies (unexpected data behavior). After the program finishes running successfully, the package produces an output of flags. On average, the program identifies 44 flags per ETL. The Data Partner reviews and comments on these flags, in some cases rerunning the package until certain issues are resolved. The Data Partner then sends the output, along with a report of the identified flags and any pertinent comments, to the SOC.

5. The SOC receives the output and report from the Data Partner and reviews them. Alongside this review, the SOC completes its own series of quality review and characterization checks. These checks are semi-automated or manual and fall into three types: Level 2, Level 3, and Level 4 checks. Together they comprise over 500 distinct data checks. On average, the SOC identifies 10 additional flags per ETL. The SOC then evaluates these additional flags and creates an issue report for the Data Partner to address.

6. The Data Partner receives the report from the SOC, investigates the issues, and resolves any remaining flags.

7. The SOC then confirms all flags have been resolved and approves the ETL for integration into the Sentinel Distributed Database.
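
The distinction drawn in the steps above between automated checks (which must be resolved before the package can finish running) and semi-automated checks (which yield flags for later review) can be sketched in a few lines of Python. This is only an illustrative model, not Sentinel's actual package: the class and function names, the check identifiers, the `patient_id` and `age` fields, and the level assigned to each example check are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # hypothetical record layout, not the Sentinel Common Data Model


class Mode(Enum):
    AUTOMATED = "automated"            # must pass for the run to finish
    SEMI_AUTOMATED = "semi-automated"  # failures become flags for human review


@dataclass(frozen=True)
class Check:
    check_id: str                        # hypothetical identifier
    level: int                           # 1-4, as in the process description
    mode: Mode
    passes: Callable[[List[Row]], bool]  # returns True when the data pass


@dataclass
class Flag:
    check_id: str
    level: int
    comment: str = ""       # filled in by the reviewer
    resolved: bool = False


class BlockingCheckFailed(Exception):
    """Raised when an automated check fails, so the run cannot finish."""


def run_package(checks: List[Check], rows: List[Row]) -> List[Flag]:
    # Automated checks run first; any failure stops the package.
    for check in checks:
        if check.mode is Mode.AUTOMATED and not check.passes(rows):
            raise BlockingCheckFailed(check.check_id)
    # Semi-automated checks are evaluated afterwards and only raise flags.
    return [
        Flag(check.check_id, check.level)
        for check in checks
        if check.mode is Mode.SEMI_AUTOMATED and not check.passes(rows)
    ]


def all_flags_resolved(flags: List[Flag]) -> bool:
    # Stand-in for the final confirmation before an ETL is approved.
    return all(flag.resolved for flag in flags)


# Two made-up checks; the level/mode pairing is illustrative only.
CHECKS = [
    Check("L1-id-present", 1, Mode.AUTOMATED,
          lambda rows: all(r.get("patient_id") for r in rows)),
    Check("L2-age-plausible", 2, Mode.SEMI_AUTOMATED,
          lambda rows: all(0 <= r["age"] <= 120 for r in rows)),
]

rows = [{"patient_id": "A1", "age": 34}, {"patient_id": "B2", "age": 140}]
flags = run_package(CHECKS, rows)  # the implausible age produces one flag
for flag in flags:
    flag.comment = "Reviewed by Data Partner"
```

In this sketch, a failing automated check raises an exception so that the run cannot complete until the data are corrected, while a failing semi-automated check is recorded as a flag that a reviewer comments on and eventually marks as resolved.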

Illustration of Sentinel's different types of data quality checks.

Additional Resources

Sentinel Common Data Model

Sentinel Data Quality Assurance Practices