Quick Reference

Quality Assurance Quality Control (QAQC) Flags

...

Clean Data: As described in the quality control data product option, the clean option filters out poor-quality data and, if the fill data gaps option is selected, replaces it with null values. (In most data products, null values are represented by 'NaN', which stands for Not-a-Number.) Poor-quality data are those assigned quality control flags 3 and 4, as defined above.
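As a rough illustration, the clean/fill behaviour described above might be sketched as follows (the `clean` function and `POOR_QUALITY_FLAGS` name are hypothetical, not ONC code):

```python
import math

# Flags 3 and 4 mark poor-quality data, as defined in the flag table above.
POOR_QUALITY_FLAGS = {3, 4}

def clean(values, flags, fill_gaps=True):
    """Drop (or NaN-fill) values whose QC flag marks them as poor quality."""
    cleaned = []
    for value, flag in zip(values, flags):
        if flag in POOR_QUALITY_FLAGS:
            if fill_gaps:
                cleaned.append(math.nan)  # keep the time step, blank the value
            # otherwise the flagged sample is simply dropped
        else:
            cleaned.append(value)
    return cleaned
```

With fill_gaps selected, the time series keeps its original length; without it, flagged samples disappear entirely.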

Raw Data: No filtering or modification is done to the data. All the quality control flags are reported.

Quality Assurance Quality Control Overview

One of the challenges facing a real-time oceanographic observatory is providing a fast and accurate assessment of data quality assurance and quality control (QAQC). Ocean Networks Canada is in the process of implementing real-time quality control measures on incoming scalar data that meet the guidelines of the Quality Assurance of Real Time Oceanographic Data (QARTOD) group. QARTOD is a US organization tasked with identifying issues involved with incoming real-time data from the U.S. Integrated Ocean Observing System (IOOS). A large portion of its agenda is to create guidelines for how the quality of real-time data is determined and reported to the scientific community. ONC strives to adhere to QARTOD's 'Seven Laws of Data Management' to provide trusted data to the scientific community.

Note that engineering data (e.g., instrument voltages, ground faults, etc.) are excluded from quality controls.

QARTOD’s Seven Laws of Data Management:

  • Every real-time observation distributed to the ocean community must be accompanied by a quality descriptor.
  • All observations should be subject to some level of automated real-time quality test.
  • Quality flags and quality test descriptions must be sufficiently described in the accompanying metadata.
  • Observers should independently verify or calibrate a sensor before deployment.
  • Observers should describe their method / calibration in the real-time metadata.
  • Observers should quantify the level of calibration accuracy and the associated expected error bounds.
  • Manual checks on the automated procedures, the real-time data collected and the status of the observing system must be provided by the observer on a timescale appropriate to ensure the integrity of the observing system.

QAQC Implementation

The quality control testing is split into three separate categories. The first is real-time testing, applied before the data are parsed into the database. The second is delayed-mode testing, in which archived data are tested after a certain period of time. The third is manual quality control by an ONC data expert.

...

In addition to quality control testing, data may be annotated. Annotations are general-purpose comments by ONC data experts, scientists and operations staff to note and explain periods of service interruption or data issues. Annotations are compiled into the metadata reports that accompany all data products. Prior to, or independent of, requesting a data product, users may also search the annotation database directly using the annotations search page; however, that method is not as effective as clicking on the links in step 2 of data search.

How does ONC determine the final quality control flag?

All data are passed through each level of testing to create a quality control vector containing the output for each test (i.e. multiple QC flags per datum). The overall output quality control flag is determined from the set of QC flags for each datum as follows:

  • If passed all tests, the final output flag assigned is 1 (Data passed all tests).
  • If there are real-time automatic test failures, the most severe flag is assigned.
  • If there are manual quality control flags they override the automatic tests with the most severe manual flag applying (there should not be any cases of multiple manual flags).
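The rules above can be sketched in Python. The severity ordering in `SEVERITY` and the function name are illustrative assumptions, not ONC's actual implementation:

```python
# Assumed severity ordering: a higher value means a more severe flag.
SEVERITY = {1: 0, 2: 1, 3: 2, 4: 3}

def final_flag(auto_flags, manual_flags=()):
    """Combine the per-test QC vector for one datum into one output flag."""
    if manual_flags:
        # Manual QC overrides the automatic tests; the most severe applies.
        return max(manual_flags, key=lambda f: SEVERITY.get(f, 0))
    if all(f == 1 for f in auto_flags):
        return 1  # data passed all tests
    # Otherwise the most severe automatic-test flag is assigned.
    return max(auto_flags, key=lambda f: SEVERITY.get(f, 0))
```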

How do you determine which tests have been applied to the data you downloaded?

In the accompanying metadata, there is a section called Data Quality Information that contains all the information regarding quality control for the requested data. Quality control test information is organized by device and is listed, where available, along with the valid time period of the test and the values used in the formula. Also listed in this section are time gaps greater than 15 minutes in duration.

Terminology

Major Test

A test that sets gross limits on the incoming data such as instrument manufacturer’s specifications or climatological values. If failed, we recommend that the flagged data not be used.

...

A test that sets local limits on the incoming data, such as site-level values based on statistics, and dual-sensor testing to catch conductivity cell plugs. If failed, the data are considered suspect and require further attention by the user to decide whether or not to include these data in their analysis.

Real-time Quality Control Tests

Instrument Level: Tests at this level determine whether the data meet the manufacturer's range specifications for each sensor. Failure at this level is considered major and is likely due to sensor failure or a loss of calibration. We are currently working on entering calibration and configuration information into the database so that it can be returned in the metadata.

...

Region Level: Similar to the observatory-level tests, the tests at this level help to eliminate extreme values in water properties that are not associated with the overall region. Failure of this test is considered major and could be due to sensor drift, biofouling, etc. Minimum/maximum values are chosen based on years of good-quality data at various sites.

Single-Sensor Testing

...

Range: Minimum/maximum values for this level of testing originate from the statistics of previous years of data. The limits are set at +/- 3 standard deviations about the mean, without considering seasonal effects. Failure of this test is minor, as it could indicate a rarely observed water mass or short-term biofouling such as a plugged conductivity cell. Further testing determines whether a failure is due to a plugged conductivity cell.
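A minimal sketch of this +/- 3 standard deviation range check, assuming the historical data are supplied as a plain list (function and variable names are illustrative):

```python
import statistics

def range_flags(values, historical):
    """Flag values outside mean +/- 3 standard deviations of historical data."""
    mean = statistics.mean(historical)
    stdev = statistics.stdev(historical)
    lo, hi = mean - 3 * stdev, mean + 3 * stdev
    # 1 = passed, 3 = suspect (minor failure requiring user attention)
    return [1 if lo <= v <= hi else 3 for v in values]
```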

Dual-Sensor Testing

...

Temperature-Conductivity Testing: This specialized test is designed to catch dropouts in conductivity that are not necessarily outside the site-level range given by the single-sensor testing. It is a dual-sensor test that uses both the temperature and conductivity sensors of a single device to determine whether the conductivity data are good. Dropouts in conductivity are very apparent in a T vs. C plot and can be flagged relatively easily using a simple equation. Failure of this test is minor. All derived sensors (salinity, density and sigma-T) inherit the quality flag from this test.

Delayed-Mode Testing

Spike Testing: Designed to identify implausible spikes in scalar data (applies most specifically to CTD data). Requires 3 consecutive values, so test results are slightly delayed and may be reprocessed if the data arrive out of order.
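A hedged sketch of a three-point spike check in the spirit of the QARTOD spike test; the exact formula and threshold used by ONC are not specified here, so these are illustrative:

```python
def spike_flags(values, threshold):
    """Flag suspected spikes; end points cannot be evaluated and stay at 1."""
    flags = [1] * len(values)
    for i in range(1, len(values) - 1):
        # Deviation from the neighbour mean, discounting the local gradient
        # so that points on a steep but smooth ramp are not flagged.
        spike = (abs(values[i] - (values[i - 1] + values[i + 1]) / 2)
                 - abs((values[i + 1] - values[i - 1]) / 2))
        if spike > threshold:
            flags[i] = 3  # suspect spike
    return flags
```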

...

Stuck Value Testing: Flags data that remain identical for longer than a predetermined length of time. Results are delayed by the tolerance time and may be reprocessed if the data arrive out of order.
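A possible sketch of such a stuck-value check, with times and the tolerance expressed in seconds (names and parameters are illustrative, not ONC's configuration):

```python
def stuck_value_flags(times, values, tolerance):
    """Flag samples once a value has repeated unchanged past the tolerance."""
    flags = [1] * len(values)
    run_start = 0  # index where the current run of identical values began
    for i in range(len(values)):
        if values[i] != values[run_start]:
            run_start = i  # value changed: start a new run
        elif times[i] - times[run_start] > tolerance:
            flags[i] = 3  # value has been stuck longer than the tolerance
    return flags
```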

Data Product Testing

In the generation of data products, processing steps that modify the data need to propagate the QAQC test results on the raw data through to the final result. There is one test here, although it is more of a procedure than a test:

Data Completeness Test: All resampling regularizes the data into an array of time stamps, each one representing the centre of a time period. If the amount of data falling in that time period is insufficient to compute a reliable resampled value (average, min/max, interpolation), then the final QAQC flag will be modified:

  • If more than 70% of the data are available, QAQC flag 7 is assigned for averaging and QAQC flag 8 for interpolation; for min/max resampling, the QAQC flag of the minimum or maximum value is kept.
  • If less than 70% but more than 0% of the data are available, the final QAQC flag is 6.
  • If all the data in the resample period have been cleaned (flag 3 or 4, with the clean and NaN options selected), the final flag is 4.
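The rule above can be sketched as a small decision function; the method names and signature are illustrative assumptions:

```python
def completeness_flag(completeness, method, extremum_flag=1, all_cleaned=False):
    """Final QAQC flag for one resample bin, per the completeness rule."""
    if all_cleaned:
        return 4  # whole bin was cleaned (source flags 3 or 4, NaN-filled)
    if completeness > 0.7:
        if method == "average":
            return 7
        if method == "interpolate":
            return 8
        return extremum_flag  # min/max: keep the extremum's own flag
    if completeness > 0:
        return 6  # insufficient data in the bin
    return None  # empty bin: no resampled value is produced
```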

Manual Quality Control

All ONC data undergo some form of manual quality control to assure users that an expert is regularly checking the data. If real-time or delayed-mode tests do not catch an entire episode of 'bad' data, manual quality control will; it overrides all other tests.