IOOS QARTOD software (ioos_qc)


Mathew Biddle
mathew.biddle@noaa.gov
https://orcid.org/0000-0003-4897-1669
NOAA/NOS/IOOS


Materials available at https://github.com/MathewBiddle/WIO_workshop.

This presentation demonstrates how to run ioos_qc on a time-series dataset. ioos_qc implements the Quality Assurance / Quality Control of Real-Time Oceanographic Data (QARTOD) tests.

Key Objectives of QARTOD

  • Establish authoritative QA/QC procedures for the U.S. IOOS core variables, including, as necessary, detailed information about the sensors and procedures used to measure the variables.
  • Produce written manuals for these QA/QC procedures.
  • From the individual QA/QC procedures and guidelines developed, define a baseline set of QA/QC procedures that can be used for certification of RCOOS data providers.
  • Facilitate QA/QC integration with the Global Ocean Observing System (GOOS) and other international ocean observation efforts.
  • Engage the Federal Agencies and IOOS Regions that are part of, or contribute to, U.S. IOOS and that will use the established QA/QC procedures.
  • Work efficiently, without duplication of effort, to facilitate the implementation of common QA/QC procedures among U.S. IOOS partners.

Let's go get these data

We will be using the water level data from a fixed station in Kotzebue, AK.

We will get the data from the AOOS ERDDAP server.

In [1]:
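from erddapy import ERDDAP

# Connect to the AOOS ERDDAP server using the tabledap (tabular data) protocol.
e = ERDDAP(
    server="http://erddap.aoos.org/erddap/",
    protocol="tabledap",
)

# Select the Kotzebue water level station dataset.
e.dataset_id = "kotzebue-alaska-water-level"

# Restrict the request to the period of interest.
e.constraints = {
    "time>=": "2018-09-05T21:00:00Z",
    "time<=": "2019-07-10T19:00:00Z",
}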
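
To make the effect of the constraints concrete, here is a minimal sketch of how they can map onto a tabledap request URL. It is not erddapy's implementation: `build_tabledap_url` and `to_epoch_seconds` are hypothetical helpers, and the `.ncCF` response type and epoch-second time encoding are assumptions taken from the request URL that ERDDAP records in this dataset's `history` attribute.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def to_epoch_seconds(iso_time: str) -> float:
    """Convert an ISO 8601 UTC timestamp (e.g. 2018-09-05T21:00:00Z) to epoch seconds."""
    parsed = datetime.strptime(iso_time, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def build_tabledap_url(server, dataset_id, constraints, response="ncCF"):
    """Hypothetical helper: assemble a tabledap request URL from time constraints."""
    query = "&".join(
        # Percent-encode the comparison operator but keep "=" readable.
        f"{quote(key, safe='=')}{to_epoch_seconds(value)}"
        for key, value in constraints.items()
    )
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}?&{query}"


url = build_tabledap_url(
    "http://erddap.aoos.org/erddap/",
    "kotzebue-alaska-water-level",
    {"time>=": "2018-09-05T21:00:00Z", "time<=": "2019-07-10T19:00:00Z"},
)
print(url)
```

In practice erddapy builds and fetches this URL for you; the sketch only shows what the two time constraints turn into on the wire.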

Return data and metadata

In [3]:
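import cf_xarray  # noqa: F401 (importing registers the `.cf` accessor on xarray objects)

# Download the data and metadata as an xarray.Dataset.
data = e.to_xarray()

data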
Out[3]:
<xarray.Dataset>
Dimensions:                                                 (timeseries: 1,
                                                             obs: 7241)
Coordinates:
    latitude                                                (timeseries) float64 ...
    longitude                                               (timeseries) float64 ...
    time                                                    (obs) datetime64[ns] ...
Dimensions without coordinates: timeseries, obs
Data variables:
    station                                                 (timeseries) object ...
    rowSize                                                 (timeseries) int32 ...
    z                                                       (obs) float64 0.0...
    sea_surface_height_above_sea_level_geoid_mhhw           (obs) float64 0.4...
    sea_surface_height_above_sea_level_geoid_mhhw_qc_agg    (obs) float64 1.0...
    sea_surface_height_above_sea_level_geoid_mhhw_qc_tests  (obs) float64 2.1...
Attributes: (12/54)
    cdm_data_type:                 TimeSeries
    cdm_timeseries_variables:      station,longitude,latitude
    contributor_email:             sales@stilltek.com,dugan@aoos.org,,feedbac...
    contributor_name:              Stillwater Technologies LLC,Alaska Ocean O...
    contributor_role:              collaborator,sponsor,contributor,processor
    contributor_role_vocabulary:   NERC
    ...                            ...
    station_id:                    100053
    summary:                       Timeseries data from 'Kotzebue, Alaska, Wa...
    time_coverage_end:             2019-07-10T19:00:00Z
    time_coverage_start:           2018-09-05T21:00:00Z
    title:                         Kotzebue, Alaska, Water Level
    Westernmost_Easting:           -162.566752
In [4]:
data.cf
Out[4]:
Coordinates:
- CF Axes:   X: ['longitude']
             Y: ['latitude']
             T: ['time']
             Z: n/a

- CF Coordinates:   longitude: ['longitude']
                    latitude: ['latitude']
                    time: ['time']
                    vertical: n/a

- Cell Measures:   area, volume: n/a

- Standard Names:   latitude: ['latitude']
                    longitude: ['longitude']
                    time: ['time']

- Bounds:   n/a

Data Variables:
- Cell Measures:   area, volume: n/a

- Standard Names:   aggregate_quality_flag: ['sea_surface_height_above_sea_level_geoid_mhhw_qc_agg']
                    altitude: ['z']
                    sea_surface_height_above_sea_level: ['sea_surface_height_above_sea_level_geoid_mhhw']
                    sea_surface_height_above_sea_level quality_flag: ['sea_surface_height_above_sea_level_geoid_mhhw_qc_tests']

- Bounds:   n/a
In [5]:
data.cf.axes
Out[5]:
{'X': ['longitude'], 'Y': ['latitude'], 'T': ['time']}
In [6]:
data.cf.coordinates
Out[6]:
{'longitude': ['longitude'], 'latitude': ['latitude'], 'time': ['time']}
In [7]:
data.cf.standard_names
Out[7]:
{'latitude': ['latitude'],
 'longitude': ['longitude'],
 'time': ['time'],
 'altitude': ['z'],
 'sea_surface_height_above_sea_level': ['sea_surface_height_above_sea_level_geoid_mhhw'],
 'aggregate_quality_flag': ['sea_surface_height_above_sea_level_geoid_mhhw_qc_agg'],
 'sea_surface_height_above_sea_level quality_flag': ['sea_surface_height_above_sea_level_geoid_mhhw_qc_tests']}
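
Each standard name maps to a list because several variables can share one name. The sketch below works on the mapping printed above as plain data and returns a variable name only when the match is unambiguous. `resolve_standard_name` is a hypothetical helper for illustration, not part of cf_xarray.

```python
# The mapping returned by data.cf.standard_names, copied from the output above.
standard_names = {
    "latitude": ["latitude"],
    "longitude": ["longitude"],
    "time": ["time"],
    "altitude": ["z"],
    "sea_surface_height_above_sea_level": [
        "sea_surface_height_above_sea_level_geoid_mhhw"
    ],
    "aggregate_quality_flag": [
        "sea_surface_height_above_sea_level_geoid_mhhw_qc_agg"
    ],
    "sea_surface_height_above_sea_level quality_flag": [
        "sea_surface_height_above_sea_level_geoid_mhhw_qc_tests"
    ],
}


def resolve_standard_name(mapping, standard_name):
    """Return the single variable carrying `standard_name`, or raise KeyError."""
    matches = mapping.get(standard_name, [])
    if len(matches) != 1:
        raise KeyError(
            f"expected exactly one variable for {standard_name!r}, found {len(matches)}"
        )
    return matches[0]


variable_name = resolve_standard_name(
    standard_names, "sea_surface_height_above_sea_level"
)
print(variable_name)
```

Looking variables up by standard name keeps the rest of the workflow independent of the long, dataset-specific variable names.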

Let's plot the raw data

In [8]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(15, 3.75))

data.cf.plot.scatter('time','sea_surface_height_above_sea_level', ax=ax, s=5)

ax.grid(True)

Build the QC configuration

Below we create a simple Quality Assurance/Quality Control (QA/QC) configuration that will be used as input for ioos_qc. The span, tolerance, and spike threshold values are in the same units as the data (meters here); the flat line test's suspect and fail thresholds are time windows in seconds.

For more information on the tests and recommended values for QA/QC, check the documentation of each test and its inputs: https://ioos.github.io/ioos_qc/api/ioos_qc.html#module-ioos_qc.qartod

See also the QARTOD Manual for Real-Time Quality Control of Water Level Data.

In [9]:
qc_config = {
    "qartod": {
        
      "gross_range_test": {
            "suspect_span": [ -2,  3],
            "fail_span":    [-10, 10]
      },
        
      "flat_line_test": {
            "tolerance":         0.001,
            "suspect_threshold": 10800,
            "fail_threshold":    21600
      },
        
      "spike_test": {
            "suspect_threshold": 0.8,
            "fail_threshold":      3,
      }
    }
}
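
The `nice_print` helper used in the following cells is not defined in this export. A minimal sketch of what such a helper could look like, assuming it renders nested dictionaries as indented JSON and falls back to `str()` for objects JSON cannot encode (such as arrays of flags); `format_nested` is a hypothetical name introduced here:

```python
import json


def format_nested(obj) -> str:
    """Render a nested dict as indented JSON; non-JSON objects are shown via str()."""
    return json.dumps(obj, indent=2, default=str)


def nice_print(obj) -> None:
    """Print a nested configuration or result dictionary in a readable form."""
    print(format_nested(obj))


nice_print({"spike_test": {"suspect_threshold": 0.8, "fail_threshold": 3}})
```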
In [10]:
nice_print(qc_config)
{
  "qartod": {
    "gross_range_test": {
      "suspect_span": [
        -2,
        3
      ],
      "fail_span": [
        -10,
        10
      ]
    },
    "flat_line_test": {
      "tolerance": 0.001,
      "suspect_threshold": 10800,
      "fail_threshold": 21600
    },
    "spike_test": {
      "suspect_threshold": 0.8,
      "fail_threshold": 3
    }
  }
}

For flat_line_test:

  • 10800 seconds = 3 hours
  • 21600 seconds = 6 hours
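
Rather than hard-coding the second counts, the thresholds can be derived from durations. A small sketch using only the standard library; the variable names are illustrative:

```python
from datetime import timedelta

# Express the flat line windows as durations, then convert to seconds.
suspect_threshold = int(timedelta(hours=3).total_seconds())
fail_threshold = int(timedelta(hours=6).total_seconds())

flat_line_config = {
    "tolerance": 0.001,
    "suspect_threshold": suspect_threshold,
    "fail_threshold": fail_threshold,
}
print(flat_line_config)
```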

Run the QC tests with the supplied configuration

In [11]:
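from ioos_qc.config import QcConfig

qc = QcConfig(qc_config)

# Look up the data variable by its CF standard name.
variable_name = data.cf.standard_names["sea_surface_height_above_sea_level"][0]

# Run every configured test on the water level series; `tinp` supplies the timestamps.
qc_results = qc.run(
    inp=data[variable_name],
    tinp=data.cf["time"],
)

nice_print(qc_results['qartod'])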
{
  "gross_range_test": "[1 1 1 ... 1 1 1]",
  "flat_line_test": "[1 1 1 ... 1 1 1]",
  "spike_test": "[2 1 1 ... 1 1 2]"
}

The results are returned in a dictionary format, similar to the input configuration, with a mask for each test. The results range from 1 to 4, meaning:

flag   meaning
1      data passed the QA/QC
2      did not run on this data point
3      flag as suspect
4      flag as failed
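
A quick way to read a flag array is to count how many points fall under each flag. The sketch below uses a short made-up list in place of a real result array; `summarize_flags` is a hypothetical helper, and the names follow the QARTOD flag meanings (flag 9, MISSING, is part of the QARTOD scheme but does not appear in the table above).

```python
from collections import Counter

# QARTOD flag values and their meanings.
FLAG_MEANINGS = {1: "PASS", 2: "NOT_EVALUATED", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}


def summarize_flags(flags):
    """Count how many points carry each flag, keyed by the flag's meaning."""
    counts = Counter(int(flag) for flag in flags)
    return {FLAG_MEANINGS[value]: count for value, count in sorted(counts.items())}


example_flags = [2, 1, 1, 3, 4, 1, 2]  # made-up values for illustration
summary = summarize_flags(example_flags)
print(summary)
```

The same function should accept one of the per-test arrays in `qc_results['qartod']`, since it only iterates over the values.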

Let's look at the gross_range test results

The gross range test should fail data outside the $\pm$10 range and flag as suspect data below -2 or above 3. As the plot shows, all the major spikes are flagged as expected.
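
The rule can be written out in a few lines. This is a simplified pure-Python illustration of the logic described above, not the ioos_qc implementation (which works on arrays and also handles missing values); `gross_range_flags` is a hypothetical name.

```python
def gross_range_flags(values, suspect_span, fail_span):
    """Flag each value: 4 outside fail_span, 3 outside suspect_span, otherwise 1."""
    flags = []
    for value in values:
        if not fail_span[0] <= value <= fail_span[1]:
            flags.append(4)  # outside the physically plausible range
        elif not suspect_span[0] <= value <= suspect_span[1]:
            flags.append(3)  # plausible, but outside the expected range
        else:
            flags.append(1)
    return flags


flags = gross_range_flags([0.4785, -2.5, 3.2, 12.0], [-2, 3], [-10, 10])
print(flags)
```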

In [13]:
plot_results(
    data,
    variable_name,
    qc_results,
    title,
    "gross_range_test"
)
In [14]:
qc_config['qartod']['gross_range_test']
Out[14]:
{'suspect_span': [-2, 3], 'fail_span': [-10, 10]}
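
The `plot_results` helper and the `title` variable used in these plotting cells are not defined in this export. A minimal sketch of such a helper, assuming it draws one scatter series per flag value; the colors, labels, and the `points_with_flag` helper are illustrative choices, not the original code.

```python
# Label and color for each flag value (illustrative choices).
FLAG_STYLES = {
    1: ("pass", "tab:green"),
    2: ("not evaluated", "tab:gray"),
    3: ("suspect", "tab:orange"),
    4: ("fail", "tab:red"),
}


def points_with_flag(times, values, flags, flag):
    """Return the times and values of the points whose QC flag equals `flag`."""
    picked = [(t, v) for t, v, f in zip(times, values, flags) if f == flag]
    return [t for t, _ in picked], [v for _, v in picked]


def plot_results(data, var_name, results, title, test_name):
    """Scatter the observations, colored by the flags of one QARTOD test."""
    # Imported here so the helper above can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    times = list(data.cf["time"].values)
    values = list(data[var_name].values)
    flags = list(results["qartod"][test_name])

    fig, ax = plt.subplots(figsize=(15, 3.75))
    for flag, (label, color) in FLAG_STYLES.items():
        flagged_times, flagged_values = points_with_flag(times, values, flags, flag)
        if flagged_times:
            ax.scatter(flagged_times, flagged_values, s=5, color=color, label=label)
    ax.set_title(f"{title}: {test_name}")
    ax.legend()
    ax.grid(True)
    return ax
```

With the dataset loaded earlier, `title` could be taken from the metadata, for example `title = data.attrs["title"]`; that choice is an assumption as well.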

Let's look at the spike test results

An actual spike test, based on a data increase threshold, flags similar spikes to the gross range test but also identifies other suspect, unusual increases in the series.
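
One common way to define a spike is the distance between a point and the average of its two neighbors. The sketch below assumes that definition; it is a simplified illustration rather than the ioos_qc code, and `spike_flags` is a hypothetical name. It also shows why the first and last points come back as 2 (not evaluated): they have only one neighbor.

```python
def spike_flags(values, suspect_threshold, fail_threshold):
    """Flag points that deviate from the average of their two neighbors."""
    flags = [2] * len(values)  # endpoints stay "not evaluated"
    for i in range(1, len(values) - 1):
        reference = (values[i - 1] + values[i + 1]) / 2
        deviation = abs(values[i] - reference)
        if deviation > fail_threshold:
            flags[i] = 4
        elif deviation > suspect_threshold:
            flags[i] = 3
        else:
            flags[i] = 1
    return flags


flags = spike_flags([0.4, 0.5, 1.6, 0.5, 0.4], suspect_threshold=0.8, fail_threshold=3)
print(flags)
```

Under this definition the neighbors of a very large spike can be flagged as suspect too, because the spike pulls their reference value away from them.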

In [15]:
plot_results(
    data,
    variable_name,
    qc_results,
    title,
    "spike_test"
)
In [16]:
qc_config['qartod']['spike_test']
Out[16]:
{'suspect_threshold': 0.8, 'fail_threshold': 3}

Let's look at the flat_line test results

The flat line test identifies issues with the data where values are "stuck."

ioos_qc successfully identified a large portion of the data where that happens and flagged a smaller one as suspect. (Zoom in on the red point to the left to see this one.)
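
To illustrate what "stuck" means here, the sketch below flags a point when all values in the trailing time window differ by less than the tolerance. It is a simplified, quadratic-time illustration under that assumption, not the ioos_qc implementation; `flat_line_flags` is a hypothetical name and times are given in seconds.

```python
def flat_line_flags(times, values, tolerance, suspect_threshold, fail_threshold):
    """Flag points whose trailing window of values stays within `tolerance`."""
    flags = [1] * len(values)
    for i in range(len(values)):
        # Check the suspect window first so a fail result overwrites it.
        for threshold, flag in ((suspect_threshold, 3), (fail_threshold, 4)):
            # Only evaluate once the series covers a full window.
            if times[i] - times[0] < threshold:
                continue
            window = [
                v for t, v in zip(times[: i + 1], values[: i + 1])
                if times[i] - t <= threshold
            ]
            if max(window) - min(window) < tolerance:
                flags[i] = flag
    return flags


hourly_times = [hour * 3600 for hour in range(8)]
stuck_values = [1.0] * 8
flags = flat_line_flags(hourly_times, stuck_values, 0.001, 10800, 21600)
print(flags)
```

With hourly samples, a constant series becomes suspect after three hours and fails after six, matching the thresholds in the configuration.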

In [17]:
plot_results(
    data,
    variable_name,
    qc_results,
    title,
    "flat_line_test"
)
In [18]:
qc_config['qartod']['flat_line_test']
Out[18]:
{'tolerance': 0.001, 'suspect_threshold': 10800, 'fail_threshold': 21600}

What tests are currently available in ioos_qc for QARTOD?

In [19]:
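import ioos_qc.qartod  # import the submodule explicitly so `ioos_qc.qartod` is always defined

# Print every name in the module that looks like a QARTOD test.
for func in dir(ioos_qc.qartod):
    if "test" in func:
        print(func)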
attenuated_signal_test
climatology_test
density_inversion_test
flat_line_test
gross_range_test
location_test
rate_of_change_test
spike_test
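
The loop above matches any attribute whose name contains "test". A slightly stricter sketch keeps only callables whose names end in `_test`. `list_test_functions` is a hypothetical helper, and a stand-in namespace replaces the real module so the example runs without ioos_qc installed.

```python
from types import SimpleNamespace


def list_test_functions(module):
    """Return the sorted names of callables in `module` that end with '_test'."""
    return sorted(
        name
        for name in dir(module)
        if name.endswith("_test") and callable(getattr(module, name))
    )


# Stand-in for a module: two test-like functions and one unrelated constant.
stand_in = SimpleNamespace(
    spike_test=lambda: None,
    gross_range_test=lambda: None,
    test_tolerance=0.001,
)
names = list_test_functions(stand_in)
print(names)
```

Passing the real `ioos_qc.qartod` module instead of the stand-in should work the same way, since the function only relies on `dir()` and `getattr()`.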

Where can you find more information?

  • IOOS CodeLab example: https://ioos.github.io/ioos_code_lab/content/code_gallery/data_analysis_and_visualization_notebooks/2020-02-14-QARTOD_ioos_qc_Water-Level-Example.html
  • Example notebooks: https://github.com/ioos/ioos_qc/tree/master/docs/source/examples
  • Source documentation: https://ioos.github.io/ioos_qc/
  • QARTOD manuals: https://ioos.noaa.gov/project/qartod/

Thank you!

This notebook was adapted from Jessica Austin and Kyle Wilcox's original ioos_qc examples. Please see the ioos_qc documentation for more examples.