Mathew Biddle
mathew.biddle@noaa.gov
https://orcid.org/0000-0003-4897-1669
NOAA/NOS/IOOS
Materials available at https://github.com/MathewBiddle/WIO_workshop.
This presentation will demonstrate how to run ioos_qc on a time-series dataset. ioos_qc implements the Quality Assurance / Quality Control of Real Time Oceanographic Data (QARTOD) tests.
We will be using the water level data from a fixed station in Kotzebue, AK.
We will get the data from the AOOS ERDDAP server.
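from erddapy import ERDDAP
# Connect to the AOOS ERDDAP server and request tabular (tabledap) data
e = ERDDAP(
    server="http://erddap.aoos.org/erddap/",
    protocol="tabledap",
)
e.dataset_id = "kotzebue-alaska-water-level"
# Restrict the request to the period of interest
e.constraints = {
    "time>=": "2018-09-05T21:00:00Z",
    "time<=": "2019-07-10T19:00:00Z",
}
import cf_xarray  # noqa: F401  (registers the .cf accessor on xarray objects)
# Download the data as an xarray Dataset
data = e.to_xarray()
data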
<xarray.Dataset>
Dimensions:  (timeseries: 1, obs: 7241)
Coordinates:
    latitude   (timeseries) float64 ...
    longitude  (timeseries) float64 ...
    time       (obs) datetime64[ns] ...
Dimensions without coordinates: timeseries, obs
Data variables:
    station    (timeseries) object ...
    rowSize    (timeseries) int32 ...
    z          (obs) float64 0.0...
    sea_surface_height_above_sea_level_geoid_mhhw           (obs) float64 0.4...
    sea_surface_height_above_sea_level_geoid_mhhw_qc_agg    (obs) float64 1.0...
    sea_surface_height_above_sea_level_geoid_mhhw_qc_tests  (obs) float64 2.1...
Attributes: (12/54)
    cdm_data_type:                TimeSeries
    cdm_timeseries_variables:     station,longitude,latitude
    contributor_email:            sales@stilltek.com,dugan@aoos.org,,feedbac...
    contributor_name:             Stillwater Technologies LLC,Alaska Ocean O...
    contributor_role:             collaborator,sponsor,contributor,processor
    contributor_role_vocabulary:  NERC
    ...                           ...
    station_id:                   100053
    summary:                      Timeseries data from 'Kotzebue, Alaska, Wa...
    time_coverage_end:            2019-07-10T19:00:00Z
    time_coverage_start:          2018-09-05T21:00:00Z
    title:                        Kotzebue, Alaska, Water Level
    Westernmost_Easting:          -162.566752
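Under the hood, an ERDDAP tabledap request is a plain HTTP URL of the form server/tabledap/datasetID.fileType?variables&constraints, which erddapy assembles from the server, dataset_id, and constraints we set above. The stdlib-only sketch below rebuilds such a URL by hand to show that structure; build_tabledap_url is a hypothetical helper, not erddapy's implementation, and the percent-encoding choice (escaping only < and >) is an assumption that may differ from what erddapy emits.

```python
from urllib.parse import quote


def build_tabledap_url(server, dataset_id, variables, constraints, response="csv"):
    """Hypothetical helper: assemble an ERDDAP tabledap request URL.

    Layout: {server}/tabledap/{dataset_id}.{response}?{variables}&{constraints}
    """
    base = f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}"
    # The comma-separated variable list comes first in the query string.
    query = [",".join(variables)]
    for key, value in constraints.items():
        # Percent-encode '<' and '>' but keep '=' and ':' readable.
        query.append(quote(f"{key}{value}", safe="=:"))
    return f"{base}?{'&'.join(query)}"


url = build_tabledap_url(
    "http://erddap.aoos.org/erddap/",
    "kotzebue-alaska-water-level",
    ["time", "sea_surface_height_above_sea_level_geoid_mhhw"],
    {"time>=": "2018-09-05T21:00:00Z", "time<=": "2019-07-10T19:00:00Z"},
)
print(url)
```

To see the exact URL erddapy itself builds for the request above, call `e.get_download_url()` on the same ERDDAP object.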
data.cf
Coordinates:
- CF Axes: X: ['longitude']
           Y: ['latitude']
           T: ['time']
           Z: n/a
- CF Coordinates: longitude: ['longitude']
                  latitude: ['latitude']
                  time: ['time']
                  vertical: n/a
- Cell Measures: area, volume: n/a
- Standard Names: latitude: ['latitude']
                  longitude: ['longitude']
                  time: ['time']
- Bounds: n/a
Data Variables:
- Cell Measures: area, volume: n/a
- Standard Names: aggregate_quality_flag: ['sea_surface_height_above_sea_level_geoid_mhhw_qc_agg']
                  altitude: ['z']
                  sea_surface_height_above_sea_level: ['sea_surface_height_above_sea_level_geoid_mhhw']
                  sea_surface_height_above_sea_level quality_flag: ['sea_surface_height_above_sea_level_geoid_mhhw_qc_tests']
- Bounds: n/a
data.cf.axes
{'X': ['longitude'], 'Y': ['latitude'], 'T': ['time']}
data.cf.coordinates
{'longitude': ['longitude'], 'latitude': ['latitude'], 'time': ['time']}
data.cf.standard_names
{'latitude': ['latitude'], 'longitude': ['longitude'], 'time': ['time'], 'altitude': ['z'], 'sea_surface_height_above_sea_level': ['sea_surface_height_above_sea_level_geoid_mhhw'], 'aggregate_quality_flag': ['sea_surface_height_above_sea_level_geoid_mhhw_qc_agg'], 'sea_surface_height_above_sea_level quality_flag': ['sea_surface_height_above_sea_level_geoid_mhhw_qc_tests']}
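`data.cf.standard_names` is a dictionary keyed by each variable's CF `standard_name` attribute, which is what lets us look up the water level variable by its standard name instead of its dataset-specific name. The plain-Python sketch below reproduces that grouping on a dictionary of attributes standing in for the Dataset; group_by_standard_name is a hypothetical helper for illustration, not cf_xarray's code.

```python
from collections import defaultdict


def group_by_standard_name(variable_attrs):
    """Hypothetical helper: map each CF standard_name to the variables carrying it.

    `variable_attrs` maps variable names to their attribute dicts, a
    plain-Python stand-in for the variables of an xarray Dataset.
    """
    groups = defaultdict(list)
    for name, attrs in variable_attrs.items():
        standard_name = attrs.get("standard_name")
        if standard_name:  # variables without a standard_name are skipped
            groups[standard_name].append(name)
    return dict(groups)


attrs = {
    "time": {"standard_name": "time"},
    "z": {"standard_name": "altitude"},
    "sea_surface_height_above_sea_level_geoid_mhhw": {
        "standard_name": "sea_surface_height_above_sea_level"
    },
    "station": {"cf_role": "timeseries_id"},
}
names = group_by_standard_name(attrs)
print(names["sea_surface_height_above_sea_level"][0])
```

Because several variables may share a standard name, each value is a list, which is why the lookup later in this notebook takes element `[0]`.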
import matplotlib.pyplot as plt
# Scatter plot of water level over time, selecting the variables by CF name
fig, ax = plt.subplots(figsize=(15, 3.75))
data.cf.plot.scatter(x="time", y="sea_surface_height_above_sea_level", ax=ax, s=5)
ax.grid(True)
Below we create a simple Quality Assurance/Quality Control (QA/QC) configuration that will be used as input for ioos_qc. The spans, the flat line tolerance, and the spike thresholds are in the same units as the data; the flat line test's suspect_threshold and fail_threshold are time durations in seconds.
For more information on the tests and recommended QA/QC values, see the documentation of each test and its inputs: https://ioos.github.io/ioos_qc/api/ioos_qc.html#module-ioos_qc.qartod
qc_config = {
    "qartod": {
        "gross_range_test": {
            "suspect_span": [-2, 3],  # data units
            "fail_span": [-10, 10],  # data units
        },
        "flat_line_test": {
            "tolerance": 0.001,  # data units
            "suspect_threshold": 10800,  # seconds (3 hours)
            "fail_threshold": 21600,  # seconds (6 hours)
        },
        "spike_test": {
            "suspect_threshold": 0.8,  # data units
            "fail_threshold": 3,  # data units
        },
    }
}
nice_print(qc_config)
{
  "qartod": {
    "gross_range_test": {
      "suspect_span": [
        -2,
        3
      ],
      "fail_span": [
        -10,
        10
      ]
    },
    "flat_line_test": {
      "tolerance": 0.001,
      "suspect_threshold": 10800,
      "fail_threshold": 21600
    },
    "spike_test": {
      "suspect_threshold": 0.8,
      "fail_threshold": 3
    }
  }
}
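`nice_print` is called here but not defined in this section. A minimal sketch of such a helper, assuming it simply pretty-prints with the standard `json` module and falls back to `str()` for objects JSON cannot encode (such as the NumPy flag arrays returned by ioos_qc, which `str()` abbreviates as `[1 1 1 ... 1 1 1]`). Both function names are hypothetical stand-ins, not necessarily the workshop's own helper.

```python
import json


def to_nice_json(obj):
    """Return obj as indented JSON, using str() for anything JSON can't encode."""
    return json.dumps(obj, indent=2, default=str)


def nice_print(obj):
    """Pretty-print a (possibly nested) dictionary, e.g. a QC config or QC results."""
    print(to_nice_json(obj))


nice_print({"spike_test": {"suspect_threshold": 0.8, "fail_threshold": 3}})
```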
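from ioos_qc.config import QcConfig
# Build the QC configuration object from the dictionary above
qc = QcConfig(qc_config)
# Look up the water level variable by its CF standard name
variable_name = data.cf.standard_names["sea_surface_height_above_sea_level"][0]
# Run every configured test; the time axis is needed by time-based tests such as flat_line_test
qc_results = qc.run(
    inp=data[variable_name],
    tinp=data.cf["time"],
)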
nice_print(qc_results['qartod'])
{
  "gross_range_test": "[1 1 1 ... 1 1 1]",
  "flat_line_test": "[1 1 1 ... 1 1 1]",
  "spike_test": "[2 1 1 ... 1 1 2]"
}
The results are returned as a dictionary that mirrors the input configuration, with one array of flags per test (one flag per data point). The flags follow the QARTOD convention:
flag | meaning
---|---
1 | pass: the data point passed the test
2 | not evaluated: the test did not run on this data point
3 | suspect
4 | fail
9 | missing data
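A quick way to read a flag array is to count how many points received each flag. The sketch below does this with `collections.Counter`; `summarize_flags` and its labels are hypothetical names chosen for illustration, and the mapping includes 9, the QARTOD flag for missing data, which ioos_qc also uses.

```python
from collections import Counter

# QARTOD flag values and short labels (labels are our own wording).
FLAG_MEANINGS = {
    1: "pass",
    2: "not evaluated",
    3: "suspect",
    4: "fail",
    9: "missing data",
}


def summarize_flags(flags):
    """Count how many data points received each QARTOD flag.

    Accepts any iterable of integer flags, e.g. a list or a NumPy array
    such as qc_results["qartod"]["spike_test"].
    """
    counts = Counter(int(flag) for flag in flags)
    return {
        FLAG_MEANINGS.get(flag, f"unknown ({flag})"): n
        for flag, n in sorted(counts.items())
    }


print(summarize_flags([2, 1, 1, 3, 4, 1, 2]))
```

Applied to the results above, a loop such as `for name, flags in qc_results["qartod"].items(): print(name, summarize_flags(flags))` gives a one-line summary per test.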
gross_range test results
The gross range test should fail data outside the $\pm 10$ range and flag as suspect data below -2 or above 3. As one can easily see, all the major spikes are flagged as expected.
# plot_results() and title come from a helper cell not shown in this section
plot_results(
    data,
    variable_name,
    qc_results,
    title,
    "gross_range_test",
)
qc_config['qartod']['gross_range_test']
{'suspect_span': [-2, 3], 'fail_span': [-10, 10]}
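To make the flag logic concrete, here is a simplified pure-Python version of the gross range check, written for illustration rather than taken from ioos_qc (the name `gross_range_flags` is hypothetical). A value strictly outside `fail_span` gets 4; otherwise a value strictly outside `suspect_span` gets 3; everything else gets 1, and missing values get 9.

```python
import math

# QARTOD flag values
GOOD, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def gross_range_flags(values, suspect_span, fail_span):
    """Simplified gross range check: flag values outside the given spans."""
    flags = []
    for v in values:
        if v is None or math.isnan(v):
            flags.append(MISSING)
        elif v < fail_span[0] or v > fail_span[1]:
            flags.append(FAIL)  # outside the hard limits
        elif v < suspect_span[0] or v > suspect_span[1]:
            flags.append(SUSPECT)  # inside the hard limits but outside the expected range
        else:
            flags.append(GOOD)
    return flags


print(gross_range_flags([0.5, 3.5, 12.0], suspect_span=[-2, 3], fail_span=[-10, 10]))
```

With the configuration above, a reading of 3.5 is therefore suspect and 12 fails, while a value exactly on a span boundary (such as 3) still passes in this sketch.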
spike test results
The spike test, based on how far each value departs from its neighboring values, flags spikes similar to those caught by the gross range test, but also identifies other unusual jumps in the series as suspect.
plot_results(
    data,
    variable_name,
    qc_results,
    title,
    "spike_test",
)
qc_config['qartod']['spike_test']
{'suspect_threshold': 0.8, 'fail_threshold': 3}
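The QARTOD spike check compares each point with the midpoint of its two neighbors. The sketch below is a simplified pure-Python version of that idea, written for illustration (the name `spike_flags` is hypothetical, and ioos_qc's own implementation may differ in details such as missing-value handling). Because the first and last points have only one neighbor, they are marked 2 (not evaluated), which matches the `[2 1 1 ... 1 1 2]` pattern in the results above.

```python
import math

# QARTOD flag values
GOOD, UNKNOWN, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9


def spike_flags(values, suspect_threshold, fail_threshold):
    """Simplified spike check: deviation of each point from its neighbors' midpoint."""
    n = len(values)
    flags = [GOOD] * n
    for i in range(n):
        if math.isnan(values[i]):
            flags[i] = MISSING
            continue
        if i == 0 or i == n - 1:
            # Endpoints have only one neighbor, so they are not evaluated.
            flags[i] = UNKNOWN
            continue
        midpoint = (values[i - 1] + values[i + 1]) / 2
        deviation = abs(values[i] - midpoint)
        if math.isnan(deviation):
            flags[i] = UNKNOWN  # a neighbor is missing
        elif deviation > fail_threshold:
            flags[i] = FAIL
        elif deviation > suspect_threshold:
            flags[i] = SUSPECT
    return flags


print(spike_flags([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0], 0.8, 3))
```

Note that a large spike also pulls its neighbors' midpoints, so points next to it can be flagged as suspect too.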
flat_line test results
The flat line test identifies issues with the data where values are "stuck."
ioos_qc successfully identified a large portion of the data where this happens and flagged a smaller portion as suspect. (Zoom in on the red point to the left to see it.)
plot_results(
    data,
    variable_name,
    qc_results,
    title,
    "flat_line_test",
)
qc_config['qartod']['flat_line_test']
{'tolerance': 0.001, 'suspect_threshold': 10800, 'fail_threshold': 21600}
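The flat line thresholds are durations: a point is flagged when the data over the preceding window (3 hours for suspect, 6 hours for fail with this configuration) vary by less than `tolerance`. The sketch below is a simplified pure-Python version for evenly spaced data with no gaps or missing values; `flat_line_flags` and its `interval_seconds` parameter are hypothetical, whereas ioos_qc works from the `tinp` timestamps instead.

```python
# QARTOD flag values
GOOD, SUSPECT, FAIL = 1, 3, 4


def flat_line_flags(values, interval_seconds, tolerance, suspect_threshold, fail_threshold):
    """Simplified flat line check for regularly sampled data.

    A point is flagged when every value in the window ending at that point,
    spanning `threshold` seconds, lies within less than `tolerance` of the others.
    """
    flags = [GOOD] * len(values)
    # Apply the suspect window first so the longer fail window can override it.
    for threshold, flag in ((suspect_threshold, SUSPECT), (fail_threshold, FAIL)):
        count = int(threshold // interval_seconds)  # time steps covered by the window
        for i in range(count, len(values)):
            window = values[i - count : i + 1]
            if max(window) - min(window) < tolerance:
                flags[i] = flag
    return flags


# Hourly readings stuck at the same value
print(flat_line_flags([1.0] * 8, 3600, 0.001, 10800, 21600))
```

For hourly data such as this Kotzebue record, the suspect window in this sketch spans 4 consecutive readings (3 hours) and the fail window 7 readings (6 hours).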
What tests are available in ioos_qc for QARTOD?
import ioos_qc.qartod
# List the QARTOD test functions provided by ioos_qc.qartod
for func in dir(ioos_qc.qartod):
    if "test" in func:
        print(func)
attenuated_signal_test
climatology_test
density_inversion_test
flat_line_test
gross_range_test
location_test
rate_of_change_test
spike_test