Overview¶
Rooki is a Python client for the Rook data subsetting service for climate model data. The European Copernicus Climate Data Store uses this service in its backend to access the CMIP6 data pool. For load balancing, Rook is deployed at two sites: IPSL (Paris) and DKRZ (Hamburg). The CMIP6 data pool is shared with ESGF, and the CMIP6 subset provided for Copernicus is synchronized between both sites.
Rook provides operators for subsetting, averaging and regridding to retrieve a subset of the CMIP6 data pool. These operators are implemented in the clisops Python library and build on xarray. The clisops library is developed by Ouranos (Canada), CEDA (UK) and DKRZ (Germany).
The operators can be called remotely through the OGC Web Processing Service (WPS) standard.
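For orientation, a WPS 1.0.0 server can be queried with plain key-value-pair (KVP) HTTP GET requests, which can also be opened in a browser to inspect the service. The sketch below only builds such URLs with the standard library and does not contact the server. The helper `wps_url` is hypothetical, and the process identifier `subset` is an assumption about how Rook names its operator; the GetCapabilities response lists the actual identifiers.

```python
from urllib.parse import urlencode

# Endpoint of the DKRZ node configured later in this notebook.
ROOK_URL = "http://rook.dkrz.de/wps"


def wps_url(base_url: str, request: str, **params: str) -> str:
    """Build a WPS KVP GET request URL (hypothetical helper)."""
    query = {"service": "WPS", "request": request}
    query.update(params)  # e.g. version, identifier
    return f"{base_url}?{urlencode(query)}"


# Lists the processes (operators) offered by the server.
capabilities_url = wps_url(ROOK_URL, "GetCapabilities")

# Describes the inputs of one process; "subset" is an assumed identifier.
describe_url = wps_url(ROOK_URL, "DescribeProcess", version="1.0.0", identifier="subset")

print(capabilities_url)
print(describe_url)
```

Both requests should return XML documents; the DescribeProcess response lists the input parameters a process accepts.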

ROOK: Remote Operations On Klimadaten
- Rook: https://github.com/roocs/rook
- Rooki: https://github.com/roocs/rooki
- Clisops: https://github.com/roocs/clisops
- Rook Presentation: Rook_C3S2_380_2022-02-11.pdf
Prerequisites¶
| Concepts | Importance | Notes |
|---|---|---|
| Intro to Xarray | Necessary | |
| Understanding of NetCDF | Helpful | Familiarity with metadata structure |
| Knowing OGC services | Helpful | Understanding of the service interfaces |
- Time to learn: 15 minutes
Init Rooki¶
import os
# Set the WPS node to use; here, the DKRZ node in Germany
os.environ['ROOK_URL'] = 'http://rook.dkrz.de/wps'
from rooki import rooki
Retrieve subset of CMIP6 data¶
Each CMIP6 dataset is identified by a dataset-id. An intake catalog is available to look up the available datasets:
https://
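The dataset-id passed as `collection` in the next cell is a dot-separated string. Assuming it follows the CMIP6 Data Reference Syntax (DRS) facet order, with the project prefix `c3s-cmip6` in place of `CMIP6`, it can be split into named facets. The helper `parse_dataset_id` and the facet names are illustrative and not part of Rooki.

```python
from typing import Dict

# Assumed facet order of a CMIP6 dataset-id (DRS); the first facet is the
# project, "c3s-cmip6" for the Copernicus subset used in this notebook.
FACETS = (
    "project", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_dataset_id(dataset_id: str) -> Dict[str, str]:
    """Split a dot-separated dataset-id into named facets (hypothetical helper)."""
    parts = dataset_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} facets, got {len(parts)}: {dataset_id}")
    return dict(zip(FACETS, parts))


facets = parse_dataset_id(
    "c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"
)
print(facets["source_id"], facets["variable_id"], facets["version"])
# MPI-ESM1-2-HR tas v20190710
```

Reading the facets this way makes it easy to check which model, experiment, ensemble member and variable a collection refers to before requesting it.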
resp = rooki.subset(
collection='c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710',
time='2000-01-01/2000-01-31',
area='-30,-40,70,80',
)
resp.ok
True
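The `time` and `area` arguments are passed as strings. Judging from the call above, `time` is a `start/end` range and `area` is a comma-separated bounding box. The order `min_lon,min_lat,max_lon,max_lat` is an assumption based on the clisops convention, under which `-30,-40,70,80` would select 30°W to 70°E and 40°S to 80°N. The helpers `time_range` and `bbox` are hypothetical; they only build and sanity-check these strings before a request is sent.

```python
from datetime import date


def time_range(start: date, end: date) -> str:
    """Format an inclusive time range as 'start/end' (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"{start.isoformat()}/{end.isoformat()}"


def bbox(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """Format a bounding box as 'min_lon,min_lat,max_lon,max_lat'.

    The axis order is an assumption (clisops convention). Boxes crossing
    the antimeridian are not handled by this sketch.
    """
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    if min_lon >= max_lon:
        raise ValueError("min_lon must be smaller than max_lon")
    # ':g' prints whole numbers without a trailing '.0'
    return ",".join(f"{v:g}" for v in (min_lon, min_lat, max_lon, max_lat))


print(time_range(date(2000, 1, 1), date(2000, 1, 31)))  # 2000-01-01/2000-01-31
print(bbox(-30, -40, 70, 80))  # -30,-40,70,80
```

The resulting strings could then be passed as `time=` and `area=` to `rooki.subset`, as in the call above.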
Open Dataset with xarray¶
ds = resp.datasets()[0]
ds
Plot CMIP6 Dataset¶
ds.tas.isel(time=0).plot()

Show Provenance¶
The service generates a provenance document remotely that records the operation steps. The provenance follows the W3C PROV standard.
from IPython.display import Image
Image(resp.provenance_image())
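To illustrate what a W3C PROV record of such an operation contains, the sketch below writes a tiny document in the PROV-JSON serialization using only the standard library: the input dataset and the output are entities, the subset run is an activity, and `used` and `wasGeneratedBy` relations link them. The `ex:` identifiers and attributes are made up for illustration; this is not the document Rook actually generates.

```python
import json

# Minimal PROV-JSON document: one activity (the subset call) that used one
# input entity and generated one output entity. All ex: names are hypothetical.
prov_doc = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {
        "ex:input": {
            "ex:dataset_id": "c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"
        },
        "ex:output": {},
    },
    "activity": {
        "ex:subset": {"ex:time": "2000-01-01/2000-01-31", "ex:area": "-30,-40,70,80"},
    },
    # Relations are keyed by (blank-node) identifiers.
    "used": {
        "_:u1": {"prov:activity": "ex:subset", "prov:entity": "ex:input"},
    },
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "ex:output", "prov:activity": "ex:subset"},
    },
}

print(json.dumps(prov_doc, indent=2))
```

A chain of operators extends this pattern: the output entity of one activity becomes the input entity of the next.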

Run workflow with subset and average operator¶
Instead of running a single operator, one can also chain several operators into a workflow.
Use rooki operators to create a workflow¶
from rooki import operators as ops
Define the workflow¶
... internally, the workflow tree is a JSON document
tas = ops.Input(
'tas', ['c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710']
)
wf = ops.Subset(
tas,
time="2000/2000",
time_components="month:jan,feb,mar",
area='-30,-40,70,80',
)
wf = ops.WeightedAverage(wf)
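The `time_components` argument filters by calendar components within the selected `time` range; here, presumably, only January to March of the year 2000 are kept. The hypothetical parser below shows one way to read such a string, assuming the clisops-style syntax in which each component is written as `name:value1,value2` and several components are separated by `|` (the workflow above uses a single component).

```python
from typing import Dict, List


def parse_time_components(spec: str) -> Dict[str, List[str]]:
    """Parse e.g. 'month:jan,feb,mar' into {'month': ['jan', 'feb', 'mar']}.

    Hypothetical helper; assumes '|' separates components, as in clisops.
    """
    components: Dict[str, List[str]] = {}
    for part in filter(None, spec.split("|")):
        name, sep, values = part.partition(":")
        if not sep or not values:
            raise ValueError(f"malformed component: {part!r}")
        components[name.strip()] = [v.strip() for v in values.split(",")]
    return components


print(parse_time_components("month:jan,feb,mar"))
# {'month': ['jan', 'feb', 'mar']}
print(parse_time_components("year:2000|month:jan,feb,mar"))
# {'year': ['2000'], 'month': ['jan', 'feb', 'mar']}
```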
Optional: look at the workflow json document¶
... only to give some insight into how the workflow is represented
import json
print(json.dumps(wf._tree(), indent=4))
{
"inputs": {
"tas": [
"c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"
]
},
"steps": {
"subset_tas_1": {
"run": "subset",
"in": {
"collection": "inputs/tas",
"time": "2000/2000",
"time_components": "month:jan,feb,mar",
"area": "-30,-40,70,80"
}
},
"weighted_average_tas_1": {
"run": "weighted_average",
"in": {
"collection": "subset_tas_1/output"
}
}
},
"outputs": {
"output": "weighted_average_tas_1/output"
}
}
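The printed tree has three parts: named `inputs`; `steps`, whose `collection` points either to an input (`inputs/tas`) or to the output of a previous step (`subset_tas_1/output`); and `outputs`, which names the final result. The hypothetical function `linear_workflow` below rebuilds the same structure for a linear chain of operators using plain dictionaries. It mimics the step naming seen in the output (`<operator>_<input>_<n>`) but is not Rooki's implementation.

```python
import json
from typing import Dict, List, Tuple

Step = Tuple[str, Dict[str, str]]  # (operator name, extra parameters)


def linear_workflow(input_name: str, dataset_ids: List[str], steps: List[Step]) -> dict:
    """Chain operators so that each step consumes the previous step's output."""
    if not steps:
        raise ValueError("at least one step is required")
    tree: dict = {"inputs": {input_name: list(dataset_ids)}, "steps": {}, "outputs": {}}
    source = f"inputs/{input_name}"  # the first step reads the named input
    counts: Dict[str, int] = {}
    for run, params in steps:
        counts[run] = counts.get(run, 0) + 1
        step_id = f"{run}_{input_name}_{counts[run]}"
        tree["steps"][step_id] = {"run": run, "in": {"collection": source, **params}}
        source = f"{step_id}/output"  # the next step reads this step's output
    tree["outputs"]["output"] = source
    return tree


wf_tree = linear_workflow(
    "tas",
    ["c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"],
    [
        ("subset", {"time": "2000/2000", "time_components": "month:jan,feb,mar",
                    "area": "-30,-40,70,80"}),
        ("weighted_average", {}),
    ],
)
print(json.dumps(wf_tree, indent=4))  # same document as printed by wf._tree() above
```

Keeping each step's input as a string reference rather than nested data lets the server resolve the chain step by step from one JSON document.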
Submit workflow job¶
resp = wf.orchestrate()
resp.ok
True
Open as xarray dataset¶
ds = resp.datasets()[0]
ds
Plot dataset¶
ds.tas.plot()
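Assuming the weighted average collapses the latitude and longitude dimensions into one value per time step, a common choice on a regular latitude-longitude grid is to weight each grid cell by the cosine of its latitude, since cells shrink towards the poles. The stdlib sketch below computes such a mean for a single time step on a made-up grid; whether Rook uses exactly this weighting is an assumption.

```python
import math
from typing import Sequence


def cos_lat_weighted_mean(lats: Sequence[float], field: Sequence[Sequence[float]]) -> float:
    """Area-weighted mean of a lat x lon field using cos(latitude) weights.

    field[i][j] is the value at latitude lats[i] and the j-th longitude.
    """
    if len(lats) != len(field):
        raise ValueError("need one row of values per latitude")
    total = weight_sum = 0.0
    for lat, row in zip(lats, field):
        w = math.cos(math.radians(lat))
        total += w * sum(row)
        weight_sum += w * len(row)
    return total / weight_sum


# Made-up temperatures (K) on a 3 x 2 grid: equator rows get weight 1,
# rows at +/-60 degrees get weight cos(60°) = 0.5.
lats = [-60.0, 0.0, 60.0]
tas = [[250.0, 250.0], [300.0, 300.0], [250.0, 250.0]]
print(round(cos_lat_weighted_mean(lats, tas), 2))  # 275.0
```

The unweighted mean of the same values would be about 266.67 K, so ignoring the weights over-represents the high-latitude cells.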

Show provenance¶
Image(resp.provenance_image())

Summary¶
In this notebook, we used the Rooki Python client to retrieve a subset of a CMIP6 dataset. The operations are executed remotely on a Rook subsetting service (using the OGC WPS interface and xarray/clisops). We plotted the dataset and displayed its provenance document. We also showed that remote operators can be chained and executed as a single workflow.
What’s next?¶
This service is used by the European Copernicus Climate Data Store.
We need to figure out how this service can be used in the new ESGF:
- where will it be deployed?
- how can it be integrated into the ESGF search (STAC catalogs, ...)?
- ???