Compute Demo: Use Rooki to access CMIP6 data

Overview

Rooki is a Python client for the Rook data subsetting service for climate model data. The European Copernicus Climate Data Store uses this service in its backend to access the CMIP6 data pool. For load balancing, the Rook service is deployed at IPSL (Paris) and DKRZ (Hamburg). The CMIP6 data pool is shared with ESGF, and the CMIP6 subset provided for Copernicus is synchronized at both sites.

Rook provides operators for subsetting, averaging and regridding to retrieve a subset of the CMIP6 data pool. These operators are implemented by the clisops Python library and are based on xarray. The clisops library is developed by Ouranos (Canada), CEDA (UK) and DKRZ (Germany).

The operators can be called remotely using the OGC Web Processing Service (WPS) standard.
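As a rough illustration of what calling an operator through WPS means at the HTTP level, the sketch below builds the key-value-pair URLs for two standard WPS 1.0.0 discovery requests, GetCapabilities and DescribeProcess. It only constructs the URLs and sends nothing. The endpoint is the DKRZ one configured later in this notebook; the process identifier `subset` is an assumption based on the operator name, and `wps_url` is a hypothetical helper. Rooki normally hides these details.

```python
from urllib.parse import urlencode

# Endpoint configured later in this notebook (DKRZ node).
ROOK_URL = "http://rook.dkrz.de/wps"


def wps_url(base, request, **extra):
    """Build a WPS 1.0.0 key-value-pair request URL (hypothetical helper)."""
    params = {"service": "WPS", "version": "1.0.0", "request": request}
    params.update(extra)
    return f"{base}?{urlencode(params)}"


# List the processes (operators) offered by the service.
capabilities_url = wps_url(ROOK_URL, "GetCapabilities")

# Describe one process; "subset" is assumed to be its identifier.
describe_url = wps_url(ROOK_URL, "DescribeProcess", identifier="subset")

print(capabilities_url)
print(describe_url)
```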

rook 4 cds

ROOK: Remote Operations On Klimadaten

  • Rook: https://github.com/roocs/rook

  • Rooki: https://github.com/roocs/rooki

  • Clisops: https://github.com/roocs/clisops

  • Rook Presentation: https://github.com/cehbrecht/talk-rook-status-kickoff-meeting-2022/blob/main/Rook_C3S2_380_2022-02-11.pdf

Prerequisites

Concepts                  Importance   Notes
Intro to Xarray           Necessary
Understanding of NetCDF   Helpful      Familiarity with metadata structure
Knowing OGC services      Helpful      Understanding of the service interfaces

  • Time to learn: 15 minutes

Init Rooki

import os

# Configuration line to set the wps node - in this case, use DKRZ in Germany
os.environ['ROOK_URL'] = 'http://rook.dkrz.de/wps'

from rooki import rooki

Retrieve subset of CMIP6 data

The CMIP6 dataset is identified by a dataset-id. An intake catalog is available to look up the available datasets:

https://nbviewer.org/github/roocs/rooki/blob/master/notebooks/demo/demo-intake-catalog.ipynb
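The dataset-id is a dot-separated string. As a sketch, it can be split into named facets. The facet names below follow the CMIP6 data reference syntax, and the leading `project` label is only an illustrative name, so treat them as assumptions rather than Rook's internal terms. `parse_dataset_id` is a hypothetical helper, not part of rooki.

```python
# Facet names are assumptions based on the CMIP6 data reference syntax.
FACETS = (
    "project", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_dataset_id(dataset_id):
    """Split a dot-separated dataset-id into a facet dictionary."""
    parts = dataset_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} facets, got {len(parts)}")
    return dict(zip(FACETS, parts))


facets = parse_dataset_id(
    "c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"
)
print(facets["source_id"], facets["table_id"], facets["variable_id"])
```

Splitting the id this way makes it easier to build ids for other variables or experiments from a catalog entry.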

resp = rooki.subset(
    collection='c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710',
    time='2000-01-01/2000-01-31',
    area='-30,-40,70,80',
)
resp.ok
True
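The `time` and `area` arguments are plain strings. Judging from the coordinates of the dataset returned in this notebook (longitude from about -30 to 69, latitude from about -40 to 80), `area` is ordered west, south, east, north, and `time` is a start/end pair separated by a slash. The sketch below checks such strings locally before a request is sent. `parse_area` and `parse_time` are hypothetical helpers, not part of rooki, and they only cover the forms used here.

```python
from datetime import date


def parse_area(area):
    """Parse 'west,south,east,north' into a dict of floats."""
    west, south, east, north = (float(v) for v in area.split(","))
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return {"west": west, "south": south, "east": east, "north": north}


def parse_time(time):
    """Parse 'YYYY-MM-DD/YYYY-MM-DD' into a (start, end) pair of dates."""
    start, end = (date.fromisoformat(t) for t in time.split("/"))
    if start > end:
        raise ValueError("start must not be after end")
    return start, end


print(parse_area("-30,-40,70,80"))
print(parse_time("2000-01-01/2000-01-31"))
```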

Open Dataset with xarray

ds = resp.datasets()[0]
ds
Downloading to /tmp/metalink_d970nb6c/tas_Amon_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_20000116-20000116.nc.
<xarray.Dataset> Size: 61kB
Dimensions:    (time: 1, bnds: 2, lat: 129, lon: 107)
Coordinates:
  * time       (time) datetime64[ns] 8B 2000-01-16T12:00:00
  * lat        (lat) float64 1kB -39.74 -38.81 -37.87 ... 78.08 79.01 79.95
  * lon        (lon) float64 856B -30.0 -29.06 -28.12 ... 67.5 68.44 69.38
    height     float64 8B ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 16B ...
    lat_bnds   (lat, bnds) float64 2kB ...
    lon_bnds   (lon, bnds) float64 2kB ...
    tas        (time, lat, lon) float32 55kB ...
Attributes: (12/47)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  0.0
    contact:                cmip6-mpi-esm@dkrz.de
    ...                     ...
    title:                  MPI-ESM1-2-HR output prepared for CMIP6
    variable_id:            tas
    variant_label:          r1i1p1f1
    license:                CMIP6 model data produced by MPI-M is licensed un...
    cmor_version:           3.5.0
    tracking_id:            hdl:21.14100/af75dd9f-d9c2-4e0e-a294-2bb0d5b740cf

Plot CMIP6 Dataset

ds.tas.isel(time=0).plot()
<matplotlib.collections.QuadMesh at 0x7f647ad1c400>
../_images/2bc12149c77fa3234456e10a7aef3ece7e88386d04c0672f74109321a87705b7.png

Show Provenance

A provenance document is generated remotely to document the operation steps. The provenance uses the W3C PROV standard.

from IPython.display import Image
Image(resp.provenance_image())
../_images/02704485c936710ccd7d39632ca0e799c3fc947a2200a2951aba9dfe3cfd6897.png
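W3C PROV describes provenance in terms of entities (data), activities (operations) and relations such as `used` and `wasGeneratedBy`. The sketch below shows how a single subset operation could be written down in the PROV-JSON layout. All identifiers are hypothetical and the content is illustrative only; the actual document is generated by the service and may be structured differently.

```python
import json

# Hypothetical identifiers; the real document is generated by the service.
prov = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {
        "ex:input_dataset": {},
        "ex:subset_output": {},
    },
    "activity": {
        "ex:subset": {},
    },
    # The operation read the input dataset ...
    "used": {
        "_:u1": {"prov:activity": "ex:subset", "prov:entity": "ex:input_dataset"},
    },
    # ... and produced the output file.
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "ex:subset_output", "prov:activity": "ex:subset"},
    },
}


def lineage(doc, entity):
    """List the entities used by the activities that generated `entity`."""
    activities = [g["prov:activity"] for g in doc["wasGeneratedBy"].values()
                  if g["prov:entity"] == entity]
    return [u["prov:entity"] for u in doc["used"].values()
            if u["prov:activity"] in activities]


print(json.dumps(prov, indent=2))
print(lineage(prov, "ex:subset_output"))
```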

Run workflow with subset and average operator

Instead of running a single operator, one can also chain several operators in a workflow.

Use rooki operators to create a workflow

from rooki import operators as ops

Define the workflow

Internally, the workflow tree is a JSON document.

tas = ops.Input(
    'tas', ['c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710']
)

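wf = ops.Subset(
    tas,
    time="2000/2000",                      # restrict to the year 2000
    time_components="month:jan,feb,mar",   # keep only January to March
    area='-30,-40,70,80',                  # bounding box; the result spans lon -30..70, lat -40..80
)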

wf = ops.WeightedAverage(wf)
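The `time_components` string selects calendar components within the requested time range; here it keeps January to March of 2000, which matches the three time steps in the result of this workflow. A minimal sketch of how such a string could be decoded locally, assuming a `name:value,value` layout with three-letter month abbreviations as in the example (other forms the service may accept are not covered):

```python
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def parse_time_components(spec):
    """Decode e.g. 'month:jan,feb,mar' into {'month': [1, 2, 3]}."""
    name, _, values = spec.partition(":")
    if name != "month":
        raise ValueError(f"only 'month' is handled in this sketch, got {name!r}")
    return {name: [MONTHS.index(v.strip().lower()) + 1 for v in values.split(",")]}


print(parse_time_components("month:jan,feb,mar"))
```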

Optional: look at the workflow JSON document

This step only gives some insight into how the workflow is represented.

import json
print(json.dumps(wf._tree(), indent=4))
{
    "inputs": {
        "tas": [
            "c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"
        ]
    },
    "steps": {
        "subset_tas_1": {
            "run": "subset",
            "in": {
                "collection": "inputs/tas",
                "time": "2000/2000",
                "time_components": "month:jan,feb,mar",
                "area": "-30,-40,70,80"
            }
        },
        "weighted_average_tas_1": {
            "run": "weighted_average",
            "in": {
                "collection": "subset_tas_1/output"
            }
        }
    },
    "outputs": {
        "output": "weighted_average_tas_1/output"
    }
}
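To make the structure of this document concrete, the sketch below rebuilds the same tree with two small stand-in classes. `Input` and `Step` are hypothetical and are not rooki's implementation; the step naming scheme (`<operator>_<variable>_1`) is simply copied from the output above. The variable names differ from the notebook's `tas` and `wf` so that running the sketch does not overwrite them.

```python
import json


class Input:
    """Stand-in for a named workflow input (hypothetical, not rooki's class)."""

    def __init__(self, name, collection):
        self.name = name
        self.collection = collection

    def collect(self, inputs, steps):
        # Register the input and return the reference other steps point to.
        inputs[self.name] = self.collection
        return f"inputs/{self.name}", self.name


class Step:
    """Stand-in for an operator applied to an upstream input or step."""

    def __init__(self, run, upstream, **params):
        self.run = run
        self.upstream = upstream
        self.params = params

    def collect(self, inputs, steps):
        # Resolve the upstream node first so steps appear in execution order.
        ref, variable = self.upstream.collect(inputs, steps)
        step_id = f"{self.run}_{variable}_1"
        steps[step_id] = {"run": self.run, "in": {"collection": ref, **self.params}}
        return f"{step_id}/output", variable

    def tree(self):
        inputs, steps = {}, {}
        ref, _ = self.collect(inputs, steps)
        return {"inputs": inputs, "steps": steps, "outputs": {"output": ref}}


demo_input = Input(
    "tas",
    ["c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"],
)
demo_wf = Step("subset", demo_input, time="2000/2000",
               time_components="month:jan,feb,mar", area="-30,-40,70,80")
demo_wf = Step("weighted_average", demo_wf)
print(json.dumps(demo_wf.tree(), indent=4))
```

Each step refers to its predecessor by a string reference, which is what lets the service run the chain without the client moving data between steps.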

Submit workflow job

resp = wf.orchestrate()
resp.ok
True
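Because the workflow is a plain JSON document, its wiring can also be checked locally. The sketch below walks from the declared output back to the inputs and returns the steps in the order they would have to run. `execution_order` is a hypothetical helper; it assumes each step has a single `collection` reference of the form `inputs/<name>` or `<step>/output`, as in the workflow document printed earlier.

```python
def execution_order(tree):
    """Return step names from first to last by following 'collection' links."""
    order = []
    ref = tree["outputs"]["output"]
    while not ref.startswith("inputs/"):
        step_id, _, port = ref.rpartition("/")
        if port != "output" or step_id not in tree["steps"]:
            raise ValueError(f"dangling reference: {ref!r}")
        if step_id in order:
            raise ValueError(f"cycle detected at {step_id!r}")
        order.append(step_id)
        ref = tree["steps"][step_id]["in"]["collection"]
    if ref.split("/", 1)[1] not in tree["inputs"]:
        raise ValueError(f"unknown input: {ref!r}")
    return order[::-1]


# Same shape as the workflow document shown earlier, with parameters omitted.
example = {
    "inputs": {
        "tas": ["c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"]
    },
    "steps": {
        "subset_tas_1": {"run": "subset", "in": {"collection": "inputs/tas"}},
        "weighted_average_tas_1": {
            "run": "weighted_average",
            "in": {"collection": "subset_tas_1/output"},
        },
    },
    "outputs": {"output": "weighted_average_tas_1/output"},
}
print(execution_order(example))
```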

Open as xarray dataset

ds = resp.datasets()[0]
ds
Downloading to /tmp/metalink_jau7yne9/tas_Amon_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_20000116-20000316_w-avg.nc.
<xarray.Dataset> Size: 88B
Dimensions:   (bnds: 2, time: 3)
Coordinates:
    height    float64 8B ...
  * time      (time) datetime64[ns] 24B 2000-01-16T12:00:00 ... 2000-03-16T12...
Dimensions without coordinates: bnds
Data variables:
    lat_bnds  (bnds) float64 16B ...
    lon_bnds  (bnds) float64 16B ...
    tas       (time) float64 24B ...
Attributes: (12/47)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  0.0
    contact:                cmip6-mpi-esm@dkrz.de
    ...                     ...
    title:                  MPI-ESM1-2-HR output prepared for CMIP6
    variable_id:            tas
    variant_label:          r1i1p1f1
    license:                CMIP6 model data produced by MPI-M is licensed un...
    cmor_version:           3.5.0
    tracking_id:            hdl:21.14100/af75dd9f-d9c2-4e0e-a294-2bb0d5b740cf
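The result no longer has `lat` and `lon` dimensions: each time step is now a single spatial mean. On a regular latitude-longitude grid, cells shrink towards the poles, so such a mean is usually weighted. The sketch below uses cosine-of-latitude weights, a common choice. The source does not state which weights Rook applies, so this is an assumption, and the numbers are made up for illustration.

```python
import math


def weighted_mean(field, lats):
    """Cosine-of-latitude weighted mean of a 2D field given as one row per latitude."""
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(field, lats):
        w = math.cos(math.radians(lat))
        total += w * sum(row)
        weight_sum += w * len(row)
    return total / weight_sum


# Made-up temperatures in kelvin on a tiny 3 x 2 (lat x lon) grid.
lats = [0.0, 45.0, 60.0]
field = [[300.0, 300.0], [285.0, 285.0], [270.0, 270.0]]

print(weighted_mean(field, lats))   # pulled towards the warm equatorial row
print(sum(map(sum, field)) / 6)     # unweighted mean for comparison
```

With these values the weighted mean comes out higher than the unweighted one, because the equatorial row counts more than the high-latitude rows.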

Plot dataset

ds.tas.plot()
[<matplotlib.lines.Line2D at 0x7f6472a20f10>]
../_images/971a2ad4b7958692e5d5e956d5dccbda7c586356c248837053b533b0b134972f.png

Show provenance

Image(resp.provenance_image())
../_images/b562751567a32dd354c6857120d2d377272a3e9943a06ae32ea3853c8faa36fc.png

Summary

In this notebook, we used the Rooki Python client to retrieve a subset of a CMIP6 dataset. The operations were executed remotely on a Rook subsetting service (using OGC WPS and xarray/clisops). We plotted the dataset and displayed the provenance document. We also showed that remote operators can be chained and executed as a single workflow.

What’s next?

This service is used by the European Copernicus Climate Data Store.

We need to figure out how this service can be used in the new ESGF:

  • where will it be deployed?

  • how can it be integrated into the ESGF search (STAC catalogs, …)?

  • ???