
Compute Demo: Use Rooki to access CMIP6 data

Overview

Rooki is a Python client for interacting with the Rook data subsetting service for climate model data. This service is used in the backend by the European Copernicus Climate Data Store to access the CMIP6 data pool. The Rook service is deployed for load balancing at IPSL (Paris) and DKRZ (Hamburg). The CMIP6 data pool is shared with ESGF, and the CMIP6 subset provided for Copernicus is synchronized at both sites.

Rook provides operators for subsetting, averaging, and regridding to retrieve a subset of the CMIP6 data pool. These operators are implemented by the clisops Python library and are based on xarray. The clisops library is developed by Ouranos (Canada), CEDA (UK), and DKRZ (Germany).

The operators can be called remotely using the OGC Web Processing Service (WPS) standard.
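Rooki hides the protocol details, but to make "remote" concrete, the sketch below talks to the WPS endpoint directly with OWSLib (a separate library, not used in the rest of this demo) and lists the operators the service advertises:

from owslib.wps import WebProcessingService

# Connect to the public rook WPS endpoint at DKRZ (the same one used below)
wps = WebProcessingService('http://rook.dkrz.de/wps')

# Each operator is exposed as a WPS process
for process in wps.processes:
    print(process.identifier, ':', process.title)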

[Figure: "rook 4 cds" logo. ROOK: Remote Operations On Klimadaten]

Prerequisites

Concepts                  Importance  Notes
Intro to Xarray           Necessary
Understanding of NetCDF   Helpful     Familiarity with metadata structure
Knowing OGC services      Helpful     Understanding of the service interfaces
  • Time to learn: 15 minutes

Init Rooki

import os

# Configuration line to set the wps node - in this case, use DKRZ in Germany
os.environ['ROOK_URL'] = 'http://rook.dkrz.de/wps'

from rooki import rooki

Retrieve subset of CMIP6 data

Each CMIP6 dataset is identified by a dataset-id. An intake catalog is available to look up the available datasets:

https://nbviewer.org/github/roocs/rooki/blob/master/notebooks/demo/demo-intake-catalog.ipynb
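As a minimal sketch of such a lookup (the catalog URL is a placeholder; the linked notebook shows the actual one), the dataset-ids can be browsed with intake:

import intake

# Placeholder URL: take the real catalog location from the notebook linked above
cat = intake.open_catalog('<c3s-cmip6-catalog-url>')
print(list(cat))  # names of the available catalog entries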

resp = rooki.subset(
    collection='c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710',
    time='2000-01-01/2000-01-31',  # time range: start/end
    area='-30,-40,70,80',          # bounding box: west,south,east,north (lon/lat)
)
resp.ok
True
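Beyond resp.ok, the response also points at the files produced on the server. The download_urls() call below appears in the rooki demo notebooks; treat it as an assumption and verify it against your rooki version:

# Inspect the result before opening it with xarray
if resp.ok:
    for url in resp.download_urls():  # per the rooki demo notebooks
        print(url)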

Open Dataset with xarray

ds = resp.datasets()[0]
ds

Plot CMIP6 Dataset

ds.tas.isel(time=0).plot()
<Figure size 640x480 with 2 Axes>
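The default xarray plot uses plain longitude/latitude axes. As an optional extra (assuming cartopy is installed; it is not used elsewhere in this demo), the same field can be drawn on a map projection with coastlines:

import cartopy.crs as ccrs
import matplotlib.pyplot as plt

# Plot the first time step on a PlateCarree map and add coastlines
ax = plt.axes(projection=ccrs.PlateCarree())
ds.tas.isel(time=0).plot(ax=ax, transform=ccrs.PlateCarree())
ax.coastlines()
plt.show()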

Show Provenance

A provenance document is generated remotely to record the operation steps. It follows the W3C PROV standard.

from IPython.display import Image
Image(resp.provenance_image())
<IPython.core.display.Image object>
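Besides the rendered image, the machine-readable PROV document should also be available. The resp.provenance() call is taken from the rooki demo notebooks; treat the exact method name as an assumption for your rooki version:

# URL of the W3C PROV document for this operation (method name per rooki demos)
print(resp.provenance())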

Run workflow with subset and average operator

Instead of running a single operator, one can also chain several operators in a workflow.

Use rooki operators to create a workflow

from rooki import operators as ops

Define the workflow

... internally, the workflow tree is a JSON document

# Define the input dataset for the workflow
tas = ops.Input(
    'tas', ['c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710']
)

# Subset the year 2000, keeping only January-March, over the given bounding box
wf = ops.Subset(
    tas,
    time="2000/2000",
    time_components="month:jan,feb,mar",
    area='-30,-40,70,80',
)

# Chain a weighted spatial average onto the output of the subset step
wf = ops.WeightedAverage(wf)

Optional: look at the workflow json document

... only to give some insight

import json
print(json.dumps(wf._tree(), indent=4))
{
    "inputs": {
        "tas": [
            "c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"
        ]
    },
    "steps": {
        "subset_tas_1": {
            "run": "subset",
            "in": {
                "collection": "inputs/tas",
                "time": "2000/2000",
                "time_components": "month:jan,feb,mar",
                "area": "-30,-40,70,80"
            }
        },
        "weighted_average_tas_1": {
            "run": "weighted_average",
            "in": {
                "collection": "subset_tas_1/output"
            }
        }
    },
    "outputs": {
        "output": "weighted_average_tas_1/output"
    }
}
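This JSON tree is what gets submitted to the service in the next step. For insight only, here is a rough sketch of the same submission at the WPS level using OWSLib; the process name 'orchestrate' matches the rook operation, but the 'workflow' input identifier is an assumption, so check the rook documentation before relying on this:

from owslib.wps import WebProcessingService, ComplexDataInput

wps = WebProcessingService(os.environ['ROOK_URL'])
# 'workflow' as the input identifier is an assumption; wf.orchestrate() does this for you
execution = wps.execute(
    'orchestrate',
    inputs=[('workflow', ComplexDataInput(json.dumps(wf._tree())))],
)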

Submit workflow job

resp = wf.orchestrate()
resp.ok
True

Open as xarray dataset

ds = resp.datasets()[0]
ds

Plot dataset

ds.tas.plot()
<Figure size 640x480 with 2 Axes>

Show provenance

Image(resp.provenance_image())
<IPython.core.display.Image object>

Summary

In this notebook, we used the Rooki Python client to retrieve a subset of a CMIP6 dataset. The operations are executed remotely on a Rook subsetting service (using the OGC WPS standard and xarray/clisops). The dataset is plotted and a provenance document is shown. We also showed that remote operators can be chained and executed as a single workflow operation.

What’s next?

This service is used by the European Copernicus Climate Data Store.

We need to figure out how this service can be used in the new ESGF:

  • Where will it be deployed?
  • How can it be integrated with the ESGF search (STAC catalogs, ...)?
  • ???

Resources and references