Intake Rooki Demo


Using intake-esgf with rooki

Overview

In this notebook we will demonstrate how to use intake-esgf and rooki to perform server-side operations and return the result to the user. This will occur in several steps.

  1. We use intake-esgf to find data which is local to the ORNL server and then form an id which rooki uses to load the data remotely.

  2. We build a rooki workflow which uses these ids (rooki_id) to subset and average the data remotely.

  3. The results are downloaded locally and we visualize them interactively using hvplot.

Prerequisites

Concepts                Importance   Notes
Intro to Intake-ESGF    Necessary    How to configure a search and use output
Intro to Rooki          Helpful      How to initialize and run rooki
Intro to hvPlot         Necessary    How to plot interactive visualizations

  • Time to learn: 30 minutes


Imports

Before importing rooki, we need to set an environment variable that will signal the rooki client to use the web processing service (WPS) deployment located at Oak Ridge National Lab (ORNL).

import os

# Configuration line to set the wps node - in this case, use ORNL in the USA
url = "https://esgf-node.ornl.gov/wps"
os.environ["ROOK_URL"] = url

from rooki import operators as ops
from rooki import rooki
# Other imports
import holoviews as hv
import hvplot.xarray
import intake_esgf
import matplotlib.pyplot as plt
import panel as pn
import xarray as xr
from intake_esgf import ESGFCatalog

hv.extension("bokeh")

Search and Find Data for Surface Temperature on the ORNL Node

Let’s start by refining which index we would like to search from. For this analysis, we compute remotely on the ORNL node, since that is where rooki is running. We know this from checking the ._url attribute of rooki!

rooki._url
'https://esgf-node.ornl.gov/wps'

Extract IDs to Pass to Rooki

The catalog returns a lot of information about the datasets that were found, but the rooki WPS interface just needs an ID similar to what appears in the id column of the dataframe. We need to remove the trailing |esgf-node.ornl.gov and prepend css03_data. To do this we will write a function and apply it to the dataframe.

def build_rooki_id(id_list):
    """Convert an intake-esgf dataset id into the form rooki expects."""
    rooki_id = id_list[0]  # each row holds a list of ids, one per index node
    rooki_id = rooki_id.split("|")[0]  # drop the trailing data-node suffix
    rooki_id = f"css03_data.{rooki_id}"  # <-- just something you have to know for now :(
    return rooki_id

rooki_ids = cat.df.id.apply(build_rooki_id).to_list()
rooki_ids
['css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120',
 'css03_data.CMIP6.CMIP.CMCC.CMCC-ESM2.historical.r1i1p1f1.Amon.tas.gn.v20210114',
 'css03_data.CMIP6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.Amon.tas.gn.v20190227',
 'css03_data.CMIP6.CMIP.CMCC.CMCC-CM2-SR5.historical.r1i1p1f1.Amon.tas.gn.v20200616',
 'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r1i1p1f1.Amon.tas.gn.v20180827',
 'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-G.historical.r1i1p1f1.Amon.tas.gn.v20191120',
 'css03_data.CMIP6.CMIP.NCAR.CESM2-WACCM-FV2.historical.r1i1p1f1.Amon.tas.gn.v20191120',
 'css03_data.CMIP6.CMIP.NCAR.CESM2-FV2.historical.r1i1p1f1.Amon.tas.gn.v20191120',
 'css03_data.CMIP6.CMIP.CMCC.CMCC-CM2-HR4.historical.r1i1p1f1.Amon.tas.gn.v20200904',
 'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-H.historical.r1i1p1f1.Amon.tas.gn.v20190403',
 'css03_data.CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190308',
 'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-G-CC.historical.r1i1p1f1.Amon.tas.gn.v20190815',
 'css03_data.CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.Amon.tas.gn.v20181212']
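Note that each rooki_id encodes the CMIP6 facets as dot-separated fields. Because the css03_data prefix occupies the first position, the model name (the CMIP6 source_id) lands at index 4, which is handy later for labeling results by model:

```python
# Pull the model name (CMIP6 source_id) out of a rooki dataset id.
# Fields: css03_data . mip_era . activity_id . institution_id . source_id . ...
rooki_id = "css03_data.CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.Amon.tas.gn.v20181212"
model = rooki_id.split(".")[4]
print(model)  # MIROC6
```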

Compute with Rooki

Now that we have a list of IDs to pass to rooki, let’s compute! In our case we are interested in the annual temperature from 1990-2000 over an area that includes India (latitude from 0 to 35, longitude from 65 to 100). The following function constructs a rooki workflow from operators (functions in the ops namespace) that:

  • read in data (ops.Input)

  • subset in time and space (ops.Subset), and

  • average in time (ops.AverageByTime) on a yearly frequency.

We then check that the response is okay and, if it is, return the processed dataset to the user! If something went wrong, the function raises an error showing the message that rooki sent back.

def india_annual_temperature(rooki_id):
    workflow = ops.AverageByTime(
        ops.Subset(
            ops.Input("tas", [rooki_id]),
            time="1990-01-01/2000-01-01",
            area="65,0,100,35",
        ),
        freq="year",
    )
    response = workflow.orchestrate()
    if not response.ok:
        raise ValueError(response)
    return response.datasets()[0]

Now let’s test a single rooki_id to demonstrate successful functionality. The rooki_id lets the WPS know which dataset we are interested in operating on; the data is then loaded remotely, subset, and averaged. After this computation finishes on the server, the result is transferred to you and loaded into an xarray dataset. Inspect the dataset header to see that there are 10 times, one for each year, and that the latitude and longitude ranges span our input values.

india_annual_temperature(rooki_ids[0])
Metalink content-type detected.
Downloading to /tmp/metalink_0l3no9bu/tas_Amon_GISS-E2-2-H_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
<xarray.Dataset> Size: 16kB
Dimensions:    (lat: 18, time: 10, bnds: 2, lon: 14)
Coordinates:
  * lat        (lat) float64 144B 1.0 3.0 5.0 7.0 9.0 ... 29.0 31.0 33.0 35.0
  * lon        (lon) float64 112B 66.25 68.75 71.25 73.75 ... 93.75 96.25 98.75
    height     float64 8B ...
  * time       (time) object 80B 1990-01-01 00:00:00 ... 1999-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (time, lat, bnds) float64 3kB ...
    lon_bnds   (time, lon, bnds) float64 2kB ...
    tas        (time, lat, lon) float32 10kB ...
    time_bnds  (time, bnds) object 160B ...
Attributes: (12/48)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  0.0
    contact:                Kenneth Lo (cdkkl@giss.nasa.gov)
    ...                     ...
    title:                  GISS-E2-2-H output prepared for CMIP6
    tracking_id:            hdl:21.14100/503cf427-12d4-4e54-a431-b9843112f320
    variable_id:            tas
    variant_label:          r1i1p1f1
    license:                CMIP6 model data produced by NASA Goddard Institu...
    cmor_version:           3.3.2

Now that we have some confidence in our workflow function, we can iterate over the rooki_ids, running the workflow for each and saving the results into a dictionary whose keys are the different models. You should see messages printed to the screen informing you where the temporary output is being downloaded. This location can be configured in rooki, but for now we will just load the files into datasets.

dsd = {
    rooki_id.split(".")[4]: india_annual_temperature(rooki_id)
    for rooki_id in rooki_ids
}
Metalink content-type detected.
Downloading to /tmp/metalink_pgo05w04/tas_Amon_GISS-E2-2-H_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_diw1beph/tas_Amon_CMCC-ESM2_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_d3ubd29e/tas_Amon_CESM2-WACCM_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_503o8fu0/tas_Amon_CMCC-CM2-SR5_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_nwqb_szw/tas_Amon_GISS-E2-1-G_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_9i6xq_ba/tas_Amon_GISS-E2-2-G_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_jnfix4pf/tas_Amon_CESM2-WACCM-FV2_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_tinmmmt9/tas_Amon_CESM2-FV2_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_2v8f_yl6/tas_Amon_CMCC-CM2-HR4_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_l82e8hgt/tas_Amon_GISS-E2-1-H_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_jki9k3a7/tas_Amon_CESM2_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_p8njukg7/tas_Amon_GISS-E2-1-G-CC_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_286phjqc/tas_Amon_MIROC6_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.

Visualize the Output

Let’s use hvPlot to visualize. Since the datasets are stored in a dictionary keyed by model, we need to:

  • Extract a single key

  • Plot a filled-contour visualization with some geographic features

tas = dsd["MIROC6"].tas
tas.hvplot.contourf(
    x="lon",
    y="lat",
    cmap="Reds",
    levels=20,
    clim=(250, 320),
    features=["land", "ocean"],
    alpha=0.7,
    widget_location="bottom",
    clabel="Yearly Average Temperature (K)",
    geo=True,
)
/home/runner/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naturalearth.s3.amazonaws.com/110m_physical/ne_110m_land.zip
  warnings.warn(f'Downloading: {url}', DownloadWarning)
/home/runner/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naturalearth.s3.amazonaws.com/110m_physical/ne_110m_ocean.zip
  warnings.warn(f'Downloading: {url}', DownloadWarning)

Summary

Within this notebook, we learned how to specify a specific index node to search from, pass discovered datasets to rooki, and chain several remote-compute operations using rooki. We then visualized the output using hvPlot, producing an interactive plot!

What’s next?

More adaptations of the intake-esgf + rooki workflow to remotely compute on ESGF data.