Intake Rooki Demo


Using intake-esgf with rooki

Here we dig into using intake-esgf to search for data, then rooki to do server-side computing!


Overview

This notebook demonstrates a typical ESGF workflow: search for data client-side, compute server-side, and plot the results locally. We will:

  1. Search and find data using intake-esgf, returning the dataset ids

  2. Feed the dataset ids to rooki to subset and average the data remotely

  3. Visualize the results on the end-user side

Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| Intro to Intake-ESGF | Necessary | How to configure a search and use output |
| Intro to Rooki | Helpful | How to initialize and run rooki |
| Intro to hvPlot | Necessary | How to plot interactive visualizations |

  • Time to learn: 30 minutes


Imports

import os

# Set the rooki client to use the Oak Ridge National Laboratory (ORNL) WPS deployment
url = 'https://esgf-wps.apps.onyx.ccs.ornl.gov/wps'
os.environ['ROOK_URL'] = url

from rooki import rooki
from rooki import operators as ops
import intake_esgf
from intake_esgf import ESGFCatalog
import xarray as xr
import hvplot.xarray
import holoviews as hv
import panel as pn
hv.extension("bokeh")

Search and Find Data for Surface Temperature on the ORNL Node

Let’s start by confirming which index we are searching from. For this analysis, we compute remotely on the ORNL node, since that is where rooki is running. We can verify this by checking the ._url attribute of rooki!

rooki._url
'https://esgf-wps.apps.onyx.ccs.ornl.gov/wps'
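The search cell that produces the catalog `cat` used below is not shown in this excerpt. A sketch of what it might look like, with the search facets inferred (an assumption) from the dataset ids that come back:

```python
from intake_esgf import ESGFCatalog

# Facets inferred from the dataset ids returned below -- an assumption,
# since the original search cell is not shown in this excerpt
cat = ESGFCatalog()
cat.search(
    experiment_id="historical",
    variable_id="tas",
    table_id="Amon",
    variant_label="r1i1p1f1",
    grid_label="gn",
)
```

After the search, `cat.df` holds a dataframe with one row per matching dataset, which we use in the next section.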

Extract the Dataset ID and Pass to Rooki

Now that we have a set of datasets, we need to extract the dataset_id, which is the unique identifier for each dataset. We can pull this from the id column of the intake-esgf dataframe.

Separate the Dataset ID

cat.df.id.values[0]
['CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120|esgf-data04.diasjp.net',
 'CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120|dpesgf03.nccs.nasa.gov',
 'CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120|esgf-node.ornl.gov']

Notice how the node information is appended to the end of the dataset id. We need to “chop off” that last bit, keeping everything before the | character. We wrap this in a function to make it easier to generalize and apply.

def separate_dataset_id(full_dataset):
    """
    Strip the data node from the dataset id, prepending css03_data
    to indicate the CMIP6 archive at ORNL
    """
    return f"css03_data.{full_dataset[0].split('|')[0]}"

separate_dataset_id(cat.df.id.values[0])
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120'

Now, we can apply this to the entire list within our dataframe using the following

dsets = [separate_dataset_id(dataset) for dataset in list(cat.df.id.values)]
dsets
['css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120',
 'css03_data.CMIP6.CMIP.CMCC.CMCC-ESM2.historical.r1i1p1f1.Amon.tas.gn.v20210114',
 'css03_data.CMIP6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.Amon.tas.gn.v20190227',
 'css03_data.CMIP6.CMIP.CMCC.CMCC-CM2-SR5.historical.r1i1p1f1.Amon.tas.gn.v20200616',
 'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r1i1p1f1.Amon.tas.gn.v20180827',
 'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-G.historical.r1i1p1f1.Amon.tas.gn.v20191120',
 'css03_data.CMIP6.CMIP.NCAR.CESM2-WACCM-FV2.historical.r1i1p1f1.Amon.tas.gn.v20191120',
 'css03_data.CMIP6.CMIP.NCAR.CESM2-FV2.historical.r1i1p1f1.Amon.tas.gn.v20191120',
 'css03_data.CMIP6.CMIP.CMCC.CMCC-CM2-HR4.historical.r1i1p1f1.Amon.tas.gn.v20200904',
 'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-H.historical.r1i1p1f1.Amon.tas.gn.v20190403',
 'css03_data.CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190308',
 'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-G-CC.historical.r1i1p1f1.Amon.tas.gn.v20190815',
 'css03_data.CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.Amon.tas.gn.v20181212']

Compute with Rooki

Now that we have a list of IDs to pass to rooki, let’s compute!

In this case, we are:

  • Subsetting from the year 1900 to 2000

  • Subsetting near India using the bounds 65,0,100,35 (min lon, min lat, max lon, max lat)

  • Computing the yearly average

We then check to make sure the response is okay, and if it is, return that to the user!

def compute_annual_mean_subset(dset_id):
    # Chain the operators: subset by time and area, then average by year
    wf = ops.AverageByTime(
        ops.Subset(
            ops.Input(
                'tas', [dset_id]
            ),
            time='1900-01-01/2000-12-31',
            area='65,0,100,35',
        ),
        freq="year",
    )

    # Send the workflow to the remote WPS deployment
    resp = wf.orchestrate()

    # Return the remote result if the request succeeded;
    # otherwise, return an empty dataset
    if resp.ok:
        ds = resp.datasets()[0]
    else:
        ds = xr.Dataset()
    return ds

compute_annual_mean_subset(dsets[0])
Metalink content-type detected.
Downloading to /var/folders/bw/c9j8z20x45s2y20vv6528qjc0000gq/T/metalink_p9cwdemi/tas_Amon_GISS-E2-2-H_historical_r1i1p1f1_gn_19000101-20000101_avg-year.nc.
<xarray.Dataset> Size: 156kB
Dimensions:    (lat: 18, time: 101, bnds: 2, lon: 14)
Coordinates:
  * lat        (lat) float64 144B 1.0 3.0 5.0 7.0 9.0 ... 29.0 31.0 33.0 35.0
  * lon        (lon) float64 112B 66.25 68.75 71.25 73.75 ... 93.75 96.25 98.75
    height     float64 8B ...
  * time       (time) object 808B 1900-01-01 00:00:00 ... 2000-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (time, lat, bnds) float64 29kB ...
    lon_bnds   (time, lon, bnds) float64 23kB ...
    tas        (time, lat, lon) float32 102kB ...
    time_bnds  (time, bnds) object 2kB ...
Attributes: (12/48)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  0.0
    contact:                Kenneth Lo (cdkkl@giss.nasa.gov)
    ...                     ...
    title:                  GISS-E2-2-H output prepared for CMIP6
    tracking_id:            hdl:21.14100/09d7bd73-f74e-4f9a-a14b-205fa5078217
    variable_id:            tas
    variant_label:          r1i1p1f1
    license:                CMIP6 model data produced by NASA Goddard Institu...
    cmor_version:           3.3.2

Now that it works with a single dataset, let’s do this for all the datasets and put them into a dictionary with the dataset ids as the keys.

dset_dict = {}
for dset in dsets:
    dset_dict[dset] = compute_annual_mean_subset(dset)
Metalink content-type detected.
Downloading to /var/folders/bw/c9j8z20x45s2y20vv6528qjc0000gq/T/metalink_4tki31ka/tas_Amon_GISS-E2-2-H_historical_r1i1p1f1_gn_19000101-20000101_avg-year.nc.
(similar download messages repeated for each remaining dataset)

Visualize the Output

Let’s use hvPlot to visualize the output. Since the results are stored in a dictionary of datasets, we need to:

  • Extract a single key

  • Plot a filled-contour visualization with some geographic features

dset_dict[list(dset_dict.keys())[-1]].tas.hvplot.contourf(x='lon',
                                                          y='lat',
                                                          cmap='Reds',
                                                          levels=20,
                                                          clim=(250, 320),
                                                          features=["land", "ocean"],
                                                          alpha=0.7,
                                                          widget_location='bottom',
                                                          clabel="Yearly Average Temperature (K)",
                                                          geo=True)
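Beyond mapping a single model, a common follow-up (not part of the original notebook) is to collapse each dataset to an area-weighted mean time series so the models can be overlaid and compared. A self-contained sketch, using synthetic data as a stand-in for the real `dset_dict`:

```python
import numpy as np
import pandas as pd
import xarray as xr

def fake_annual_means(offset):
    """Build a small synthetic dataset shaped like the rooki output."""
    time = pd.date_range("1900", periods=101, freq="YS")
    lat = np.arange(1.0, 36.0, 2.0)
    lon = np.arange(66.25, 100.0, 2.5)
    tas = 280.0 + offset + np.random.default_rng(0).random(
        (time.size, lat.size, lon.size)
    )
    return xr.Dataset(
        {"tas": (("time", "lat", "lon"), tas)},
        coords={"time": time, "lat": lat, "lon": lon},
    )

# Hypothetical stand-ins for two entries of dset_dict
dset_dict = {"model_a": fake_annual_means(0.0), "model_b": fake_annual_means(2.0)}

# Reduce each model to a cos(lat) area-weighted spatial mean time series
series = {
    key: ds.tas.weighted(np.cos(np.deg2rad(ds.lat))).mean(("lat", "lon"))
    for key, ds in dset_dict.items()
}
```

Each entry of `series` is a one-dimensional DataArray, so `series["model_a"].hvplot(label="model_a") * series["model_b"].hvplot(label="model_b")` would overlay the models as line plots.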

Summary

Within this notebook, we learned how to specify a specific index node to search from, pass the discovered dataset ids to rooki, and chain several remote-compute operations together using rooki. We then visualized the output using hvPlot, producing an interactive plot!

What’s next?

More adaptations of the intake-esgf + rooki workflow to remotely compute on ESGF data.