Using intake-esgf with rooki
Overview
In this notebook we will demonstrate how to use intake-esgf and rooki to perform server-side operations and return the result to the user. This will occur in several steps:

1. We use intake-esgf to find data which is local to the ORNL server and then form an id which rooki uses to load the data remotely.
2. We build a rooki workflow which uses these ids (`rooki_id`) to subset and average the data remotely.
3. The results are downloaded locally and we visualize them interactively using hvplot.
Prerequisites
| Concepts | Importance | Notes |
| --- | --- | --- |
| intake-esgf | Necessary | How to configure a search and use output |
| rooki | Helpful | How to initialize and run rooki |
| hvPlot | Necessary | How to plot interactive visualizations |
Time to learn: 30 minutes
Imports
Before importing rooki, we need to set an environment variable that will signal the rooki client to use the web processing service (WPS) deployment located at Oak Ridge National Lab (ORNL).
import os
# Configuration line to set the wps node - in this case, use ORNL in the USA
url = "https://esgf-node.ornl.gov/wps"
os.environ["ROOK_URL"] = url
from rooki import operators as ops
from rooki import rooki
# Other imports
import holoviews as hv
import hvplot.xarray
import intake_esgf
import matplotlib.pyplot as plt
import panel as pn
import xarray as xr
from intake_esgf import ESGFCatalog
hv.extension("bokeh")
Search and Find Data for Surface Temperature on the ORNL Node
Let’s start by refining which index we would like to search. For this analysis, we are remotely computing on the ORNL node since this is where rooki is running. We know this from checking the ._url attribute of rooki!
rooki._url
'https://esgf-node.ornl.gov/wps'
Set the Index Node and Search
Because we are using the ORNL-based WPS, we only need information about ORNL holdings. So here we configure intake-esgf to only look at the ORNL index for data information.
intake_esgf.conf.set(indices={"anl-dev": False,
"ornl-dev": True})
<contextlib._GeneratorContextManager at 0x7fa2d87c06d0>
Now we instantiate the catalog and perform a search for surface air temperature (tas) data from a few institutions’ models. Note that the ORNL index contains information about holdings beyond the ORNL data node, so restricting the search to this index helps ensure the catalog only returns information about holdings which are local to ORNL.
cat = ESGFCatalog().search(
experiment_id="historical",
variable_id="tas",
member_id="r1i1p1f1",
table_id="Amon",
institution_id=["MIROC", "NCAR", "NASA-GISS", "CMCC"],
)
cat.df
| | project | mip_era | activity_drs | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | version | id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | CMIP6 | CMIP6 | CMIP | NASA-GISS | GISS-E2-2-H | historical | r1i1p1f1 | Amon | tas | gn | 20191120 | [CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r... |
| 1 | CMIP6 | CMIP6 | CMIP | CMCC | CMCC-ESM2 | historical | r1i1p1f1 | Amon | tas | gn | 20210114 | [CMIP6.CMIP.CMCC.CMCC-ESM2.historical.r1i1p1f1... |
| 2 | CMIP6 | CMIP6 | CMIP | NCAR | CESM2-WACCM | historical | r1i1p1f1 | Amon | tas | gn | 20190227 | [CMIP6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1... |
| 3 | CMIP6 | CMIP6 | CMIP | CMCC | CMCC-CM2-SR5 | historical | r1i1p1f1 | Amon | tas | gn | 20200616 | [CMIP6.CMIP.CMCC.CMCC-CM2-SR5.historical.r1i1p... |
| 4 | CMIP6 | CMIP6 | CMIP | NASA-GISS | GISS-E2-1-G | historical | r1i1p1f1 | Amon | tas | gn | 20180827 | [CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r... |
| 5 | CMIP6 | CMIP6 | CMIP | NASA-GISS | GISS-E2-2-G | historical | r1i1p1f1 | Amon | tas | gn | 20191120 | [CMIP6.CMIP.NASA-GISS.GISS-E2-2-G.historical.r... |
| 6 | CMIP6 | CMIP6 | CMIP | NCAR | CESM2-WACCM-FV2 | historical | r1i1p1f1 | Amon | tas | gn | 20191120 | [CMIP6.CMIP.NCAR.CESM2-WACCM-FV2.historical.r1... |
| 8 | CMIP6 | CMIP6 | CMIP | NCAR | CESM2-FV2 | historical | r1i1p1f1 | Amon | tas | gn | 20191120 | [CMIP6.CMIP.NCAR.CESM2-FV2.historical.r1i1p1f1... |
| 9 | CMIP6 | CMIP6 | CMIP | CMCC | CMCC-CM2-HR4 | historical | r1i1p1f1 | Amon | tas | gn | 20200904 | [CMIP6.CMIP.CMCC.CMCC-CM2-HR4.historical.r1i1p... |
| 10 | CMIP6 | CMIP6 | CMIP | NASA-GISS | GISS-E2-1-H | historical | r1i1p1f1 | Amon | tas | gn | 20190403 | [CMIP6.CMIP.NASA-GISS.GISS-E2-1-H.historical.r... |
| 15 | CMIP6 | CMIP6 | CMIP | NCAR | CESM2 | historical | r1i1p1f1 | Amon | tas | gn | 20190308 | [CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amo... |
| 16 | CMIP6 | CMIP6 | CMIP | NASA-GISS | GISS-E2-1-G-CC | historical | r1i1p1f1 | Amon | tas | gn | 20190815 | [CMIP6.CMIP.NASA-GISS.GISS-E2-1-G-CC.historica... |
| 22 | CMIP6 | CMIP6 | CMIP | MIROC | MIROC6 | historical | r1i1p1f1 | Amon | tas | gn | 20181212 | [CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.A... |
Extract IDs to Pass to Rooki
The catalog returns a lot of information about the datasets that were found, but the rooki WPS interface just needs an ID that looks similar to what we find in the id column of the dataframe. We need to remove the |esgf-node.ornl.gov on the end and prepend css03_data. To do this we will write a function and apply it to the dataframe.
def build_rooki_id(id_list):
    rooki_id = id_list[0]
    rooki_id = rooki_id.split("|")[0]  # drop the data node suffix
    rooki_id = f"css03_data.{rooki_id}"  # <-- just something you have to know for now :(
    return rooki_id
rooki_ids = cat.df.id.apply(build_rooki_id).to_list()
rooki_ids
['css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120',
'css03_data.CMIP6.CMIP.CMCC.CMCC-ESM2.historical.r1i1p1f1.Amon.tas.gn.v20210114',
'css03_data.CMIP6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.Amon.tas.gn.v20190227',
'css03_data.CMIP6.CMIP.CMCC.CMCC-CM2-SR5.historical.r1i1p1f1.Amon.tas.gn.v20200616',
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r1i1p1f1.Amon.tas.gn.v20180827',
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-G.historical.r1i1p1f1.Amon.tas.gn.v20191120',
'css03_data.CMIP6.CMIP.NCAR.CESM2-WACCM-FV2.historical.r1i1p1f1.Amon.tas.gn.v20191120',
'css03_data.CMIP6.CMIP.NCAR.CESM2-FV2.historical.r1i1p1f1.Amon.tas.gn.v20191120',
'css03_data.CMIP6.CMIP.CMCC.CMCC-CM2-HR4.historical.r1i1p1f1.Amon.tas.gn.v20200904',
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-H.historical.r1i1p1f1.Amon.tas.gn.v20190403',
'css03_data.CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190308',
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-G-CC.historical.r1i1p1f1.Amon.tas.gn.v20190815',
'css03_data.CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.Amon.tas.gn.v20181212']
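Later on we will key results by model name, which is the fifth dot-separated field (the source_id) of each rooki id. As a small, self-contained sketch, here is that extraction applied to one of the id strings from the output above (the helper name `model_name` is ours, not part of rooki or intake-esgf):

```python
# Extract the model name (source_id) from a rooki id string.
# Fields are: prefix.project.activity.institution.source_id. ...
def model_name(rooki_id: str) -> str:
    return rooki_id.split(".")[4]

sample = "css03_data.CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.Amon.tas.gn.v20181212"
print(model_name(sample))  # → MIROC6
```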
Compute with Rooki
Now that we have a list of IDs to pass to rooki, let’s compute! In our case we are interested in the annual temperature from 1990-2000 over an area that includes India (latitude from 0 to 35, longitude from 65 to 100). The following function constructs a rooki workflow using operators (functions in the ops namespace) to:

1. read in data (`ops.Input`),
2. subset in time and space (`ops.Subset`), and
3. average in time (`ops.AverageByTime`) on a yearly frequency.
We then check to make sure the response is okay, and if it is, return the processed dataset to the user! If something went wrong, the function will raise an error and show you the message that rooki sent back.
def india_annual_temperature(rooki_id):
    workflow = ops.AverageByTime(
        ops.Subset(
            ops.Input("tas", [rooki_id]),
            time="1990-01-01/2000-01-01",
            area="65,0,100,35",
        ),
        freq="year",
    )
    response = workflow.orchestrate()
    if not response.ok:
        raise ValueError(response)
    return response.datasets()[0]
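The area argument is a single comma-separated string ordered as lon_min, lat_min, lon_max, lat_max (here: longitude 65 to 100, latitude 0 to 35). If that ordering is easy to get backwards, a tiny helper with named parameters can build it; `bbox_area` below is our own convenience function, not part of rooki:

```python
# Hypothetical helper: format a bounding box as the
# "lon_min,lat_min,lon_max,lat_max" string used for the area argument above.
def bbox_area(lon_min: float, lat_min: float, lon_max: float, lat_max: float) -> str:
    assert lon_min < lon_max and lat_min < lat_max, "bounds must be ordered"
    return f"{lon_min:g},{lat_min:g},{lon_max:g},{lat_max:g}"

print(bbox_area(65, 0, 100, 35))  # → 65,0,100,35
```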
Now let’s test a single rooki_id to demonstrate successful functionality. The rooki_id lets the WPS know which dataset we are interested in operating on; the data is then loaded remotely, subset, and averaged. After this computation is finished on the server, the result is transferred to you and loaded into an xarray dataset. Inspect the dataset header to see that there are 10 times, one for each year, and that the latitude and longitude ranges span our input values.
india_annual_temperature(rooki_ids[0])
Metalink content-type detected.
Downloading to /tmp/metalink_0l3no9bu/tas_Amon_GISS-E2-2-H_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
<xarray.Dataset> Size: 16kB
Dimensions:    (lat: 18, time: 10, bnds: 2, lon: 14)
Coordinates:
  * lat        (lat) float64 144B 1.0 3.0 5.0 7.0 9.0 ... 29.0 31.0 33.0 35.0
  * lon        (lon) float64 112B 66.25 68.75 71.25 73.75 ... 93.75 96.25 98.75
    height     float64 8B ...
  * time       (time) object 80B 1990-01-01 00:00:00 ... 1999-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (time, lat, bnds) float64 3kB ...
    lon_bnds   (time, lon, bnds) float64 2kB ...
    tas        (time, lat, lon) float32 10kB ...
    time_bnds  (time, bnds) object 160B ...
Attributes: (12/48)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  0.0
    contact:                Kenneth Lo (cdkkl@giss.nasa.gov)
    ...                     ...
    title:                  GISS-E2-2-H output prepared for CMIP6
    tracking_id:            hdl:21.14100/503cf427-12d4-4e54-a431-b9843112f320
    variable_id:            tas
    variant_label:          r1i1p1f1
    license:                CMIP6 model data produced by NASA Goddard Institu...
    cmor_version:           3.3.2
Now that we have some confidence in our workflow function, we can iterate over the rooki_ids, running the workflow for each and saving the results into a dictionary whose keys are the model names. You should see messages printed to the screen which inform you where the temporary output is being downloaded. This location can be configured in rooki, but for now we will just load the files into datasets.
dsd = {
    rooki_id.split(".")[4]: india_annual_temperature(rooki_id)
    for rooki_id in rooki_ids
}
Metalink content-type detected.
Downloading to /tmp/metalink_pgo05w04/tas_Amon_GISS-E2-2-H_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_diw1beph/tas_Amon_CMCC-ESM2_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_d3ubd29e/tas_Amon_CESM2-WACCM_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_503o8fu0/tas_Amon_CMCC-CM2-SR5_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_nwqb_szw/tas_Amon_GISS-E2-1-G_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_9i6xq_ba/tas_Amon_GISS-E2-2-G_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_jnfix4pf/tas_Amon_CESM2-WACCM-FV2_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_tinmmmt9/tas_Amon_CESM2-FV2_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_2v8f_yl6/tas_Amon_CMCC-CM2-HR4_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_l82e8hgt/tas_Amon_GISS-E2-1-H_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_jki9k3a7/tas_Amon_CESM2_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_p8njukg7/tas_Amon_GISS-E2-1-G-CC_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
Metalink content-type detected.
Downloading to /tmp/metalink_286phjqc/tas_Amon_MIROC6_historical_r1i1p1f1_gn_19900101-19990101_avg-year.nc.
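With all the results in a dictionary, a common next step is to combine them along a new "model" dimension for side-by-side analysis. Note that these models are on different native grids, so `xr.concat` would only work on the real results after regridding to a common grid; the sketch below uses small synthetic, same-grid datasets purely to illustrate the pattern:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-ins for two already-regridded model results
lat = np.arange(1.0, 36.0, 2.0)       # 18 points
lon = np.arange(66.25, 100.0, 2.5)    # 14 points
rng = np.random.default_rng(0)
fake = {
    name: xr.Dataset(
        {"tas": (("time", "lat", "lon"), 280 + rng.random((10, lat.size, lon.size)))},
        coords={"time": np.arange(1990, 2000), "lat": lat, "lon": lon},
    )
    for name in ["ModelA", "ModelB"]
}

# Concatenate along a new "model" dimension keyed by the dictionary keys
combined = xr.concat(fake.values(), dim=pd.Index(list(fake), name="model"))
print(combined.sizes)
```

On a real, regridded `dsd` the same `xr.concat` call would give you a single dataset you can reduce over the model dimension (e.g. a multi-model mean).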
Visualize the Output
Let’s use hvPlot to visualize. Since the results are stored in a dictionary of datasets, we need to:

1. Extract a single key
2. Plot a filled-contour visualization, with some geographic features
tas = dsd["MIROC6"].tas
tas.hvplot.contourf(
    x="lon",
    y="lat",
    cmap="Reds",
    levels=20,
    clim=(250, 320),
    features=["land", "ocean"],
    alpha=0.7,
    widget_location="bottom",
    clabel="Yearly Average Temperature (K)",
    geo=True,
)
/home/runner/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naturalearth.s3.amazonaws.com/110m_physical/ne_110m_land.zip
warnings.warn(f'Downloading: {url}', DownloadWarning)
/home/runner/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naturalearth.s3.amazonaws.com/110m_physical/ne_110m_ocean.zip
warnings.warn(f'Downloading: {url}', DownloadWarning)
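If you just want a quick static figure, matplotlib (already imported above) works as well. The sketch below plots a single year with contourf using synthetic data of the same shape as our subset; with the real results you would instead pass the coordinates and values from, e.g., dsd["MIROC6"].tas.isel(time=0):

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for one year of the subset tas field (18 lats x 14 lons)
lon = np.linspace(66.25, 98.75, 14)
lat = np.linspace(1.0, 35.0, 18)
tas_year = 280 + 10 * np.random.rand(lat.size, lon.size)

fig, ax = plt.subplots()
cf = ax.contourf(lon, lat, tas_year, levels=20, cmap="Reds")
fig.colorbar(cf, ax=ax, label="Yearly Average Temperature (K)")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("tas, one-year mean (synthetic data)")
```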
Summary
Within this notebook, we learned how to search against a specific index node, pass the discovered datasets to rooki, and chain several remote-compute operations together using rooki. We then visualized the output using hvPlot, producing an interactive plot!