Using intake-esgf with rooki
Here we dig into using intake-esgf to search for data, then rooki to do server-side computing!
Overview
Search and find data using intake-esgf, returning the dataset ids
Feed the dataset ids to rooki to subset and average the data remotely
Visualize the results on the end-user side
Prerequisites
Concepts | Importance | Notes |
---|---|---|
intake-esgf | Necessary | How to configure a search and use output |
rooki | Helpful | How to initialize and run rooki |
hvPlot | Necessary | How to plot interactive visualizations |
Time to learn: 30 minutes
Imports
import os
# Point the rooki client at the Oak Ridge National Laboratory (ORNL) WPS deployment.
# This is set before rooki is imported so the client picks it up.
url = 'https://esgf-wps.apps.onyx.ccs.ornl.gov/wps'
os.environ['ROOK_URL'] = url
from rooki import rooki
from rooki import operators as ops
import intake_esgf
from intake_esgf import ESGFCatalog
import xarray as xr
import hvplot.xarray
import holoviews as hv
import panel as pn
hv.extension("bokeh")
Search and Find Data for Surface Temperature on ORNL Node
Let’s start by choosing which index we would like to search. For this analysis, we compute remotely on the ORNL node, since this is where our rooki endpoint is running. We can confirm this by checking the ._url attribute of rooki:
rooki._url
'https://esgf-wps.apps.onyx.ccs.ornl.gov/wps'
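As a quick sanity check, the endpoint URL can be inspected before any work is submitted. The sketch below uses only the standard library; the helper name `endpoint_host` is hypothetical, and it assumes (as the imports cell does) that the rooki client takes its endpoint from the `ROOK_URL` environment variable.

```python
import os
from urllib.parse import urlparse

# Same endpoint as in the imports cell.
ROOK_URL = "https://esgf-wps.apps.onyx.ccs.ornl.gov/wps"
os.environ["ROOK_URL"] = ROOK_URL


def endpoint_host(url):
    """Return the hostname part of a WPS endpoint URL (hypothetical helper)."""
    return urlparse(url).hostname


host = endpoint_host(os.environ["ROOK_URL"])
print(host)
# True when the endpoint is served from an ornl.gov host.
print(host.endswith("ornl.gov"))
```

A check like this is cheap to run at the top of a notebook and catches a stale `ROOK_URL` left over from an earlier session.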
Set the Index Node and Search
We need to ensure only the Oak Ridge National Lab (ORNL) index node is active, not the Argonne National Laboratory (ANL) node. We can accomplish this using the configuration settings for intake-esgf.
intake_esgf.conf.set(
    indices={
        "anl-dev": False,
        "ornl-dev": True,
    }
)
cat = ESGFCatalog()
cat.search(
    activity_id='CMIP',
    experiment_id=["historical"],
    variable_id=["tas"],
    member_id='r1i1p1f1',
    grid_label='gn',
    table_id="Amon",
    institution_id=["MIROC", "NCAR", "NASA-GISS", "CMCC"],
)
cat.df
 | mip_era | experiment_id | variable_id | source_id | version | grid_label | member_id | table_id | institution_id | activity_drs | project | id |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CMIP6 | historical | tas | GISS-E2-2-H | 20191120 | gn | r1i1p1f1 | Amon | NASA-GISS | CMIP | CMIP6 | [CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r... |
1 | CMIP6 | historical | tas | CMCC-ESM2 | 20210114 | gn | r1i1p1f1 | Amon | CMCC | CMIP | CMIP6 | [CMIP6.CMIP.CMCC.CMCC-ESM2.historical.r1i1p1f1... |
2 | CMIP6 | historical | tas | CESM2-WACCM | 20190227 | gn | r1i1p1f1 | Amon | NCAR | CMIP | CMIP6 | [CMIP6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1... |
3 | CMIP6 | historical | tas | CMCC-CM2-SR5 | 20200616 | gn | r1i1p1f1 | Amon | CMCC | CMIP | CMIP6 | [CMIP6.CMIP.CMCC.CMCC-CM2-SR5.historical.r1i1p... |
4 | CMIP6 | historical | tas | GISS-E2-1-G | 20180827 | gn | r1i1p1f1 | Amon | NASA-GISS | CMIP | CMIP6 | [CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r... |
5 | CMIP6 | historical | tas | GISS-E2-2-G | 20191120 | gn | r1i1p1f1 | Amon | NASA-GISS | CMIP | CMIP6 | [CMIP6.CMIP.NASA-GISS.GISS-E2-2-G.historical.r... |
6 | CMIP6 | historical | tas | CESM2-WACCM-FV2 | 20191120 | gn | r1i1p1f1 | Amon | NCAR | CMIP | CMIP6 | [CMIP6.CMIP.NCAR.CESM2-WACCM-FV2.historical.r1... |
8 | CMIP6 | historical | tas | CESM2-FV2 | 20191120 | gn | r1i1p1f1 | Amon | NCAR | CMIP | CMIP6 | [CMIP6.CMIP.NCAR.CESM2-FV2.historical.r1i1p1f1... |
9 | CMIP6 | historical | tas | CMCC-CM2-HR4 | 20200904 | gn | r1i1p1f1 | Amon | CMCC | CMIP | CMIP6 | [CMIP6.CMIP.CMCC.CMCC-CM2-HR4.historical.r1i1p... |
10 | CMIP6 | historical | tas | GISS-E2-1-H | 20190403 | gn | r1i1p1f1 | Amon | NASA-GISS | CMIP | CMIP6 | [CMIP6.CMIP.NASA-GISS.GISS-E2-1-H.historical.r... |
15 | CMIP6 | historical | tas | CESM2 | 20190308 | gn | r1i1p1f1 | Amon | NCAR | CMIP | CMIP6 | [CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amo... |
16 | CMIP6 | historical | tas | GISS-E2-1-G-CC | 20190815 | gn | r1i1p1f1 | Amon | NASA-GISS | CMIP | CMIP6 | [CMIP6.CMIP.NASA-GISS.GISS-E2-1-G-CC.historica... |
22 | CMIP6 | historical | tas | MIROC6 | 20181212 | gn | r1i1p1f1 | Amon | MIROC | CMIP | CMIP6 | [CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.A... |
Extract the Dataset ID and Pass to Rooki
Now that we have a set of datasets, we need to extract the dataset_id, which is the unique identifier for each dataset. We can pull this from the id column of the intake-esgf dataframe.
Separate the Dataset ID
cat.df.id.values[0]
['CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120|esgf-data04.diasjp.net',
'CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120|dpesgf03.nccs.nasa.gov',
'CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120|esgf-node.ornl.gov']
Notice how the data node is appended to the end of each dataset id. We need to “chop off” that last bit, keeping everything before the | character. We put this into a function to make it easier to generalize and apply.
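def separate_dataset_id(full_dataset):
    """
    Build the rooki dataset id from one entry of the id column.

    full_dataset is a list of 'dataset_id|data_node' strings. We take the
    first one, drop the node suffix, and prepend css03_data to indicate the
    CMIP6 archive at ORNL.
    """
    return f"css03_data.{full_dataset[0].split('|')[0]}"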
separate_dataset_id(cat.df.id.values[0])
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120'
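Each entry in the id column lists one string per replica, so it can be useful to look at the node part as well as the dataset part. The following is a minimal sketch using the three strings shown in the output above; `split_replica` is a hypothetical helper, and the idea that a replica on an ORNL data node matters for the ORNL rooki deployment is an assumption, not something the source states.

```python
def split_replica(entry):
    """Split 'dataset_id|data_node' into its two parts (hypothetical helper)."""
    dataset_id, _, data_node = entry.partition("|")
    return dataset_id, data_node


# Replica list copied from the cat.df.id.values[0] output.
replicas = [
    "CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120|esgf-data04.diasjp.net",
    "CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120|dpesgf03.nccs.nasa.gov",
    "CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120|esgf-node.ornl.gov",
]

# All replicas should share one dataset id and differ only in the node.
dataset_ids = {split_replica(r)[0] for r in replicas}
data_nodes = [split_replica(r)[1] for r in replicas]

print(len(dataset_ids))
print(data_nodes)
# Assumption: a replica on an ornl.gov node is what the ORNL deployment reads.
print(any(node.endswith("ornl.gov") for node in data_nodes))
```

Because every replica carries the same dataset id, taking the first element of the list (as `separate_dataset_id` does) loses no information about the dataset itself.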
Now we can apply this to every entry in our dataframe:
dsets = [separate_dataset_id(dataset) for dataset in list(cat.df.id.values)]
dsets
['css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-H.historical.r1i1p1f1.Amon.tas.gn.v20191120',
'css03_data.CMIP6.CMIP.CMCC.CMCC-ESM2.historical.r1i1p1f1.Amon.tas.gn.v20210114',
'css03_data.CMIP6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.Amon.tas.gn.v20190227',
'css03_data.CMIP6.CMIP.CMCC.CMCC-CM2-SR5.historical.r1i1p1f1.Amon.tas.gn.v20200616',
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-G.historical.r1i1p1f1.Amon.tas.gn.v20180827',
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-2-G.historical.r1i1p1f1.Amon.tas.gn.v20191120',
'css03_data.CMIP6.CMIP.NCAR.CESM2-WACCM-FV2.historical.r1i1p1f1.Amon.tas.gn.v20191120',
'css03_data.CMIP6.CMIP.NCAR.CESM2-FV2.historical.r1i1p1f1.Amon.tas.gn.v20191120',
'css03_data.CMIP6.CMIP.CMCC.CMCC-CM2-HR4.historical.r1i1p1f1.Amon.tas.gn.v20200904',
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-H.historical.r1i1p1f1.Amon.tas.gn.v20190403',
'css03_data.CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190308',
'css03_data.CMIP6.CMIP.NASA-GISS.GISS-E2-1-G-CC.historical.r1i1p1f1.Amon.tas.gn.v20190815',
'css03_data.CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.Amon.tas.gn.v20181212']
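The dotted ids above encode the same facets that appear as columns in the search results, so they can be turned back into labelled fields when a short model name is needed (for example, as a plot title). This is a rough sketch in plain Python; `FACETS` and `parse_facets` are hypothetical names, and the facet order is inferred from the ids and column names shown in this notebook.

```python
# Facet order inferred from the ids above; names follow the dataframe columns.
FACETS = [
    "mip_era", "activity_drs", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]


def parse_facets(rooki_id, prefix="css03_data."):
    """Map a rooki dataset id onto named facets (hypothetical helper)."""
    # Drop the ORNL archive prefix if it is present.
    if rooki_id.startswith(prefix):
        rooki_id = rooki_id[len(prefix):]
    parts = rooki_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} facets, got {len(parts)}")
    return dict(zip(FACETS, parts))


example = "css03_data.CMIP6.CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.Amon.tas.gn.v20190227"
facets = parse_facets(example)
print(facets["source_id"], facets["version"])
```

The length check guards against ids whose components themselves contain a dot, which would otherwise be silently mislabelled.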
Compute with Rooki
Now that we have a list of IDs to pass to rooki, let’s compute!
In this case, we are:
Subsetting from the year 1900 to 2000
Subsetting near India using the bounds 65,0,100,35 (west, south, east, north)
Computing the yearly average
We then check that the response is okay; if it is, we return the resulting dataset, and otherwise we return an empty dataset.
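The time and area arguments are plain strings, so it is easy to transpose a bound or mistype a date. One way to reduce that risk is to build them from named values, as in the sketch below. Both helper names are hypothetical, and the west, south, east, north ordering is an assumption that matches the longitude and latitude ranges in the dataset returned later in this notebook.

```python
def make_area(west, south, east, north):
    """Format bounds as 'west,south,east,north' (ordering is an assumption)."""
    if south >= north:
        raise ValueError("south must be smaller than north")
    return ",".join(str(value) for value in (west, south, east, north))


def make_time_range(start_year, end_year):
    """Format a full-year range as 'YYYY-01-01/YYYY-12-31'."""
    if start_year > end_year:
        raise ValueError("start_year must not be after end_year")
    return f"{start_year:04d}-01-01/{end_year:04d}-12-31"


area = make_area(65, 0, 100, 35)
time_range = make_time_range(1900, 2000)
print(area)
print(time_range)
```

The resulting strings are the same ones passed to `ops.Subset` as `area` and `time`.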
def compute_annual_mean_subset(dset_id):
    # Subset by time and area on the server, then average by year
    wf = ops.AverageByTime(
        ops.Subset(
            ops.Input(
                'tas', [dset_id]
            ),
            time='1900-01-01/2000-12-31',
            area='65,0,100,35',
        ),
        freq="year"
    )
    resp = wf.orchestrate()

    # Return the result if the request succeeded, otherwise an empty dataset
    if resp.ok:
        ds = resp.datasets()[0]
    else:
        ds = xr.Dataset()
    return ds
compute_annual_mean_subset(dsets[0])
Metalink content-type detected.
Downloading to /var/folders/bw/c9j8z20x45s2y20vv6528qjc0000gq/T/metalink_p9cwdemi/tas_Amon_GISS-E2-2-H_historical_r1i1p1f1_gn_19000101-20000101_avg-year.nc.
<xarray.Dataset> Size: 156kB
Dimensions:    (lat: 18, time: 101, bnds: 2, lon: 14)
Coordinates:
  * lat        (lat) float64 144B 1.0 3.0 5.0 7.0 9.0 ... 29.0 31.0 33.0 35.0
  * lon        (lon) float64 112B 66.25 68.75 71.25 73.75 ... 93.75 96.25 98.75
    height     float64 8B ...
  * time       (time) object 808B 1900-01-01 00:00:00 ... 2000-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (time, lat, bnds) float64 29kB ...
    lon_bnds   (time, lon, bnds) float64 23kB ...
    tas        (time, lat, lon) float32 102kB ...
    time_bnds  (time, bnds) object 2kB ...
Attributes: (12/48)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  0.0
    contact:                Kenneth Lo (cdkkl@giss.nasa.gov)
    ...                     ...
    title:                  GISS-E2-2-H output prepared for CMIP6
    tracking_id:            hdl:21.14100/09d7bd73-f74e-4f9a-a14b-205fa5078217
    variable_id:            tas
    variant_label:          r1i1p1f1
    license:                CMIP6 model data produced by NASA Goddard Institu...
    cmor_version:           3.3.2
Now that it works with a single dataset, let’s do this for all the datasets and put them into a dictionary with the dataset ids as the keys.
dset_dict = {}
for dset in dsets:
    dset_dict[dset] = compute_annual_mean_subset(dset)
Metalink content-type detected.
Downloading to /var/folders/bw/c9j8z20x45s2y20vv6528qjc0000gq/T/metalink_4tki31ka/tas_Amon_GISS-E2-2-H_historical_r1i1p1f1_gn_19000101-20000101_avg-year.nc.
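Because `compute_annual_mean_subset` returns an empty dataset when a request fails, the dictionary may contain placeholders with no variables. A small sketch of how those could be separated from the successful results follows. `split_results` is a hypothetical helper, and the demo uses `SimpleNamespace` stand-ins rather than real xarray datasets so that it runs without a rooki server; it relies only on the `data_vars` mapping that xarray datasets expose.

```python
from types import SimpleNamespace


def split_results(results):
    """Separate non-empty datasets from empty placeholders (hypothetical helper)."""
    succeeded, failed = {}, []
    for key, ds in results.items():
        # An empty dataset has no data variables.
        if len(ds.data_vars) > 0:
            succeeded[key] = ds
        else:
            failed.append(key)
    return succeeded, failed


# Stand-ins for xarray datasets: one with a variable, one empty.
demo = {
    "model_a": SimpleNamespace(data_vars={"tas": object()}),
    "model_b": SimpleNamespace(data_vars={}),
}

ok, failed = split_results(demo)
print(sorted(ok))
print(failed)
```

With the real results, `split_results(dset_dict)` would give the datasets that are safe to plot along with a list of ids worth retrying.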
Visualize the Output
Let’s use hvPlot to visualize the results. Since the datasets are stored in a dictionary, we need to:
Extract a single key
Plot a filled-contour visualization with some geographic features
dset_dict[list(dset_dict.keys())[-1]].tas.hvplot.contourf(
    x='lon',
    y='lat',
    cmap='Reds',
    levels=20,
    clim=(250, 320),
    features=["land", "ocean"],
    alpha=0.7,
    widget_location='bottom',
    clabel="Yearly Average Temperature (K)",
    geo=True,
)
Summary
In this notebook, we learned how to select a specific index node to search, pass the discovered datasets to rooki, and chain several remote-compute operations with rooki. We then visualized the output with hvPlot as an interactive plot!