Basic Demonstration of Data Reduction Using Globus, Intake-ESGF, and Clisops
Overview
This notebook shows how to use a collection of open-source tools from the Earth System Grid Federation (ESGF) user-computing community to select and reduce datasets available through the federation's servers. Specifically, we will:
Select a given time frame
Subset for a region (bounding box)
Average into yearly frequency
Prerequisites
| Concepts | Importance | Notes |
|---|---|---|
| | Necessary | |
| | Necessary | Interactive Visualization with hvPlot |
Time to learn: 30 minutes
Imports
import hvplot.xarray
import holoviews as hv
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from intake_esgf import ESGFCatalog
import xarray as xr
import warnings
from clisops.ops.subset import subset, subset_bbox
from clisops.ops.average import average_over_dims, average_time
import os
from globus_compute_sdk import Executor, Client
warnings.filterwarnings("ignore")
hv.extension("matplotlib")
Search and Find Data Using Intake-ESGF
Let’s start with a sample dataset, which we can find using intake-esgf.
cat = ESGFCatalog()
cat
Perform a search() to populate the catalog.
cat.search(
    experiment_id="historical",
    source_id="CanESM5",
    frequency="mon",
    variable_id=["gpp", "tas", "pr"],
    variant_label="r1i1p1f1",  # restrict the search to a single ensemble member
)
Searching indices: 100%|███████████████████████████████|1/1 [ 4.22s/index]
Summary information for 3 results:
mip_era [CMIP6]
activity_id [CMIP]
institution_id [CCCma]
source_id [CanESM5]
experiment_id [historical]
member_id [r1i1p1f1]
table_id [Amon, Lmon]
variable_id [tas, pr, gpp]
grid_label [gn]
dtype: object
dsd = cat.to_dataset_dict()
dsd.keys()
Obtaining file info: 100%|███████████████████████████████|3/3 [ 1.24dataset/s]
Adding cell measures: 100%|███████████████████████████████|3/3 [ 3.04s/dataset]
dict_keys(['Amon.tas', 'Lmon.gpp', 'Amon.pr'])
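The keys above pair a table with a variable (for example, `Amon.tas`). As a minimal sketch, assuming the keys keep this `table_id.variable_id` shape (a less constrained search may produce keys with more components), they can be grouped by table to see which variables come from the atmosphere and land tables. `group_keys_by_table` is a hypothetical helper, not part of intake-esgf.

```python
from collections import defaultdict


def group_keys_by_table(keys):
    """Group 'table_id.variable_id' keys by table_id (hypothetical helper)."""
    grouped = defaultdict(list)
    for key in keys:
        # Split on the last dot: the variable is assumed to be the final component.
        table_id, variable_id = key.rsplit(".", 1)
        grouped[table_id].append(variable_id)
    return dict(grouped)


# Keys copied from the output above.
keys = ["Amon.tas", "Lmon.gpp", "Amon.pr"]
tables = group_keys_by_table(keys)
print(tables)
```

In the notebook itself, `dsd.keys()` could be passed in place of the hard-coded list.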
ds = dsd["Amon.tas"]
ds
<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2, lat: 64, lon: 128)
Coordinates:
  * time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
  * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
  * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
    height     float64 ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object ...
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    tas        (time, lat, lon) float32 ...
    areacella  (lat, lon) float32 ...
Attributes: (12/53)
    CCCma_model_hash:            3dedf95315d603326fde4f5340dc0519d80d10c0
    CCCma_parent_runid:          rc3-pictrl
    CCCma_pycmor_hash:           33c30511acc319a98240633965a04ca99c26427e
    CCCma_runid:                 rc3.1-his01
    Conventions:                 CF-1.7 CMIP-6.2
    YMDH_branch_time_in_child:   1850:01:01:00
    ...                          ...
    tracking_id:                 hdl:21.14100/872062df-acae-499b-aa0f-9eaca76...
    variable_id:                 tas
    variant_label:               r1i1p1f1
    version:                     v20190429
    license:                     CMIP6 model data produced by The Government ...
    cmor_version:                3.4.0
Use clisops to subset for time and location
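def subset_time(ds, start_time="1850-01-01T12:00:00Z", end_time="2014-12-30T12:00:00Z"):
    # Import inside the function so it stays self-contained
    from clisops.ops.subset import subset
    # clisops ops take `output_type` and return a list of results; take the first
    return subset(ds, time=f"{start_time}/{end_time}", output_type="xarray")[0]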
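The `time` argument built in `subset_time` is a single `start/end` string. As a rough illustration of what such an interval selects, the sketch below parses the string with the standard library and filters a list of synthetic monthly timestamps. The helper `parse_interval` and the mid-month timestamps are assumptions made for illustration. The real dataset uses calendar-aware time objects, and clisops performs the selection itself.

```python
from datetime import datetime


def parse_interval(interval):
    """Split a 'start/end' string into two datetimes (hypothetical helper)."""
    start, end = interval.split("/")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(start, fmt), datetime.strptime(end, fmt)


# Synthetic monthly timestamps on the 16th at 12:00 (an assumption for illustration).
stamps = [datetime(y, m, 16, 12) for y in range(1850, 2015) for m in range(1, 13)]

# The default interval from subset_time covers the whole historical period.
start, end = parse_interval("1850-01-01T12:00:00Z/2014-12-30T12:00:00Z")
selected = [t for t in stamps if start <= t <= end]

# A narrower, made-up interval keeps only one decade of months.
d_start, d_end = parse_interval("1990-01-01T00:00:00Z/1999-12-31T00:00:00Z")
decade = [t for t in stamps if d_start <= t <= d_end]

print(len(stamps), len(selected), len(decade))
```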
def subset_location(ds, lat_bounds=[30, 50], lon_bounds=[-100, -80]):
    from clisops.ops.subset import subset_bbox
    return subset_bbox(ds, lat_bnds=lat_bounds, lon_bnds=lon_bounds)
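Note that the default `lon_bounds` are given in the -180 to 180 convention, while the dataset's `lon` coordinate runs from 0 to about 357. The sketch below shows how such bounds map onto a 0 to 360 grid using plain NumPy. It illustrates the convention only and makes no claim about how clisops implements this internally. The helper `lon_mask` and the evenly spaced 128-point grid are assumptions chosen to match the rounded values in the dataset output.

```python
import numpy as np

# 128 evenly spaced longitudes starting at 0 (2.8125-degree spacing).
# Assumed here to match the rounded values shown in the dataset repr.
lon = np.arange(128) * 360.0 / 128


def lon_mask(lon, lon_bounds):
    """Boolean mask of 0-360 longitudes inside bounds given in -180..180 (hypothetical helper)."""
    west, east = (b % 360 for b in lon_bounds)  # e.g. -100 -> 260, -80 -> 280
    if west <= east:
        return (lon >= west) & (lon <= east)
    # The box crosses the 0/360 seam, e.g. (-10, 10) -> (350, 10).
    return (lon >= west) | (lon <= east)


mask = lon_mask(lon, (-100, -80))
seam_mask = lon_mask(lon, (-10, 10))
print(lon[mask])
```

The same kind of mask on `lat` needs no conversion, since latitudes already share one convention.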
ds.tas.isel(time=0).hvplot.quadmesh(geo=True, cmap="Reds")
subset_location(ds).tas.isel(time=-1).hvplot(
    x='lon',
    y='lat',
    features=["land", "lakes", "ocean", "borders"],
    cmap='Reds',
    geo=True,
)
Calculate a yearly average
def yearly_average(ds):
    from clisops.ops.average import average_time
    return average_time(ds, "year", output_type="xarray")[0]
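To make the reduction concrete: the dataset holds 1980 monthly time steps (1850 through 2014), so a yearly average yields 165 time steps. The sketch below reproduces that arithmetic on synthetic data with plain NumPy, giving each month equal weight. Whether `average_time` weights months by their length is not covered here, so treat this as an illustration of the idea rather than of the clisops implementation.

```python
import numpy as np

# 165 years of monthly data on a tiny 2 x 3 grid (synthetic values, for illustration only).
n_years, n_lat, n_lon = 165, 2, 3
rng = np.random.default_rng(0)
monthly = rng.normal(280.0, 5.0, size=(n_years * 12, n_lat, n_lon))

# Group each run of 12 consecutive months, then average over the month axis.
yearly = monthly.reshape(n_years, 12, n_lat, n_lon).mean(axis=1)

print(monthly.shape, "->", yearly.shape)
```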
yearly_average(subset_location(ds)).isel(time=0).tas.hvplot(
    x='lon',
    y='lat',
    features=["land", "lakes", "ocean", "borders"],
    cmap='Reds',
    geo=True,
)
yearly_average(subset_location(ds)).isel(time=-1).tas.hvplot(
    x='lon',
    y='lat',
    features=["land", "lakes", "ocean", "borders"],
    cmap='Reds',
    geo=True,
)
Summary
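In this notebook, we used intake-esgf to search for and load CMIP6 data, then applied data reduction functions from the ESGF stack (clisops) to it: we defined helpers for selecting a time range and subsetting a region, and averaged the regional subset into yearly means.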