Overview¶
In this notebook we will compute the Global Mean Surface Temperature Anomalies (GMSTA) from CMIP6 data and compare it with observations. This notebook is heavily inspired by the GMST example in the CMIP6 cookbook and we thank the authors for their workflow.
We will get the CMIP6 temperature data from the AWS Open Data program via its us-west-2 origin.
To do this, we will use an intake-ESM catalog (hosted on NCAR’s GDEX) whose asset links are PelicanFS-backed rather than plain HTTPS or S3 links.
We will grab observational data hosted on NCAR’s GDEX, which is accessible via the NCAR origin.
Please refer to the first chapter of this cookbook to learn more about OSDF, Pelican, and PelicanFS.
This notebook demonstrates that you can seamlessly stream data from multiple OSDF origins in a single workflow.
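Concretely, the quantity computed below is a cosine-of-latitude weighted mean of surface air temperature, referenced to a baseline period (a sketch of the definition as implemented later in this notebook; the 1960-1990 baseline is set in the anomaly section):

$$\bar{T}(t) \;=\; \frac{\sum_i \cos(\phi_i)\, T_i(t)}{\sum_i \cos(\phi_i)}, \qquad \Delta T(t) \;=\; \bar{T}(t) \;-\; \langle \bar{T} \rangle_{1960\text{--}1990}$$

where $T_i(t)$ is the temperature of grid cell $i$ and $\phi_i$ is its latitude.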
Prerequisites¶
| Concepts | Importance | Notes |
|---|---|---|
| Intro to Intake-ESM | Necessary | Used for searching CMIP6 data |
| Understanding of Zarr | Helpful | Familiarity with metadata structure |
| Seaborn | Helpful | Used for plotting |
| PelicanFS | Necessary | The Python package used to stream data in this notebook |
| OSDF | Helpful | OSDF is used to stream data in this notebook |
Time to learn: 20 mins
Imports¶
from matplotlib import pyplot as plt
import xarray as xr
import numpy as np
from dask.diagnostics import progress
from tqdm.autonotebook import tqdm
import intake
import fsspec
import seaborn as sns
import aiohttp
import dask
from dask.distributed import LocalCluster
import pelicanfs
We will use an intake-ESM catalog hosted on NCAR’s Geoscience Data Exchange (GDEX). It is the standard AWS CMIP6 catalog, modified so that the asset links use OSDF.
# Load catalog URL
gdex_url = 'https://data.gdex.ucar.edu/'
cat_url = gdex_url + 'd850001/catalogs/osdf/cmip6-aws/cmip6-osdf-zarr.json'
print(cat_url)
https://data.gdex.ucar.edu/d850001/catalogs/osdf/cmip6-aws/cmip6-osdf-zarr.json
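To make the catalog modification concrete, here is a hedged sketch of how an S3 asset link from the original AWS CMIP6 catalog maps onto its OSDF counterpart (the store path below is shown purely for illustration; the actual rewriting was done when the catalog was built, not at run time):

# Illustrative only: S3 link -> OSDF link via the /aws-opendata/us-west-2 namespace
s3_link = "s3://cmip6-pds/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3-Veg/ssp370/r1i1p1f1/Amon/tas/gr/v20200225"
osdf_link = s3_link.replace("s3://cmip6-pds", "osdf:///aws-opendata/us-west-2/cmip6-pds")
print(osdf_link)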
Set up a local Dask cluster¶
Before we do any computation, let us first set up a local cluster using Dask.
cluster = LocalCluster()
client = cluster.get_client()
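As an optional aside (not part of the original workflow), the Dask client exposes a dashboard URL that you can open in a browser to watch tasks execute during the compute steps below:

# Optional: dashboard_link is a standard property of distributed.Client
print(client.dashboard_link)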
# Scale the cluster
n_workers = 5
cluster.scale(n_workers)
cluster
col = intake.open_esm_datastore(cat_url)
col

# there is currently a significant amount of data for these runs
expts = ['historical', 'ssp245', 'ssp370']
query = dict(
experiment_id=expts,
table_id='Amon',
variable_id=['tas'],
member_id = 'r1i1p1f1',
#activity_id = 'CMIP',
)
col_subset = col.search(require_all_on=["source_id"], **query)
col_subset

Let us inspect the Zarr store paths to see whether we are using the Pelican protocol.
We see that the zstore column has paths starting with ‘osdf:///’ instead of ‘https://’, which tells us that we are not fetching the data with plain HTTPS GET requests.
To learn more about the Pelican protocol, please refer to the first chapter of this cookbook.
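As a quick sanity check (a hedged aside: PelicanFS registers its protocols with fsspec, assuming the package is installed as in the imports above), we can ask fsspec which filesystem class handles ‘osdf’ paths:

# Confirm that fsspec resolves the 'osdf' protocol (registered by pelicanfs)
import fsspec
print(fsspec.get_filesystem_class("osdf"))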
col_subset.df

Grab some observational time series data for comparison with ensemble spread¶
The observational data we will use is the HadCRUT5 dataset from the UK Met Office.
The data has been downloaded to NCAR’s Geoscience Data Exchange (GDEX) from https://www.metoffice.gov.uk/hadobs/hadcrut5/.
We will use OSDF to access this copy from the GDEX. Again, the links will start with ‘osdf:///’.
%%time
obs_url = 'osdf:///ncar/gdex/d850001/HadCRUT.5.0.2.0.analysis.summary_series.global.monthly.nc'
#
obs_ds = xr.open_dataset(obs_url, engine='h5netcdf').tas_mean
obs_ds

Some helpful functions¶
def drop_all_bounds(ds):
    # Drop cell-boundary variables (e.g. lat_bnds, time_bounds) that we do not need
    drop_vars = [vname for vname in ds.coords
                 if ('_bounds' in vname) or ('_bnds' in vname)]
    return ds.drop_vars(drop_vars)
def open_dset(df):
    # Open a single Zarr store (one catalog row) lazily, using consolidated metadata
    assert len(df) == 1
    mapper = fsspec.get_mapper(df.zstore.values[0])
    ds = xr.open_zarr(mapper, consolidated=True)
    return drop_all_bounds(ds)
def open_delayed(df):
    # Defer the open so that all stores can be opened in parallel via dask.compute
    return dask.delayed(open_dset)(df)
from collections import defaultdict
dsets = defaultdict(dict)
for group, df in col_subset.df.groupby(by=['source_id', 'experiment_id']):
    dsets[group[0]][group[1]] = open_delayed(df)

dsets_ = dask.compute(dict(dsets))[0]

# calculate global means
def get_lat_name(ds):
for lat_name in ['lat', 'latitude']:
if lat_name in ds.coords:
return lat_name
raise RuntimeError("Couldn't find a latitude coordinate")
def global_mean(ds):
    # Area-weighted global mean: weight each grid cell by the cosine of its latitude
    lat = ds[get_lat_name(ds)]
    weight = np.cos(np.deg2rad(lat))
    weight /= weight.mean()
    other_dims = set(ds.dims) - {'time'}
    return (ds * weight).mean(other_dims)
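To see what the weighting does, here is a minimal sketch on a synthetic dataset (invented purely for illustration, not part of the original workflow): a uniform field of ones should have a global mean of exactly 1.0 at every time step, whatever the grid.

# Toy check of global_mean (relies on the numpy/xarray imports above):
# a uniform field of ones has an area-weighted mean of exactly 1.0
toy = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.ones((2, 3, 4)))},
    coords={"lat": [-60.0, 0.0, 60.0], "lon": [0.0, 90.0, 180.0, 270.0]},
)
print(global_mean(toy).tas.values)  # -> [1. 1.]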
GMST computation¶
expt_da = xr.DataArray(expts, dims='experiment_id', name='experiment_id',
coords={'experiment_id': expts})
dsets_aligned = {}
for k, v in tqdm(dsets_.items()):
expt_dsets = v.values()
if any([d is None for d in expt_dsets]):
print(f"Missing experiment for {k}")
continue
for ds in expt_dsets:
ds.coords['year'] = ds.time.dt.year
# workaround for
# https://github.com/pydata/xarray/issues/2237#issuecomment-620961663
dsets_ann_mean = [v[expt].pipe(global_mean).swap_dims({'time': 'year'})
.drop_vars('time').coarsen(year=12).mean()
for expt in expts]
    # concatenate the experiments along a new experiment_id dimension,
    # outer-joining on the year axis
    dsets_aligned[k] = xr.concat(dsets_ann_mean, join='outer', dim=expt_da)
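A note on the annual averaging above, with a hedged toy illustration (synthetic data, not from the catalog): after swap_dims({'time': 'year'}), each year label repeats 12 times for monthly data, so coarsen(year=12).mean() averages non-overlapping blocks of 12 months into annual means. resample(time='YS'), used later for the observations, is the datetime-axis equivalent.

# Synthetic check: block-averaging 12 monthly values gives annual means
import numpy as np
import pandas as pd
import xarray as xr

monthly = xr.DataArray(np.arange(24.0), dims="year",
                       coords={"year": np.repeat([2000, 2001], 12)})
print(monthly.coarsen(year=12).mean().values)  # -> [ 5.5 17.5]

# resample on a real datetime axis yields the same annual means
t = pd.date_range("2000-01-01", periods=24, freq="MS")
print(xr.DataArray(np.arange(24.0), dims="time",
                   coords={"time": t}).resample(time="YS").mean().values)  # -> [ 5.5 17.5]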
%%time
with progress.ProgressBar():
    dsets_aligned_ = dask.compute(dsets_aligned)[0]

source_ids = list(dsets_aligned_.keys())
source_da = xr.DataArray(source_ids, dims='source_id', name='source_id',
coords={'source_id': source_ids})
big_ds = xr.concat([ds.reset_coords(drop=True)
for ds in dsets_aligned_.values()],
dim=source_da)
big_ds

# Compute annual-mean temperature anomalies of the observational data
obs_gmsta = obs_ds.resample(time='YS').mean(dim='time')
# obs_gmsta

Compute anomalies and plot¶
We will compute the temperature anomalies with respect to the 1960-1990 baseline period
Convert xarray datasets to pandas dataframes
Use Seaborn to plot GMSTA
df_all = big_ds.to_dataframe().reset_index()
df_all.head()

# Define the baseline period
baseline_df = df_all[(df_all["year"] >= 1960) & (df_all["year"] <= 1990)]
# Compute the baseline mean
baseline_mean = baseline_df["tas"].mean()
# Compute anomalies
df_all["tas_anomaly"] = df_all["tas"] - baseline_mean
df_all

obs_df = obs_gmsta.to_dataframe(name='tas_anomaly').reset_index()
# Convert 'time' to 'year' (keeping only the year)
obs_df['year'] = obs_df['time'].dt.year
# Drop the original 'time' column since we extracted 'year'
obs_df = obs_df[['year', 'tas_anomaly']]
obs_df

Almost there! Let us now use Seaborn to plot all the anomalies.
g = sns.relplot(data=df_all, x="year", y="tas_anomaly",
                hue='experiment_id', kind="line", errorbar="sd", aspect=2,
                palette="Set2")  # adjust the color palette here if desired
# Get the current axis from the FacetGrid
ax = g.ax
# Overlay the observational data in red
sns.lineplot(data=obs_df, x="year", y="tas_anomaly", color="red",
             linestyle="dashed", linewidth=2, label="Observations", ax=ax)
# Adjust the legend to include observations
ax.legend(title="Experiment ID + Observations")
# Show the plot
plt.show()

Summary¶
In this notebook, we used surface air temperature data from several CMIP6 models for the ‘historical’, ‘ssp245’, and ‘ssp370’ runs to compute the Global Mean Surface Temperature Anomaly (GMSTA) relative to the 1960-1990 baseline period, and compared it with anomalies computed from the HadCRUT5 monthly surface temperature dataset. We used a modified intake-ESM catalog and PelicanFS to stream temperature data from two different OSDF origins: the CMIP6 model data was streamed from the AWS Open Data origin in the us-west-2 region, and the observational data was streamed from NCAR’s OSDF origin.
Resources and references¶
Original notebook in the Pangeo Gallery by Henri Drake and Ryan Abernathey
CMIP6 cookbook by Ryan Abernathey, Henri Drake, Robert Ford and Max Grover
Coupled Model Intercomparison Project 6 data was accessed from https://registry.opendata.aws/cmip6 using a modified intake-ESM catalog hosted on NCAR’s GDEX.
We thank the UK Met Office Hadley Centre for providing the observational data.