
Global Mean Surface Temperature Anomalies (GMSTA) from CMIP6 data


Overview

In this notebook, we will compute the Global Mean Surface Temperature Anomaly (GMSTA) from CMIP6 data and compare it with observations. This notebook is heavily inspired by the GMST example in the CMIP6 cookbook, and we thank the authors for their workflow.

  1. We will get the CMIP6 temperature data from the AWS Open Data program via the us-west-2 origin.

  2. To do this, we will use an intake-ESM catalog (hosted on NCAR’s GDEX) whose links are backed by PelicanFS instead of https or s3.

  3. We will grab observational data hosted on NCAR’s GDEX, which is accessible via the NCAR origin.

  4. Please refer to the first chapter of this cookbook to learn more about OSDF, Pelican, and PelicanFS.

  5. This notebook demonstrates that you can seamlessly stream data from multiple OSDF origins in a single workflow.

Prerequisites

Concepts              | Importance | Notes
--------------------- | ---------- | ---------------------------------------------------------
Intro to Intake-ESM   | Necessary  | Used for searching CMIP6 data
Understanding of Zarr | Helpful    | Familiarity with metadata structure
Seaborn               | Helpful    | Used for plotting
PelicanFS             | Necessary  | The Python package used to stream data in this notebook
OSDF                  | Helpful    | OSDF is used to stream data in this notebook
  • Time to learn: 20 mins

Imports

from matplotlib import pyplot as plt
import xarray as xr
import numpy as np
from dask.diagnostics import progress
from tqdm.autonotebook import tqdm
import intake
import fsspec
import seaborn as sns
import aiohttp
import dask
from dask.distributed import LocalCluster
import pelicanfs 

We will use an intake-ESM catalog hosted on NCAR’s Geoscience Data Exchange (GDEX). It is simply the AWS CMIP6 catalog, modified so that the zarr store links use OSDF instead of https or s3.

# Build the catalog URL
gdex_url = 'https://data.gdex.ucar.edu/'
cat_url = gdex_url + 'd850001/catalogs/osdf/cmip6-aws/cmip6-osdf-zarr.json'
print(cat_url)
https://data.gdex.ucar.edu/d850001/catalogs/osdf/cmip6-aws/cmip6-osdf-zarr.json

Set up a local Dask cluster

Before we do any computation, let us first set up a local cluster using Dask.

cluster = LocalCluster()          
client = cluster.get_client()
# Scale the cluster
n_workers = 5
cluster.scale(n_workers)
cluster
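
If you prefer, the explicit cluster.scale(n_workers) call above can be replaced with adaptive scaling; this is an optional sketch, not part of the original workflow:

# Adaptive alternative (sketch): let Dask scale between 1 and n_workers
# with demand, instead of pinning the worker count via cluster.scale().
cluster.adapt(minimum=1, maximum=n_workers)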

Data Loading

Load CMIP6 data from AWS

col = intake.open_esm_datastore(cat_url)
col
# there is currently a significant amount of data for these runs
expts = ['historical', 'ssp245', 'ssp370']

query = dict(
    experiment_id=expts,
    table_id='Amon',
    variable_id=['tas'],
    member_id = 'r1i1p1f1',
    #activity_id = 'CMIP',
)

col_subset = col.search(require_all_on=["source_id"], **query)
col_subset
  • Let us inspect the zarr store paths to see if we are using the Pelican protocol.

  • We see that the zstore column has paths that start with ‘osdf:///’ instead of ‘https://’, which tells us that we are not using a simple https GET request to fetch the data.

  • To learn more about the Pelican protocol, please refer to the first chapter of this cookbook.

col_subset.df
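To make the link rewriting concrete: the catalog takes the usual AWS S3 paths and prefixes them with the AWS Open Data origin’s namespace. Below is a hypothetical helper sketching that mapping (the hosted catalog already stores the rewritten links, so you never need to do this yourself):

def s3_to_osdf(s3_url, region='us-west-2'):
    # Hypothetical helper: rewrite an s3:// CMIP6 link into its OSDF
    # equivalent under the AWS Open Data origin's namespace.
    assert s3_url.startswith('s3://')
    return f"osdf:///aws-opendata/{region}/{s3_url[len('s3://'):]}"

print(s3_to_osdf('s3://cmip6-pds/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-Veg-LR/historical/r1i1p1f1/Amon/tas/gr/v20200217/'))
# -> osdf:///aws-opendata/us-west-2/cmip6-pds/CMIP6/CMIP/.../v20200217/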

Grab observational time series data for comparison with the ensemble spread

  • The observational data we will use is the HadCRUT5 dataset from the UK Met Office

  • The data has been downloaded to NCAR’s Geoscience Data Exchange (GDEX) from https://www.metoffice.gov.uk/hadobs/hadcrut5/

  • We will use OSDF to access this copy from the GDEX. Again, the links start with ‘osdf:///’.

%%time
obs_url = 'osdf:///ncar/gdex/d850001/HadCRUT.5.0.2.0.analysis.summary_series.global.monthly.nc'
obs_ds = xr.open_dataset(obs_url, engine='h5netcdf').tas_mean
obs_ds
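Before streaming a remote file, it can help to confirm the object is reachable through the federation. A minimal sanity check, assuming the ‘osdf’ protocol resolves through PelicanFS (imported above):

# Optional check (sketch): verify the object is visible through OSDF.
fs = fsspec.filesystem('osdf')
fs.exists('/ncar/gdex/d850001/HadCRUT.5.0.2.0.analysis.summary_series.global.monthly.nc')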

Some helpful functions

def drop_all_bounds(ds):
    """Drop all cell-bounds coordinate variables ('*_bounds' or '*_bnds')."""
    drop_vars = [vname for vname in ds.coords
                 if ('_bounds' in vname) or ('_bnds' in vname)]
    return ds.drop_vars(drop_vars)

def open_dset(df):
    """Open the single zarr store referenced by a one-row catalog subset."""
    assert len(df) == 1
    mapper = fsspec.get_mapper(df.zstore.values[0])
    ds = xr.open_zarr(mapper, consolidated=True)
    return drop_all_bounds(ds)

def open_delayed(df):
    """Defer open_dset so all stores can be opened in parallel on the cluster."""
    return dask.delayed(open_dset)(df)

from collections import defaultdict
dsets = defaultdict(dict)

for group, df in col_subset.df.groupby(by=['source_id', 'experiment_id']):
    dsets[group[0]][group[1]] = open_delayed(df)
dsets_ = dask.compute(dict(dsets))[0]
(Note: this cell can occasionally fail with a transient streaming error from an OSDF cache, e.g. a ClientPayloadError “Response payload is not completed” or a 504 Gateway Timeout while fetching a store’s .zmetadata; the run captured here hit exactly that. Re-running the cell usually succeeds, and a retry wrapper like the sketch below also helps.)
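Because the data is streamed over the wide-area network, individual requests can occasionally fail like this. A simple way to harden the workflow is to wrap the open in a retry loop; this is a sketch (open_dset_with_retry is not part of the original workflow) that could be substituted for open_dset inside open_delayed:

import time

def open_dset_with_retry(df, retries=3, delay=5):
    # Retry transient streaming failures (e.g. a cache returning 504)
    # before giving up; open_dset is defined above.
    for attempt in range(retries):
        try:
            return open_dset(df)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(delay)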
# Calculate area-weighted global means
def get_lat_name(ds):
    """Find the latitude coordinate, whatever it is called."""
    for lat_name in ['lat', 'latitude']:
        if lat_name in ds.coords:
            return lat_name
    raise RuntimeError("Couldn't find a latitude coordinate")

def global_mean(ds):
    """Area-weighted global mean, using normalized cos(latitude) weights."""
    lat = ds[get_lat_name(ds)]
    weight = np.cos(np.deg2rad(lat))
    weight /= weight.mean()
    other_dims = set(ds.dims) - {'time'}
    return (ds * weight).mean(other_dims)
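
A quick sanity check of the weighting (illustrative, using the imports from above): a field that is 1 everywhere should have a global mean of exactly 1, because the cos(lat) weights are normalized to mean 1.

# Synthetic test: tas == 1 everywhere -> global mean == 1
test = xr.Dataset(
    {'tas': (('time', 'lat', 'lon'), np.ones((2, 5, 4)))},
    coords={'lat': np.linspace(-80, 80, 5), 'lon': np.arange(4), 'time': [0, 1]},
)
assert np.allclose(global_mean(test).tas.values, 1.0)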

GMST computation

expt_da = xr.DataArray(expts, dims='experiment_id', name='experiment_id',
                       coords={'experiment_id': expts})

dsets_aligned = {}

for k, v in tqdm(dsets_.items()):
    expt_dsets = v.values()
    if any([d is None for d in expt_dsets]):
        print(f"Missing experiment for {k}")
        continue

    for ds in expt_dsets:
        ds.coords['year'] = ds.time.dt.year

    # workaround for
    # https://github.com/pydata/xarray/issues/2237#issuecomment-620961663
    dsets_ann_mean = [v[expt].pipe(global_mean).swap_dims({'time': 'year'})
                             .drop_vars('time').coarsen(year=12).mean()
                      for expt in expts]

    # concatenate the experiments along a new 'experiment_id' dimension
    dsets_aligned[k] = xr.concat(dsets_ann_mean, join='outer', dim=expt_da)
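
The coarsen trick above turns monthly values into annual means: after the time dimension is swapped for ‘year’, every block of 12 consecutive months is averaged into one value. A tiny illustration (using the imports from above):

monthly = xr.DataArray(np.arange(24.0), dims='year',
                       coords={'year': np.repeat([2000, 2001], 12)})
print(monthly.coarsen(year=12).mean().values)  # [ 5.5  17.5 ]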
%%time
with progress.ProgressBar():
    dsets_aligned_ = dask.compute(dsets_aligned)[0]
source_ids = list(dsets_aligned_.keys())
source_da = xr.DataArray(source_ids, dims='source_id', name='source_id',
                         coords={'source_id': source_ids})

big_ds = xr.concat([ds.reset_coords(drop=True)
                    for ds in dsets_aligned_.values()],
                    dim=source_da)

big_ds
# Compute annual-mean temperature anomalies from the observational data
obs_gmsta = obs_ds.resample(time='YS').mean(dim='time')
# obs_gmsta

Compute anomalies and plot

  • We will compute the temperature anomalies with respect to the 1960-1990 baseline period

  • Convert xarray datasets to pandas dataframes

  • Use Seaborn to plot GMSTA

df_all = big_ds.to_dataframe().reset_index()
df_all.head()
# Define the baseline period
baseline_df = df_all[(df_all["year"] >= 1960) & (df_all["year"] <= 1990)]

# Compute the baseline mean (pooled across all models and experiments)
baseline_mean = baseline_df["tas"].mean()

# Compute anomalies
df_all["tas_anomaly"] = df_all["tas"] - baseline_mean
df_all
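
Note that this baseline is pooled across all models and experiments. An alternative worth considering (a sketch, not what this notebook does) is a per-model baseline, which removes each model’s own climatological offset:

# Per-model baseline (sketch): subtract each source_id's own 1960-1990 mean.
base = (df_all[df_all['year'].between(1960, 1990)]
        .groupby('source_id')['tas'].mean())
df_all['tas_anomaly_per_model'] = df_all['tas'] - df_all['source_id'].map(base)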
obs_df = obs_gmsta.to_dataframe(name='tas_anomaly').reset_index()
# Convert 'time' to 'year' (keeping only the year)
obs_df['year'] = obs_df['time'].dt.year

# Drop the original 'time' column since we extracted 'year'
obs_df = obs_df[['year', 'tas_anomaly']]
obs_df

Almost there! Let us now use Seaborn to plot all the anomalies.

g = sns.relplot(data=df_all, x="year", y="tas_anomaly",
                hue='experiment_id', kind="line", errorbar="sd",
                aspect=2, palette="Set2")  # shaded bands show the spread (sd) across models

# Get the current axis from the FacetGrid
ax = g.ax

# Overlay the observational data in red
sns.lineplot(data=obs_df, x="year", y="tas_anomaly", color="red",
             linestyle="dashed", linewidth=2, label="Observations", ax=ax)

# Adjust the legend to include observations
ax.legend(title="Experiment ID + Observations")

# Show the plot
plt.show()

Summary

In this notebook, we used surface air temperature data from several CMIP6 models for the ‘historical’, ‘ssp245’, and ‘ssp370’ runs to compute the Global Mean Surface Temperature Anomaly (GMSTA) relative to the 1960-1990 baseline period and compared it with anomalies computed from the HadCRUT5 monthly surface temperature dataset. We used a modified intake-ESM catalog and PelicanFS to stream temperature data from two different OSDF origins: the CMIP6 model data was streamed from the AWS Open Data origin in the us-west-2 region, and the observational data was streamed from NCAR’s OSDF origin.

Resources and references

  1. Original notebook in the Pangeo Gallery by Henri Drake and Ryan Abernathey

  2. CMIP6 cookbook by Ryan Abernathey, Henri Drake, Robert Ford and Max Grover

  3. Coupled Model Intercomparison Project Phase 6 (CMIP6) data was accessed from https://registry.opendata.aws/cmip6 using a modified intake-ESM catalog hosted on NCAR’s GDEX

  4. We thank the UK Met Office Hadley Centre for providing the observational data