
Google Cloud CMIP6 Public Data: Basic Python Example


Overview

This notebook shows how to query the Google Cloud CMIP6 catalog and load the data using Python.

Prerequisites

Concepts | Importance | Notes
Intro to Xarray | Necessary |
Understanding of NetCDF | Helpful | Familiarity with metadata structure
  • Time to learn: 10 minutes

Imports

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import zarr
import fsspec
import nc_time_axis

%matplotlib inline
plt.rcParams['figure.figsize'] = 12, 6

Browse Catalog

The data catalog is stored as a CSV file. Here we read it with Pandas.

df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df.head()

The columns of the dataframe correspond to the CMIP6 controlled vocabulary.
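To get a feel for which values each catalog column takes, it can help to count the distinct entries per column. The sketch below uses a tiny hand-written stand-in for the catalog so that it runs without network access. The rows and the `gs://example/...` paths are illustrative assumptions, not real catalog entries; the same calls should apply to the `df` loaded above.

```python
import pandas as pd

# Hypothetical stand-in for the catalog: the column names follow the CMIP6
# controlled vocabulary, but the rows and paths are made up for illustration.
toy_catalog = pd.DataFrame(
    {
        "activity_id": ["CMIP", "CMIP", "ScenarioMIP"],
        "institution_id": ["NCAR", "NOAA-GFDL", "NCAR"],
        "source_id": ["CESM2", "GFDL-CM4", "CESM2"],
        "experiment_id": ["historical", "historical", "ssp585"],
        "table_id": ["Amon", "Amon", "Amon"],
        "variable_id": ["tas", "tas", "pr"],
        "zstore": ["gs://example/a/", "gs://example/b/", "gs://example/c/"],
    }
)

# Number of distinct values in each column
n_unique = toy_catalog.nunique()
print(n_unique)

# The distinct values of a single column, sorted for readability
experiments = sorted(toy_catalog["experiment_id"].unique())
print(experiments)
```

On the real catalog, `df.nunique()` gives a quick overview of how many models, experiments, and variables are available before any filtering.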

Here we filter the data to find monthly surface air temperature for historical experiments.

df_ta = df.query("activity_id=='CMIP' & table_id == 'Amon' & variable_id == 'tas' & experiment_id == 'historical'")
df_ta

Now we do further filtering to find just the models from NCAR.

df_ta_ncar = df_ta.query('institution_id == "NCAR"')
df_ta_ncar
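The two-step filtering above can also be written with boolean masks instead of `DataFrame.query`. The following is a minimal sketch on a small made-up table (the rows are assumptions for illustration only) showing that both styles select the same rows.

```python
import pandas as pd

# Hypothetical mini-catalog; only the column names mirror the real one.
toy = pd.DataFrame(
    {
        "activity_id": ["CMIP", "CMIP", "CMIP", "ScenarioMIP"],
        "institution_id": ["NCAR", "NOAA-GFDL", "NCAR", "NCAR"],
        "experiment_id": ["historical", "historical", "historical", "ssp585"],
        "table_id": ["Amon", "Amon", "Omon", "Amon"],
        "variable_id": ["tas", "tas", "tos", "tas"],
    }
)

# Style 1: query strings, applied in two steps as in the cells above
step1 = toy.query(
    "activity_id == 'CMIP' & table_id == 'Amon' & "
    "variable_id == 'tas' & experiment_id == 'historical'"
)
by_query = step1.query('institution_id == "NCAR"')

# Style 2: one combined boolean mask; each comparison needs its own parentheses
mask = (
    (toy["activity_id"] == "CMIP")
    & (toy["table_id"] == "Amon")
    & (toy["variable_id"] == "tas")
    & (toy["experiment_id"] == "historical")
    & (toy["institution_id"] == "NCAR")
)
by_mask = toy[mask]

print(by_query.equals(by_mask))
print(by_mask)
```

Query strings tend to be shorter for simple equality tests, while masks are easier to build up programmatically; which one to use is largely a matter of taste.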

Load Data

Now we will load a single store using fsspec, zarr, and xarray.

# get the path to a specific zarr store (the last one from the dataframe above)
zstore = df_ta_ncar.zstore.values[-1]
print(zstore)

# create a mutable-mapping-style interface to the store
mapper = fsspec.get_mapper(zstore)

# open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)
ds
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 6
      3 print(zstore)
      5 # create a mutable-mapping-style interface to the store
----> 6 mapper = fsspec.get_mapper(zstore)
      8 # open it using xarray and zarr
      9 ds = xr.open_zarr(mapper, consolidated=True)

File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/mapping.py:249, in get_mapper(url, check, create, missing_exceptions, alternate_root, **kwargs)
    218 """Create key-value interface for given URL and options
    219 
    220 The URL will be of the form "protocol://location" and point to the root
   (...)    246 ``FSMap`` instance, the dict-like key-value store.
    247 """
    248 # Removing protocol here - could defer to each open() on the backend
--> 249 fs, urlpath = url_to_fs(url, **kwargs)
    250 root = alternate_root if alternate_root is not None else urlpath
    251 return FSMap(root, fs, check, create, missing_exceptions=missing_exceptions)

File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/core.py:415, in url_to_fs(url, **kwargs)
    413     inkwargs["fo"] = urls
    414 urlpath, protocol, _ = chain[0]
--> 415 fs = filesystem(protocol, **inkwargs)
    416 return fs, urlpath

File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/registry.py:322, in filesystem(protocol, **storage_options)
    315     warnings.warn(
    316         "The 'arrow_hdfs' protocol has been deprecated and will be "
    317         "removed in the future. Specify it as 'hdfs'.",
    318         DeprecationWarning,
    319     )
    321 cls = get_filesystem_class(protocol)
--> 322 return cls(**storage_options)

File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/spec.py:81, in _Cached.__call__(cls, *args, **kwargs)
     79     return cls._cache[token]
     80 else:
---> 81     obj = super().__call__(*args, **kwargs)
     82     # Setting _fs_token here causes some static linters to complain.
     83     obj._fs_token_ = token

File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/gcsfs/core.py:269, in GCSFileSystem.__init__(self, project, access, token, block_size, consistency, cache_timeout, secure_serialize, check_connection, requests_timeout, requester_pays, asynchronous, loop, callback_timeout, **kwargs)
    267 self.callback_timeout = callback_timeout
    268 if not asynchronous:
--> 269     self._session = sync(
    270         self.loop, get_client, callback_timeout=self.callback_timeout
    271     )
    272     weakref.finalize(self, sync, self.loop, self.session.close)
    273 else:

File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
    101     raise FSTimeoutError from return_result
    102 elif isinstance(return_result, BaseException):
--> 103     raise return_result
    104 else:
    105     return return_result

File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
     54     coro = asyncio.wait_for(coro, timeout=timeout)
     55 try:
---> 56     result[0] = await coro
     57 except Exception as ex:
     58     result[0] = ex

File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/implementations/http.py:33, in get_client(**kwargs)
     32 async def get_client(**kwargs):
---> 33     return aiohttp.ClientSession(**kwargs)

TypeError: ClientSession.__init__() got an unexpected keyword argument 'callback_timeout'

Plot the Data

Plot a map from a specific date:

ds.tas.sel(time='1950-01').squeeze().plot()

The global mean of a lat-lon field needs to be weighted by the area of each grid cell, which is proportional to the cosine of its latitude.

def global_mean(field):
    weights = np.cos(np.deg2rad(field.lat))
    return field.weighted(weights).mean(dim=['lat', 'lon'])
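To see why the cosine weighting matters, here is a small NumPy-only sketch on a synthetic field. The 5-degree grid and the temperature profile are assumptions chosen for illustration; the calculation is intended as a plain-array analogue of the `weighted(...).mean(...)` call above for a field without missing values, not a replacement for it.

```python
import numpy as np

# Hypothetical regular 5-degree grid (cell centers)
lat = np.linspace(-87.5, 87.5, 36)
lon = np.arange(0.0, 360.0, 5.0)

# Synthetic temperature: 300 K at the equator, 250 K at the poles,
# independent of longitude
profile = 300.0 - 50.0 * np.sin(np.deg2rad(lat)) ** 2
field2d = np.repeat(profile[:, None], lon.size, axis=1)

# Area weights proportional to cos(latitude)
weights = np.cos(np.deg2rad(lat))

# Weighted global mean: average over longitude first, then weight by latitude
zonal_mean = field2d.mean(axis=1)
weighted = np.average(zonal_mean, weights=weights)

# Naive mean that treats every grid cell as equally large
unweighted = field2d.mean()

print(f"weighted:   {weighted:.2f} K")
print(f"unweighted: {unweighted:.2f} K")
```

The unweighted mean gives the small polar cells as much influence as the large tropical ones, so it comes out colder than the area-weighted mean, which lands close to the analytic value of 300 - 50/3 K for this profile.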

We can pass all of the temperature data through this function:

ta_timeseries = global_mean(ds.tas)
ta_timeseries

By default the data are loaded lazily, as Dask arrays. Here we trigger computation explicitly.

%time ta_timeseries.load()
ta_timeseries.plot(label='monthly')
ta_timeseries.rolling(time=12).mean().plot(label='12 month rolling mean', color='k')
plt.legend()
plt.grid()
plt.title('Global Mean Surface Air Temperature')
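The 12-month rolling mean removes the seasonal cycle because each window covers exactly one full year. The sketch below illustrates this with pandas on a made-up monthly series (the amplitude, trend, and dates are assumptions); as far as I know, xarray's `rolling(time=12).mean()` behaves the same way by default, including the missing values at the start.

```python
import numpy as np
import pandas as pd

# Hypothetical three-year monthly series: seasonal cycle plus a linear trend
n_months = 36
months = pd.date_range("2000-01-01", periods=n_months, freq="MS")
step = np.arange(n_months)
seasonal = 10.0 * np.sin(2 * np.pi * step / 12)
trend = 0.1 * step
series = pd.Series(288.0 + seasonal + trend, index=months)

# Trailing 12-month window; the first 11 entries have no full window yet
smooth = series.rolling(12).mean()

print("missing values at start:", int(smooth.isna().sum()))
print("raw range:     ", float(series.max() - series.min()))
print("smoothed range:", float(smooth.max() - smooth.min()))
```

The smoothed series keeps only the slow trend, which is why the black curve in the plot above starts later than the monthly curve and varies much less.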

Summary

In this notebook, we opened a CESM2 dataset with fsspec and zarr. We calculated and plotted global average surface air temperature.

What’s next?

We will open a dataset with ESGF and OPeNDAP.

Resources and references