Overview
This notebook shows how to query the Google Cloud CMIP6 catalog and load the data using Python.
Prerequisites
| Concepts | Importance | Notes |
|---|---|---|
| Intro to Xarray | Necessary | |
| Understanding of NetCDF | Helpful | Familiarity with metadata structure |
- Time to learn: 10 minutes
Imports
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import zarr
import fsspec
import nc_time_axis
%matplotlib inline
plt.rcParams['figure.figsize'] = 12, 6
Browse Catalog
The data catalog is stored as a CSV file. Here we read it with Pandas.
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df.head()
The columns of the dataframe correspond to the CMIP6 controlled vocabulary.
Here we filter the data to find monthly surface air temperature for historical experiments.
df_ta = df.query("activity_id=='CMIP' & table_id == 'Amon' & variable_id == 'tas' & experiment_id == 'historical'")
df_ta
Now we do further filtering to find just the models from NCAR.
df_ta_ncar = df_ta.query('institution_id == "NCAR"')
df_ta_ncar
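For readers less familiar with `DataFrame.query`, the same kind of filter can be tried on a small stand-in table. The sketch below is only an illustration: every row is invented, the `gs://example-bucket/...` paths are placeholders rather than real stores, and only a few of the catalog's column names are reproduced. It shows that a query string selects the same rows as an explicit boolean mask.

```python
import pandas as pd

# Toy catalog reusing a few of the real column names; all rows are made up.
toy = pd.DataFrame(
    {
        "activity_id": ["CMIP", "CMIP", "ScenarioMIP", "CMIP"],
        "institution_id": ["NCAR", "NOAA-GFDL", "NCAR", "NCAR"],
        "table_id": ["Amon", "Amon", "Amon", "Omon"],
        "variable_id": ["tas", "tas", "tas", "tos"],
        "experiment_id": ["historical", "historical", "ssp585", "historical"],
        # Placeholder paths, not real Zarr stores.
        "zstore": [f"gs://example-bucket/store-{i}/" for i in range(4)],
    }
)

# Filter with a query string, as in the cells above.
by_query = toy.query(
    "activity_id == 'CMIP' & table_id == 'Amon' & variable_id == 'tas'"
    " & experiment_id == 'historical' & institution_id == 'NCAR'"
)

# The same filter written as a boolean mask.
mask = (
    (toy.activity_id == "CMIP")
    & (toy.table_id == "Amon")
    & (toy.variable_id == "tas")
    & (toy.experiment_id == "historical")
    & (toy.institution_id == "NCAR")
)
by_mask = toy[mask]

print(by_query)
```

Both forms return the same single row here; the query string is usually easier to read when several columns are involved.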
Load Data
Now we will load a single store using fsspec, zarr, and xarray.
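To get a feel for what a "mutable-mapping-style interface" means before pointing it at cloud storage, the following sketch uses fsspec's in-process `memory://` filesystem instead of `gs://`. The store name `toy-store` and the byte values are made up for illustration; a real Zarr store holds actual metadata and chunk data under such keys.

```python
import fsspec

# "memory://" is fsspec's in-memory filesystem; "toy-store" is a made-up name.
mapper = fsspec.get_mapper("memory://toy-store")

# The mapper behaves like a dict: keys are paths relative to the store root,
# values are raw bytes.
mapper[".zgroup"] = b'{"zarr_format": 2}'
mapper["tas/.zarray"] = b"{}"  # placeholder content, not valid array metadata

print(sorted(mapper))
print(mapper[".zgroup"])
```

Libraries such as zarr read and write stores through this kind of key-to-bytes interface, so in principle the same code can work against local disk, memory, or object storage.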
# get the path to a specific zarr store (the last one from the dataframe above)
zstore = df_ta_ncar.zstore.values[-1]
print(zstore)
# create a mutable-mapping-style interface to the store
mapper = fsspec.get_mapper(zstore)
# open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)
ds
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226/
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 6
3 print(zstore)
5 # create a mutable-mapping-style interface to the store
----> 6 mapper = fsspec.get_mapper(zstore)
8 # open it using xarray and zarr
9 ds = xr.open_zarr(mapper, consolidated=True)
File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/mapping.py:249, in get_mapper(url, check, create, missing_exceptions, alternate_root, **kwargs)
218 """Create key-value interface for given URL and options
219
220 The URL will be of the form "protocol://location" and point to the root
(...) 246 ``FSMap`` instance, the dict-like key-value store.
247 """
248 # Removing protocol here - could defer to each open() on the backend
--> 249 fs, urlpath = url_to_fs(url, **kwargs)
250 root = alternate_root if alternate_root is not None else urlpath
251 return FSMap(root, fs, check, create, missing_exceptions=missing_exceptions)
File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/core.py:415, in url_to_fs(url, **kwargs)
413 inkwargs["fo"] = urls
414 urlpath, protocol, _ = chain[0]
--> 415 fs = filesystem(protocol, **inkwargs)
416 return fs, urlpath
File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/registry.py:322, in filesystem(protocol, **storage_options)
315 warnings.warn(
316 "The 'arrow_hdfs' protocol has been deprecated and will be "
317 "removed in the future. Specify it as 'hdfs'.",
318 DeprecationWarning,
319 )
321 cls = get_filesystem_class(protocol)
--> 322 return cls(**storage_options)
File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/spec.py:81, in _Cached.__call__(cls, *args, **kwargs)
79 return cls._cache[token]
80 else:
---> 81 obj = super().__call__(*args, **kwargs)
82 # Setting _fs_token here causes some static linters to complain.
83 obj._fs_token_ = token
File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/gcsfs/core.py:269, in GCSFileSystem.__init__(self, project, access, token, block_size, consistency, cache_timeout, secure_serialize, check_connection, requests_timeout, requester_pays, asynchronous, loop, callback_timeout, **kwargs)
267 self.callback_timeout = callback_timeout
268 if not asynchronous:
--> 269 self._session = sync(
270 self.loop, get_client, callback_timeout=self.callback_timeout
271 )
272 weakref.finalize(self, sync, self.loop, self.session.close)
273 else:
File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
101 raise FSTimeoutError from return_result
102 elif isinstance(return_result, BaseException):
--> 103 raise return_result
104 else:
105 return return_result
File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
54 coro = asyncio.wait_for(coro, timeout=timeout)
55 try:
---> 56 result[0] = await coro
57 except Exception as ex:
58 result[0] = ex
File ~/micromamba/envs/cmip6-cookbook-dev/lib/python3.11/site-packages/fsspec/implementations/http.py:33, in get_client(**kwargs)
32 async def get_client(**kwargs):
---> 33 return aiohttp.ClientSession(**kwargs)
TypeError: ClientSession.__init__() got an unexpected keyword argument 'callback_timeout'
Plot the Data
Plot a map from a specific date:
ds.tas.sel(time='1950-01').squeeze().plot()
The global mean of a lat-lon field needs to be weighted by the area of each grid cell, which is proportional to the cosine of its latitude.
def global_mean(field):
    weights = np.cos(np.deg2rad(field.lat))
    return field.weighted(weights).mean(dim=['lat', 'lon'])
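To see why the weighting matters, here is a small sketch on a synthetic field that does not require the CMIP6 data. The 5-degree grid and the field values (equal to the absolute latitude) are assumptions chosen only to make the effect visible.

```python
import numpy as np
import xarray as xr

# Assumed regular 5-degree grid of cell centers.
lat = np.arange(-87.5, 90, 5.0)
lon = np.arange(0, 360, 5.0)

# Synthetic field that grows toward the poles: value = |lat|.
field = xr.DataArray(
    np.abs(lat)[:, None] * np.ones(lon.size),
    coords={"lat": lat, "lon": lon},
    dims=["lat", "lon"],
)

# Same weighting as global_mean: cosine of latitude.
weights = np.cos(np.deg2rad(field.lat))
weighted = field.weighted(weights).mean(dim=["lat", "lon"])
unweighted = field.mean(dim=["lat", "lon"])

print(float(unweighted), float(weighted))
```

The unweighted mean treats the small polar cells like the large equatorial ones, so it comes out higher than the area-weighted mean for this field. For a field that is constant everywhere, both means agree.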
We can pass all of the temperature data through this function:
ta_timeseries = global_mean(ds.tas)
ta_timeseries
By default the data are loaded lazily, as Dask arrays. Here we trigger computation explicitly.
%time ta_timeseries.load()
ta_timeseries.plot(label='monthly')
ta_timeseries.rolling(time=12).mean().plot(label='12 month rolling mean', color='k')
plt.legend()
plt.grid()
plt.title('Global Mean Surface Air Temperature')
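The 12-month rolling mean used in the plot can be explored on a synthetic series. In this sketch the series (a small linear trend plus a seasonal cycle with a 12-month period) is invented, and a plain pandas date range stands in for the model's calendar.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three years of made-up monthly data: linear trend plus a seasonal cycle.
time = pd.date_range("2000-01-01", periods=36, freq="MS")
months = np.arange(36)
values = 0.01 * months + np.sin(2 * np.pi * months / 12)
series = xr.DataArray(values, coords={"time": time}, dims=["time"])

# Trailing 12-month window; positions without a full window are NaN.
smooth = series.rolling(time=12).mean()

print(int(smooth.isnull().sum()))
print(smooth.values[11:14])
```

Because the window spans exactly one period of the seasonal cycle, the cycle averages out and only the trend remains. The leading values are NaN because the default window requires 12 valid points.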
Summary
In this notebook, we opened a CESM2 dataset with fsspec and zarr. We calculated and plotted global average surface air temperature.
What’s next?
We will open a dataset with ESGF and OPeNDAP.
Resources and references
- Original notebook in the Pangeo Gallery by Henri Drake and Ryan Abernathey