
Load CMIP6 Data with Intake-ESM


Overview

Intake-ESM is an experimental package that aims to provide a higher-level interface for searching and loading Earth System Model data archives, such as CMIP6. The package is under active development, so features may be unstable. Please report any issues or suggestions on GitHub.

Prerequisites

Concepts                | Importance | Notes
Intro to Xarray         | Necessary  |
Understanding of NetCDF | Helpful    | Familiarity with metadata structure

  • Time to learn: 5 minutes


Imports

import xarray as xr
xr.set_options(display_style='html')
import intake
%matplotlib inline

Loading Data

Intake-ESM works by parsing an ESM Collection Spec and converting it to an Intake catalog. The collection spec is stored in a .json file, which we open here using Intake.

cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col
/srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/cat.py:283: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
  .applymap(type)

pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):

unique
activity_id 18
institution_id 36
source_id 88
experiment_id 170
member_id 657
table_id 37
variable_id 700
grid_label 10
zstore 514818
dcpp_init_year 60
version 736
derived_variable_id 0
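
To make the idea of a collection spec more concrete, here is a minimal sketch of what such a .json file can look like and how its pieces relate to the catalog summary above. The field names follow the ESM collection spec as commonly used; the values (including the hypothetical file name example-catalog.csv) are illustrative assumptions and not a copy of the real pangeo-cmip6.json.

```python
import json

# Abridged, illustrative collection spec. The values are assumptions made
# for this sketch; "example-catalog.csv" is a hypothetical file name.
spec_text = """
{
  "esmcat_version": "0.1.0",
  "id": "pangeo-cmip6",
  "catalog_file": "example-catalog.csv",
  "assets": {"column_name": "zstore", "format": "zarr"},
  "aggregation_control": {
    "variable_column_name": "variable_id",
    "groupby_attrs": ["activity_id", "institution_id", "source_id",
                      "experiment_id", "table_id", "grid_label"],
    "aggregations": [
      {"type": "union", "attribute_name": "variable_id"},
      {"type": "join_new", "attribute_name": "member_id"},
      {"type": "join_new", "attribute_name": "dcpp_init_year"}
    ]
  }
}
"""

spec = json.loads(spec_text)
control = spec["aggregation_control"]

# Column of the catalog table that holds the path to each asset (store).
asset_column = spec["assets"]["column_name"]

# Columns whose values identify one aggregated dataset.
key_template = ".".join(control["groupby_attrs"])

# Columns that become new dimensions when assets are combined.
new_dims = [a["attribute_name"] for a in control["aggregations"]
            if a["type"] == "join_new"]

print("asset column:", asset_column)
print("dataset key :", key_template)
print("new dims    :", new_dims)
```

In this sketch, the spec only describes the catalog: the table of assets itself lives in a separate file, and the aggregation rules say how rows of that table are later combined into datasets.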

We can now use Intake methods to search the collection and, if desired, export the results as a pandas DataFrame.

cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2',
                 grid_label='gn')
cat.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 CMIP IPSL IPSL-CM6A-LR historical r8i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
1 CMIP IPSL IPSL-CM6A-LR historical r5i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
2 CMIP IPSL IPSL-CM6A-LR historical r26i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
3 CMIP IPSL IPSL-CM6A-LR historical r2i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
4 CMIP IPSL IPSL-CM6A-LR historical r6i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
... ... ... ... ... ... ... ... ... ... ... ...
168 CMIP CSIRO ACCESS-ESM1-5 historical r11i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/hist... NaN 20200803
169 CMIP EC-Earth-Consortium EC-Earth3-CC historical r1i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20210113
170 ScenarioMIP EC-Earth-Consortium EC-Earth3-CC ssp585 r1i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consorti... NaN 20210113
171 CMIP CMCC CMCC-ESM2 historical r1i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/CMCC/CMCC-ESM2/historica... NaN 20210114
172 ScenarioMIP CMCC CMCC-ESM2 ssp585 r1i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-ESM2/ss... NaN 20210126

173 rows × 11 columns
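
The search above combines its keyword arguments in a specific way: a row is kept only if it matches every keyword, and a keyword given as a list matches any of the listed values. The following is a conceptual sketch of that behaviour in plain Python, not Intake-ESM's actual implementation. The helper name `search` and the last two toy records are hypothetical; the first two records mirror rows shown in the table above.

```python
def search(rows, **query):
    """Keep rows that match every keyword; a list value matches any item."""
    def matches(row):
        for column, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(column) not in allowed:
                return False
        return True
    return [row for row in rows if matches(row)]


# Toy catalog records (a tiny stand-in for the real catalog table).
rows = [
    {"source_id": "CMCC-ESM2", "experiment_id": "historical",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "CMCC-ESM2", "experiment_id": "ssp585",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    # Hypothetical records that the query should reject:
    {"source_id": "CMCC-ESM2", "experiment_id": "ssp245",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "CMCC-ESM2", "experiment_id": "historical",
     "table_id": "Omon", "variable_id": "o2", "grid_label": "gn"},
]

hits = search(rows, experiment_id=["historical", "ssp585"],
              table_id="Oyr", variable_id="o2", grid_label="gn")

for row in hits:
    print(row["source_id"], row["experiment_id"], row["table_id"])
```

Only the first two records survive: the third fails the experiment_id list and the fourth fails table_id.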

Intake knows how to open the datasets automatically using Xarray. In addition, Intake-ESM contains special logic to concatenate and merge the individual results of our query into larger, higher-level aggregated Xarray Datasets.

# zarr_kwargs is deprecated in Intake-ESM; pass the options via xarray_open_kwargs instead
dset_dict = cat.to_dataset_dict(xarray_open_kwargs={'consolidated': True})
list(dset_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
100.00% [27/27 00:36<00:00]
['CMIP.IPSL.IPSL-CM5A2-INCA.historical.Oyr.gn',
 'ScenarioMIP.NCC.NorESM2-LM.ssp585.Oyr.gn',
 'ScenarioMIP.MRI.MRI-ESM2-0.ssp585.Oyr.gn',
 'CMIP.MRI.MRI-ESM2-0.historical.Oyr.gn',
 'ScenarioMIP.NCC.NorESM2-MM.ssp585.Oyr.gn',
 'CMIP.EC-Earth-Consortium.EC-Earth3-CC.historical.Oyr.gn',
 'ScenarioMIP.CMCC.CMCC-ESM2.ssp585.Oyr.gn',
 'ScenarioMIP.MIROC.MIROC-ES2L.ssp585.Oyr.gn',
 'ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'CMIP.CMCC.CMCC-ESM2.historical.Oyr.gn',
 'ScenarioMIP.EC-Earth-Consortium.EC-Earth3-CC.ssp585.Oyr.gn',
 'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn',
 'ScenarioMIP.CCCma.CanESM5-CanOE.ssp585.Oyr.gn',
 'CMIP.CCCma.CanESM5-CanOE.historical.Oyr.gn',
 'CMIP.NCC.NorESM2-LM.historical.Oyr.gn',
 'CMIP.NCC.NorESM2-MM.historical.Oyr.gn',
 'ScenarioMIP.NCAR.CESM2.ssp585.Oyr.gn',
 'ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp585.Oyr.gn',
 'CMIP.MIROC.MIROC-ES2L.historical.Oyr.gn',
 'ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-LR.historical.Oyr.gn',
 'CMIP.CSIRO.ACCESS-ESM1-5.historical.Oyr.gn',
 'ScenarioMIP.CCCma.CanESM5.ssp585.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn',
 'CMIP.IPSL.IPSL-CM6A-LR.historical.Oyr.gn',
 'CMIP.CCCma.CanESM5.historical.Oyr.gn']
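
The 173 catalog rows collapse into 27 datasets because rows that share the same values for the key attributes are grouped together, and their ensemble members are stacked along a new member_id dimension. Below is a minimal sketch of the grouping step in plain Python, using a few rows from the search results above. It illustrates the idea only and is not Intake-ESM's code; the helper `dataset_key` is hypothetical.

```python
from collections import defaultdict

# Attributes that make up a dataset key, as printed by to_dataset_dict().
GROUPBY_ATTRS = ["activity_id", "institution_id", "source_id",
                 "experiment_id", "table_id", "grid_label"]

# A few rows taken from the search results shown earlier.
rows = [
    {"activity_id": "CMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "historical", "member_id": "r8i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "CMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "historical", "member_id": "r5i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "CMIP", "institution_id": "CMCC", "source_id": "CMCC-ESM2",
     "experiment_id": "historical", "member_id": "r1i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "ScenarioMIP", "institution_id": "CMCC", "source_id": "CMCC-ESM2",
     "experiment_id": "ssp585", "member_id": "r1i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
]


def dataset_key(row):
    """Join the group-by attribute values with dots to form the key."""
    return ".".join(row[attr] for attr in GROUPBY_ATTRS)


# Collect the ensemble members that end up in each aggregated dataset.
members = defaultdict(list)
for row in rows:
    members[dataset_key(row)].append(row["member_id"])

for key, ids in members.items():
    print(key, ids)
```

The two IPSL-CM6A-LR rows differ only in member_id, so they land under one key, while the CMCC-ESM2 rows differ in activity and experiment and therefore form two separate datasets.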
ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds
<xarray.Dataset>
Dimensions:             (i: 360, j: 291, lev: 45, bnds: 2, member_id: 35,
                         dcpp_init_year: 1, time: 165, vertices: 4)
Coordinates:
  * i                   (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359
  * j                   (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290
    latitude            (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * lev                 (lev) float64 3.047 9.454 16.36 ... 5.375e+03 5.625e+03
    lev_bnds            (lev, bnds) float64 dask.array<chunksize=(45, 2), meta=np.ndarray>
    longitude           (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * time                (time) object 1850-07-02 12:00:00 ... 2014-07-02 12:0...
    time_bnds           (time, bnds) object dask.array<chunksize=(165, 2), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
    vertices_longitude  (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
  * member_id           (member_id) object 'r10i1p1f1' ... 'r9i1p2f1'
  * dcpp_init_year      (dcpp_init_year) float64 nan
Dimensions without coordinates: bnds, vertices
Data variables:
    o2                  (member_id, dcpp_init_year, time, lev, j, i) float32 dask.array<chunksize=(1, 1, 12, 45, 291, 360), meta=np.ndarray>
Attributes: (12/52)
    Conventions:                      CF-1.7 CMIP-6.2
    YMDH_branch_time_in_child:        1850:01:01:00
    activity_id:                      CMIP
    branch_method:                    Spin-up documentation
    branch_time_in_child:             0.0
    cmor_version:                     3.4.0
    ...                               ...
    intake_esm_attrs:table_id:        Oyr
    intake_esm_attrs:variable_id:     o2
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:version:         20190429
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           CMIP.CCCma.CanESM5.historical.Oyr.gn
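
The o2 variable is shown as a dask array, meaning that no values have been read yet. A quick back-of-the-envelope calculation, using only the dimension sizes and chunk shape from the output above, suggests why that matters. This is a rough estimate of the uncompressed in-memory size, assuming 4 bytes per float32 value.

```python
from math import prod

# Dimension sizes of o2, copied from the Dataset repr above.
dims = {"member_id": 35, "dcpp_init_year": 1, "time": 165,
        "lev": 45, "j": 291, "i": 360}

# Chunk shape of o2, also from the repr above.
chunk = (1, 1, 12, 45, 291, 360)

itemsize = 4  # bytes per float32 value

total_bytes = prod(dims.values()) * itemsize
chunk_bytes = prod(chunk) * itemsize

print(f"full array: {total_bytes / 1e9:.1f} GB")
print(f"one chunk : {chunk_bytes / 1e6:.1f} MB")
```

The full array comes to roughly 109 GB uncompressed, while a single chunk is about 226 MB, so it is usually worth selecting a subset (for example one member or one depth level) before loading any values.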

Summary

In this notebook, we used Intake-ESM to open an Xarray Dataset for one particular model and experiment.

What’s next?

We will see an example of downloading a dataset with fsspec and zarr.

Resources and references