
Load CMIP6 Data with Intake-ESM


Overview

Intake-ESM is an experimental package that provides a higher-level interface for searching and loading Earth System Model (ESM) data archives, such as CMIP6. The package is under very active development, so features may change or be unstable. Please report any issues or suggestions on GitHub.

Prerequisites

Concepts                  Importance   Notes
Intro to Xarray           Necessary
Understanding of NetCDF   Helpful      Familiarity with metadata structure

  • Time to learn: 5 minutes


Imports

import xarray as xr
xr.set_options(display_style='html')
import intake
%matplotlib inline

Loading Data

Intake-ESM works by parsing an ESM Collection Spec and converting it to an Intake catalog. The collection spec is stored in a .json file, which we open here using Intake.

cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col

pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):

unique
activity_id 18
institution_id 36
source_id 88
experiment_id 170
member_id 657
table_id 37
variable_id 700
grid_label 10
zstore 514818
dcpp_init_year 60
version 736
derived_variable_id 0
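
To make the idea of a collection spec more concrete, the sketch below parses a small, hand-written JSON document shaped like an ESM collection spec and reads out the fields that control which column holds the asset paths and how assets are grouped. The field names follow the ESM collection specification as commonly documented; every value here (the id, the `catalog_file` name, the list of aggregations) is an illustrative assumption and not a copy of pangeo-cmip6.json.

```python
import json

# Hand-written stand-in for a collection spec. All values are illustrative
# assumptions; this is NOT the content of pangeo-cmip6.json.
spec_text = """
{
  "esmcat_version": "0.1.0",
  "id": "example-cmip6",
  "description": "Illustrative collection spec",
  "catalog_file": "example-catalog.csv",
  "attributes": [
    {"column_name": "activity_id"},
    {"column_name": "institution_id"},
    {"column_name": "source_id"},
    {"column_name": "experiment_id"},
    {"column_name": "member_id"},
    {"column_name": "table_id"},
    {"column_name": "variable_id"},
    {"column_name": "grid_label"}
  ],
  "assets": {"column_name": "zstore", "format": "zarr"},
  "aggregation_control": {
    "variable_column_name": "variable_id",
    "groupby_attrs": [
      "activity_id", "institution_id", "source_id",
      "experiment_id", "table_id", "grid_label"
    ],
    "aggregations": [
      {"type": "union", "attribute_name": "variable_id"},
      {"type": "join_new", "attribute_name": "member_id"}
    ]
  }
}
"""

spec = json.loads(spec_text)

# Column of the catalog table that stores the path to each asset.
asset_column = spec["assets"]["column_name"]

# Columns whose values identify one aggregated dataset.
groupby_attrs = spec["aggregation_control"]["groupby_attrs"]
key_template = ".".join(groupby_attrs)

print("asset column:", asset_column)
print("dataset key template:", key_template)
```

In this sketch the asset column is `zstore`, matching the column of that name in the catalog summary, and the grouping columns are joined with dots to form a dataset key template.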

We can now use Intake methods to search the collection and, if desired, export the results as a pandas DataFrame.

cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2',
                 grid_label='gn')
cat.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 CMIP IPSL IPSL-CM6A-LR historical r8i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
1 CMIP IPSL IPSL-CM6A-LR historical r5i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
2 CMIP IPSL IPSL-CM6A-LR historical r26i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
3 CMIP IPSL IPSL-CM6A-LR historical r2i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
4 CMIP IPSL IPSL-CM6A-LR historical r6i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... NaN 20180803
... ... ... ... ... ... ... ... ... ... ... ...
168 CMIP CSIRO ACCESS-ESM1-5 historical r11i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/hist... NaN 20200803
169 CMIP EC-Earth-Consortium EC-Earth3-CC historical r1i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... NaN 20210113
170 ScenarioMIP EC-Earth-Consortium EC-Earth3-CC ssp585 r1i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consorti... NaN 20210113
171 CMIP CMCC CMCC-ESM2 historical r1i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/CMIP/CMCC/CMCC-ESM2/historica... NaN 20210114
172 ScenarioMIP CMCC CMCC-ESM2 ssp585 r1i1p1f1 Oyr o2 gn gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-ESM2/ss... NaN 20210126

173 rows × 11 columns
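
The matching rule behind a query like this can be sketched in plain Python: a row is kept when every queried column matches, and a list of values for one column means "any of these". The `toy_search` helper and the two non-matching rows are hypothetical and exist only for illustration; the real `search` method supports more than exact matching, which this sketch does not attempt to reproduce.

```python
# Toy rows standing in for catalog entries. The first two mirror rows from the
# table above; the last two are hypothetical rows added so the filter has
# something to reject.
rows = [
    {"source_id": "EC-Earth3-CC", "experiment_id": "historical",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "EC-Earth3-CC", "experiment_id": "ssp585",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "EC-Earth3-CC", "experiment_id": "ssp126",   # hypothetical
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "EC-Earth3-CC", "experiment_id": "historical",  # hypothetical
     "table_id": "Omon", "variable_id": "o2", "grid_label": "gn"},
]


def toy_search(rows, **query):
    """Keep rows where every queried column equals one of the requested values."""
    wanted = {
        column: {value} if isinstance(value, str) else set(value)
        for column, value in query.items()
    }
    return [
        row for row in rows
        if all(row.get(column) in values for column, values in wanted.items())
    ]


matches = toy_search(
    rows,
    experiment_id=["historical", "ssp585"],  # either experiment is accepted
    table_id="Oyr",
    variable_id="o2",
    grid_label="gn",
)
for row in matches:
    print(row["source_id"], row["experiment_id"])
```

Only the first two rows survive: the third has a different experiment and the fourth a different table.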

Intake knows how to open the Datasets automatically using Xarray. In addition, Intake-ESM contains special logic to concatenate and merge the individual results of our query into larger, aggregated Xarray Datasets.

dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True})
list(dset_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
['CMIP.MRI.MRI-ESM2-0.historical.Oyr.gn',
 'CMIP.EC-Earth-Consortium.EC-Earth3-CC.historical.Oyr.gn',
 'ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp585.Oyr.gn',
 'ScenarioMIP.NCAR.CESM2.ssp585.Oyr.gn',
 'CMIP.CCCma.CanESM5.historical.Oyr.gn',
 'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn',
 'ScenarioMIP.CCCma.CanESM5-CanOE.ssp585.Oyr.gn',
 'CMIP.IPSL.IPSL-CM5A2-INCA.historical.Oyr.gn',
 'CMIP.NCC.NorESM2-LM.historical.Oyr.gn',
 'ScenarioMIP.NCC.NorESM2-LM.ssp585.Oyr.gn',
 'ScenarioMIP.NCC.NorESM2-MM.ssp585.Oyr.gn',
 'CMIP.NCC.NorESM2-MM.historical.Oyr.gn',
 'CMIP.MIROC.MIROC-ES2L.historical.Oyr.gn',
 'ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Oyr.gn',
 'CMIP.CMCC.CMCC-ESM2.historical.Oyr.gn',
 'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'ScenarioMIP.MRI.MRI-ESM2-0.ssp585.Oyr.gn',
 'ScenarioMIP.CMCC.CMCC-ESM2.ssp585.Oyr.gn',
 'ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'CMIP.CSIRO.ACCESS-ESM1-5.historical.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-LR.historical.Oyr.gn',
 'ScenarioMIP.MIROC.MIROC-ES2L.ssp585.Oyr.gn',
 'ScenarioMIP.EC-Earth-Consortium.EC-Earth3-CC.ssp585.Oyr.gn',
 'CMIP.CCCma.CanESM5-CanOE.historical.Oyr.gn',
 'ScenarioMIP.CCCma.CanESM5.ssp585.Oyr.gn',
 'CMIP.IPSL.IPSL-CM6A-LR.historical.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn']
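
The reason 173 catalog rows collapse into a much shorter list of keys can be sketched with a plain grouping step: rows that share the six key columns end up under one key, and their remaining differences (here, `member_id`) become entries within that dataset. This is a simplified illustration of the grouping idea only, using four rows from the search table; it does not open or combine any data.

```python
from collections import defaultdict

# Key columns, in the order reported in the output above.
GROUPBY = ["activity_id", "institution_id", "source_id",
           "experiment_id", "table_id", "grid_label"]

# Four rows taken from the search result table (asset paths omitted).
rows = [
    {"activity_id": "CMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "historical", "member_id": "r8i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "CMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "historical", "member_id": "r5i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "CMIP", "institution_id": "CMCC", "source_id": "CMCC-ESM2",
     "experiment_id": "historical", "member_id": "r1i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "ScenarioMIP", "institution_id": "CMCC", "source_id": "CMCC-ESM2",
     "experiment_id": "ssp585", "member_id": "r1i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
]

# Map each dataset key to the ensemble members that fall under it.
members_by_key = defaultdict(list)
for row in rows:
    key = ".".join(row[column] for column in GROUPBY)
    members_by_key[key].append(row["member_id"])

for key, members in members_by_key.items():
    print(key, members)
```

The two IPSL rows share a key and differ only in `member_id`, so they belong to the same aggregated dataset, while the two CMCC rows fall under separate keys because their activity and experiment differ.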
ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds
<xarray.Dataset> Size: 109GB
Dimensions:             (member_id: 35, dcpp_init_year: 1, time: 165, lev: 45,
                         j: 291, i: 360, bnds: 2, vertices: 4)
Coordinates:
  * i                   (i) int32 1kB 0 1 2 3 4 5 6 ... 354 355 356 357 358 359
  * j                   (j) int32 1kB 0 1 2 3 4 5 6 ... 285 286 287 288 289 290
    latitude            (j, i) float64 838kB dask.array<chunksize=(291, 360), meta=np.ndarray>
  * lev                 (lev) float64 360B 3.047 9.454 ... 5.375e+03 5.625e+03
    lev_bnds            (lev, bnds) float64 720B dask.array<chunksize=(45, 2), meta=np.ndarray>
    longitude           (j, i) float64 838kB dask.array<chunksize=(291, 360), meta=np.ndarray>
  * time                (time) object 1kB 1850-07-02 12:00:00 ... 2014-07-02 ...
    time_bnds           (time, bnds) object 3kB dask.array<chunksize=(165, 2), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 3MB dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
    vertices_longitude  (j, i, vertices) float64 3MB dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
  * member_id           (member_id) object 280B 'r10i1p1f1' ... 'r9i1p2f1'
  * dcpp_init_year      (dcpp_init_year) float64 8B nan
Dimensions without coordinates: bnds, vertices
Data variables:
    o2                  (member_id, dcpp_init_year, time, lev, j, i) float32 109GB dask.array<chunksize=(1, 1, 12, 45, 291, 360), meta=np.ndarray>
Attributes: (12/52)
    Conventions:                      CF-1.7 CMIP-6.2
    YMDH_branch_time_in_child:        1850:01:01:00
    activity_id:                      CMIP
    branch_method:                    Spin-up documentation
    branch_time_in_child:             0.0
    cmor_version:                     3.4.0
    ...                               ...
    intake_esm_attrs:table_id:        Oyr
    intake_esm_attrs:variable_id:     o2
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:version:         20190429
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           CMIP.CCCma.CanESM5.historical.Oyr.gn
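
The 109GB size reported in the repr follows directly from the dimensions of `o2` and its float32 dtype, and the same arithmetic gives the size of a single Dask chunk. The short calculation below reproduces both numbers from the values printed above. Because `o2` is backed by a Dask array, opening the Dataset does not load this much data; values are read only when a computation asks for them.

```python
from math import prod

# Dimension sizes and chunk shape copied from the Dataset repr above.
dims = {"member_id": 35, "dcpp_init_year": 1, "time": 165,
        "lev": 45, "j": 291, "i": 360}
chunk_shape = (1, 1, 12, 45, 291, 360)
BYTES_PER_FLOAT32 = 4

total_bytes = prod(dims.values()) * BYTES_PER_FLOAT32
chunk_bytes = prod(chunk_shape) * BYTES_PER_FLOAT32

print(f"o2 total: {total_bytes / 1e9:.1f} GB")
print(f"one chunk: {chunk_bytes / 1e6:.1f} MB")
```

The total comes to about 108.9 GB, which rounds to the 109GB shown, and each chunk is roughly 226 MB, so working on a subset (for example one member or one depth level) keeps memory use manageable.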

Summary

In this notebook, we used Intake-ESM to open an Xarray Dataset for one particular model and experiment.

What’s next?

We will see an example of downloading a dataset with fsspec and zarr.

Resources and references