Load CMIP6 Data with Intake-ESM
Overview
Intake-ESM is an experimental package that aims to provide a higher-level interface for searching and loading Earth System Model data archives, such as CMIP6. The package is under very active development, and features may be unstable. Please report any issues or suggestions on GitHub.
Prerequisites
Concepts | Importance | Notes |
---|---|---|
 | Necessary | |
 | Helpful | Familiarity with metadata structure |
Time to learn: 5 minutes
Loading Data
Intake-ESM works by parsing an ESM Collection Spec and converting it to an Intake catalog. The collection spec is stored in a .json file. Here we open it using Intake.
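import intake  # requires the intake-esm plugin, which provides open_esm_datastore

# URL of the ESM collection spec for the Pangeo CMIP6 catalog
cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col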
/srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/cat.py:283: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
.applymap(type)
pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):
 | unique |
---|---|
activity_id | 18 |
institution_id | 36 |
source_id | 88 |
experiment_id | 170 |
member_id | 657 |
table_id | 37 |
variable_id | 700 |
grid_label | 10 |
zstore | 514818 |
dcpp_init_year | 60 |
version | 736 |
derived_variable_id | 0 |
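For orientation, an ESM collection spec is a small JSON document that names the catalog table, the columns that describe each asset, and the column that holds the asset locations. The sketch below parses an abridged, illustrative spec using only the standard library. The field names follow the ESM collection spec format as it is commonly documented; the values (including the `id` and `catalog_file`) are assumptions made up for this example and are not a copy of the real pangeo-cmip6.json.

```python
import json

# Abridged, illustrative collection spec. The values are assumptions made for
# this example, not the contents of the real pangeo-cmip6.json.
spec_text = """
{
  "esmcat_version": "0.1.0",
  "id": "example-cmip6",
  "description": "Hypothetical catalog used only for illustration",
  "catalog_file": "example-cmip6.csv",
  "attributes": [
    {"column_name": "activity_id"},
    {"column_name": "source_id"},
    {"column_name": "experiment_id"},
    {"column_name": "variable_id"}
  ],
  "assets": {"column_name": "zstore", "format": "zarr"}
}
"""

spec = json.loads(spec_text)

# Columns that describe each asset and can be used as search criteria.
searchable_columns = [attr["column_name"] for attr in spec["attributes"]]

# Column that stores the location of each asset, and the asset format.
asset_column = spec["assets"]["column_name"]
asset_format = spec["assets"]["format"]

print("catalog table:", spec["catalog_file"])
print("searchable columns:", searchable_columns)
print("assets:", asset_column, "->", asset_format)
```

In the catalog summary, the `zstore` count equals the number of assets, which is consistent with one Zarr store path per catalog row.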
We can now use Intake methods to search the collection, and, if desired, export a Pandas dataframe.
cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2',
grid_label='gn')
cat.df
 | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | CMIP | IPSL | IPSL-CM6A-LR | historical | r8i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
1 | CMIP | IPSL | IPSL-CM6A-LR | historical | r5i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
2 | CMIP | IPSL | IPSL-CM6A-LR | historical | r26i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
3 | CMIP | IPSL | IPSL-CM6A-LR | historical | r2i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
4 | CMIP | IPSL | IPSL-CM6A-LR | historical | r6i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
168 | CMIP | CSIRO | ACCESS-ESM1-5 | historical | r11i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/hist... | NaN | 20200803 |
169 | CMIP | EC-Earth-Consortium | EC-Earth3-CC | historical | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... | NaN | 20210113 |
170 | ScenarioMIP | EC-Earth-Consortium | EC-Earth3-CC | ssp585 | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consorti... | NaN | 20210113 |
171 | CMIP | CMCC | CMCC-ESM2 | historical | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/CMCC/CMCC-ESM2/historica... | NaN | 20210114 |
172 | ScenarioMIP | CMCC | CMCC-ESM2 | ssp585 | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-ESM2/ss... | NaN | 20210126 |
173 rows × 11 columns
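The query above combines its criteria in a particular way: a row must satisfy every keyword, and a keyword given a list matches any value in that list. The following is a minimal pure-Python sketch of that matching rule, not Intake-ESM's actual implementation (which works on the catalog dataframe and supports more options). The helper name `search_rows` is hypothetical, the first four rows are copied from the table above with the `zstore` column left out, and the last row is a made-up non-matching entry.

```python
# Rows copied from the search result above (zstore omitted for brevity).
rows = [
    {"source_id": "IPSL-CM6A-LR", "experiment_id": "historical",
     "member_id": "r8i1p1f1", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "IPSL-CM6A-LR", "experiment_id": "historical",
     "member_id": "r5i1p1f1", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "EC-Earth3-CC", "experiment_id": "ssp585",
     "member_id": "r1i1p1f1", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "CMCC-ESM2", "experiment_id": "historical",
     "member_id": "r1i1p1f1", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    # Hypothetical row, added only to show a non-match.
    {"source_id": "CMCC-ESM2", "experiment_id": "piControl",
     "member_id": "r1i1p1f1", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
]


def search_rows(rows, **query):
    """Keep rows that match every keyword; a list of values matches any of them."""
    def matches(row):
        for field, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row[field] not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


both = search_rows(rows, experiment_id=["historical", "ssp585"],
                   table_id="Oyr", variable_id="o2", grid_label="gn")
future_only = search_rows(rows, experiment_id="ssp585", variable_id="o2")

print(len(both), "rows match historical or ssp585")
print([row["source_id"] for row in future_only])
```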
Intake can open the datasets automatically using Xarray. Furthermore, Intake-ESM contains special logic to concatenate and merge the individual results of our query into larger, higher-level aggregated Xarray Datasets.
# xarray_open_kwargs replaces the deprecated zarr_kwargs argument
dset_dict = cat.to_dataset_dict(xarray_open_kwargs={'consolidated': True})
list(dset_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
['CMIP.IPSL.IPSL-CM5A2-INCA.historical.Oyr.gn',
'ScenarioMIP.NCC.NorESM2-LM.ssp585.Oyr.gn',
'ScenarioMIP.MRI.MRI-ESM2-0.ssp585.Oyr.gn',
'CMIP.MRI.MRI-ESM2-0.historical.Oyr.gn',
'ScenarioMIP.NCC.NorESM2-MM.ssp585.Oyr.gn',
'CMIP.EC-Earth-Consortium.EC-Earth3-CC.historical.Oyr.gn',
'ScenarioMIP.CMCC.CMCC-ESM2.ssp585.Oyr.gn',
'ScenarioMIP.MIROC.MIROC-ES2L.ssp585.Oyr.gn',
'ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn',
'CMIP.CMCC.CMCC-ESM2.historical.Oyr.gn',
'ScenarioMIP.EC-Earth-Consortium.EC-Earth3-CC.ssp585.Oyr.gn',
'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn',
'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn',
'ScenarioMIP.CCCma.CanESM5-CanOE.ssp585.Oyr.gn',
'CMIP.CCCma.CanESM5-CanOE.historical.Oyr.gn',
'CMIP.NCC.NorESM2-LM.historical.Oyr.gn',
'CMIP.NCC.NorESM2-MM.historical.Oyr.gn',
'ScenarioMIP.NCAR.CESM2.ssp585.Oyr.gn',
'ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp585.Oyr.gn',
'CMIP.MIROC.MIROC-ES2L.historical.Oyr.gn',
'ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Oyr.gn',
'CMIP.MPI-M.MPI-ESM1-2-LR.historical.Oyr.gn',
'CMIP.CSIRO.ACCESS-ESM1-5.historical.Oyr.gn',
'ScenarioMIP.CCCma.CanESM5.ssp585.Oyr.gn',
'CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn',
'CMIP.IPSL.IPSL-CM6A-LR.historical.Oyr.gn',
'CMIP.CCCma.CanESM5.historical.Oyr.gn']
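The printed key pattern suggests how 173 catalog rows collapse into 27 datasets: rows that share the six key attributes land in the same dataset, and their differing `member_id` values become entries along a new dimension. The sketch below reproduces that grouping in pure Python for a few rows copied from the search results. It illustrates the idea only and is not Intake-ESM's aggregation code; the helper names `dataset_key` and `parse_key` are hypothetical.

```python
from collections import defaultdict

# Attributes that make up a dataset key, in the order of the printed pattern.
KEY_COLUMNS = ["activity_id", "institution_id", "source_id",
               "experiment_id", "table_id", "grid_label"]

# Rows copied from the search results: the six key columns plus member_id.
COLUMNS = KEY_COLUMNS + ["member_id"]
rows = [dict(zip(COLUMNS, values)) for values in [
    ("CMIP", "IPSL", "IPSL-CM6A-LR", "historical", "Oyr", "gn", "r8i1p1f1"),
    ("CMIP", "IPSL", "IPSL-CM6A-LR", "historical", "Oyr", "gn", "r5i1p1f1"),
    ("CMIP", "IPSL", "IPSL-CM6A-LR", "historical", "Oyr", "gn", "r26i1p1f1"),
    ("CMIP", "EC-Earth-Consortium", "EC-Earth3-CC", "historical", "Oyr", "gn", "r1i1p1f1"),
    ("ScenarioMIP", "EC-Earth-Consortium", "EC-Earth3-CC", "ssp585", "Oyr", "gn", "r1i1p1f1"),
]]


def dataset_key(row):
    """Join the key attributes of one catalog row with dots."""
    return ".".join(row[column] for column in KEY_COLUMNS)


def parse_key(key):
    """Split a key back into its attributes (assumes no value contains a dot)."""
    return dict(zip(KEY_COLUMNS, key.split(".")))


# Group member IDs by dataset key: one dataset per key, one member per row.
members_by_key = defaultdict(list)
for row in rows:
    members_by_key[dataset_key(row)].append(row["member_id"])

for key, members in members_by_key.items():
    print(key, "->", members)

print(parse_key("CMIP.CCCma.CanESM5.historical.Oyr.gn")["source_id"])
```

Parsing keys this way makes it easy to pair the historical and ssp585 datasets of the same model, for example by matching on `source_id`.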
ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds
<xarray.Dataset>
Dimensions:             (i: 360, j: 291, lev: 45, bnds: 2, member_id: 35, dcpp_init_year: 1, time: 165, vertices: 4)
Coordinates:
  * i                   (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359
  * j                   (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290
    latitude            (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * lev                 (lev) float64 3.047 9.454 16.36 ... 5.375e+03 5.625e+03
    lev_bnds            (lev, bnds) float64 dask.array<chunksize=(45, 2), meta=np.ndarray>
    longitude           (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * time                (time) object 1850-07-02 12:00:00 ... 2014-07-02 12:0...
    time_bnds           (time, bnds) object dask.array<chunksize=(165, 2), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
    vertices_longitude  (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
  * member_id           (member_id) object 'r10i1p1f1' ... 'r9i1p2f1'
  * dcpp_init_year      (dcpp_init_year) float64 nan
Dimensions without coordinates: bnds, vertices
Data variables:
    o2                  (member_id, dcpp_init_year, time, lev, j, i) float32 dask.array<chunksize=(1, 1, 12, 45, 291, 360), meta=np.ndarray>
Attributes: (12/52)
    Conventions:                     CF-1.7 CMIP-6.2
    YMDH_branch_time_in_child:       1850:01:01:00
    activity_id:                     CMIP
    branch_method:                   Spin-up documentation
    branch_time_in_child:            0.0
    cmor_version:                    3.4.0
    ...                              ...
    intake_esm_attrs:table_id:       Oyr
    intake_esm_attrs:variable_id:    o2
    intake_esm_attrs:grid_label:     gn
    intake_esm_attrs:version:        20190429
    intake_esm_attrs:_data_format_:  zarr
    intake_esm_dataset_key:          CMIP.CCCma.CanESM5.historical.Oyr.gn
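The `o2` variable is a lazily evaluated Dask array, so nothing has been downloaded yet. Before loading any of it into memory, it can help to estimate its size from the shape and chunk shape shown in the repr. The arithmetic below is a rough sketch that assumes uncompressed float32 values and that every chunk follows the displayed chunk shape, with a shorter final chunk along `time`.

```python
from math import ceil, prod

# Shape and chunk shape of `o2`, copied from the Dataset repr,
# in the order (member_id, dcpp_init_year, time, lev, j, i).
shape = (35, 1, 165, 45, 291, 360)
chunks = (1, 1, 12, 45, 291, 360)
itemsize = 4  # bytes per float32 value

total_bytes = prod(shape) * itemsize
chunk_bytes = prod(chunks) * itemsize

# Number of chunks along each dimension, rounding up for the partial last chunk.
n_chunks = prod(ceil(s / c) for s, c in zip(shape, chunks))

print(f"full array: {total_bytes / 1e9:.1f} GB")   # full array: 108.9 GB
print(f"one chunk:  {chunk_bytes / 1e6:.1f} MB")   # one chunk:  226.3 MB
print(f"chunks:     {n_chunks}")                   # chunks:     490
```

At roughly 109 GB uncompressed, the full variable is too large to load on most machines. Selecting a single ensemble member first, for example with `ds.o2.isel(member_id=0)`, reduces the size by a factor of 35.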
Summary
In this notebook, we used Intake-ESM to search the Pangeo CMIP6 catalog and open an Xarray Dataset for one particular model and experiment (CanESM5, historical).
Resources and references
Original notebook in the Pangeo Gallery by Henri Drake and Ryan Abernathey