Load CMIP6 Data with Intake-ESM
Overview
Intake-ESM is an experimental package that provides a higher-level interface for searching and loading Earth System Model data archives, such as CMIP6. The package is under very active development, and features may be unstable. Please report any issues or suggestions on GitHub.
Prerequisites
Concepts | Importance | Notes |
---|---|---|
| Necessary | |
| Helpful | Familiarity with metadata structure |
Time to learn: 5 minutes
Imports
import xarray as xr
xr.set_options(display_style='html')
import intake
%matplotlib inline
Loading Data
Intake-ESM works by parsing an ESM Collection Spec and converting it to an Intake catalog. The collection spec is stored in a .json file. Here we open it using Intake.
cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col
pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):
| unique |
---|---|
activity_id | 18 |
institution_id | 36 |
source_id | 88 |
experiment_id | 170 |
member_id | 657 |
table_id | 37 |
variable_id | 700 |
grid_label | 10 |
zstore | 514818 |
dcpp_init_year | 60 |
version | 736 |
derived_variable_id | 0 |
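To make the collection spec less abstract, here is a minimal sketch of the kind of JSON document that gets parsed. The field names are assumed from the ESM Collection Spec; the values are abridged and illustrative rather than copied from pangeo-cmip6.json, and the catalog file name is hypothetical.

```python
import json

# Abridged, illustrative collection spec. Field names are assumed from the
# ESM Collection Spec; the values are examples, not the real pangeo-cmip6.json.
spec_text = """
{
  "esmcat_version": "0.1.0",
  "id": "example-cmip6",
  "description": "Illustrative catalog of Zarr stores",
  "catalog_file": "example-catalog.csv",
  "attributes": [
    {"column_name": "source_id"},
    {"column_name": "experiment_id"},
    {"column_name": "member_id"},
    {"column_name": "variable_id"}
  ],
  "assets": {"column_name": "zstore", "format": "zarr"},
  "aggregation_control": {
    "variable_column_name": "variable_id",
    "groupby_attrs": ["activity_id", "institution_id", "source_id",
                      "experiment_id", "table_id", "grid_label"],
    "aggregations": [
      {"type": "union", "attribute_name": "variable_id"},
      {"type": "join_new", "attribute_name": "member_id"}
    ]
  }
}
"""

spec = json.loads(spec_text)

# Column of the catalog table that holds the location of each data store
asset_column = spec["assets"]["column_name"]

# Columns whose values identify one aggregated dataset
groupby_attrs = spec["aggregation_control"]["groupby_attrs"]
key_template = ".".join(groupby_attrs)

print(asset_column)
print(key_template)
```

In this sketch the spec only describes the catalog: the table of assets itself lives in the separate file named by `catalog_file`, one row per Zarr store.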
We can now use Intake methods to search the collection, and, if desired, export a Pandas dataframe.
cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2',
                 grid_label='gn')
cat.df
| activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | CMIP | IPSL | IPSL-CM6A-LR | historical | r8i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
1 | CMIP | IPSL | IPSL-CM6A-LR | historical | r5i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
2 | CMIP | IPSL | IPSL-CM6A-LR | historical | r26i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
3 | CMIP | IPSL | IPSL-CM6A-LR | historical | r2i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
4 | CMIP | IPSL | IPSL-CM6A-LR | historical | r6i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | NaN | 20180803 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
168 | CMIP | CSIRO | ACCESS-ESM1-5 | historical | r11i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/hist... | NaN | 20200803 |
169 | CMIP | EC-Earth-Consortium | EC-Earth3-CC | historical | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... | NaN | 20210113 |
170 | ScenarioMIP | EC-Earth-Consortium | EC-Earth3-CC | ssp585 | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consorti... | NaN | 20210113 |
171 | CMIP | CMCC | CMCC-ESM2 | historical | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/CMCC/CMCC-ESM2/historica... | NaN | 20210114 |
172 | ScenarioMIP | CMCC | CMCC-ESM2 | ssp585 | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-ESM2/ss... | NaN | 20210126 |
173 rows × 11 columns
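The search above filters rows of the catalog table. As far as we understand it, a row must match every keyword, and a list value matches any of its items. The toy function below is a sketch of that behavior on a few rows copied from the table; `toy_search` is a hypothetical helper written for illustration, not part of Intake-ESM. The sketch also shows one follow-up step that is often useful: finding the models that provide both experiments.

```python
from collections import defaultdict

# A few rows copied from the table above, reduced to three columns
rows = [
    {"source_id": "IPSL-CM6A-LR", "experiment_id": "historical", "member_id": "r8i1p1f1"},
    {"source_id": "EC-Earth3-CC", "experiment_id": "historical", "member_id": "r1i1p1f1"},
    {"source_id": "EC-Earth3-CC", "experiment_id": "ssp585", "member_id": "r1i1p1f1"},
    {"source_id": "CMCC-ESM2", "experiment_id": "historical", "member_id": "r1i1p1f1"},
    {"source_id": "CMCC-ESM2", "experiment_id": "ssp585", "member_id": "r1i1p1f1"},
]


def toy_search(records, **query):
    """Keep records that match every keyword; a list value matches any item."""
    def matches(record):
        for column, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if record[column] not in allowed:
                return False
        return True

    return [record for record in records if matches(record)]


# Single value: only the ssp585 rows remain
future = toy_search(rows, experiment_id="ssp585")

# List value combined with a second keyword
both = toy_search(rows, experiment_id=["historical", "ssp585"], source_id="CMCC-ESM2")

# Models among these rows that offer both experiments
experiments_by_model = defaultdict(set)
for row in rows:
    experiments_by_model[row["source_id"]].add(row["experiment_id"])
complete_models = sorted(
    model
    for model, experiments in experiments_by_model.items()
    if {"historical", "ssp585"} <= experiments
)

print(len(future), len(both), complete_models)
```

With the real catalog, the same grouping can be done on `cat.df`, since a search for two experiments also returns models that provide only one of them.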
Intake knows how to automatically open the Datasets using Xarray. Furthermore, Intake-ESM contains special logic to concatenate and merge the individual results of our query into larger, higher-level aggregated Xarray Datasets.
dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True})
list(dset_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
['CMIP.MRI.MRI-ESM2-0.historical.Oyr.gn',
'CMIP.EC-Earth-Consortium.EC-Earth3-CC.historical.Oyr.gn',
'ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp585.Oyr.gn',
'ScenarioMIP.NCAR.CESM2.ssp585.Oyr.gn',
'CMIP.CCCma.CanESM5.historical.Oyr.gn',
'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn',
'ScenarioMIP.CCCma.CanESM5-CanOE.ssp585.Oyr.gn',
'CMIP.IPSL.IPSL-CM5A2-INCA.historical.Oyr.gn',
'CMIP.NCC.NorESM2-LM.historical.Oyr.gn',
'ScenarioMIP.NCC.NorESM2-LM.ssp585.Oyr.gn',
'ScenarioMIP.NCC.NorESM2-MM.ssp585.Oyr.gn',
'CMIP.NCC.NorESM2-MM.historical.Oyr.gn',
'CMIP.MIROC.MIROC-ES2L.historical.Oyr.gn',
'ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Oyr.gn',
'CMIP.CMCC.CMCC-ESM2.historical.Oyr.gn',
'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn',
'ScenarioMIP.MRI.MRI-ESM2-0.ssp585.Oyr.gn',
'ScenarioMIP.CMCC.CMCC-ESM2.ssp585.Oyr.gn',
'ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn',
'CMIP.CSIRO.ACCESS-ESM1-5.historical.Oyr.gn',
'CMIP.MPI-M.MPI-ESM1-2-LR.historical.Oyr.gn',
'ScenarioMIP.MIROC.MIROC-ES2L.ssp585.Oyr.gn',
'ScenarioMIP.EC-Earth-Consortium.EC-Earth3-CC.ssp585.Oyr.gn',
'CMIP.CCCma.CanESM5-CanOE.historical.Oyr.gn',
'ScenarioMIP.CCCma.CanESM5.ssp585.Oyr.gn',
'CMIP.IPSL.IPSL-CM6A-LR.historical.Oyr.gn',
'CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn']
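The key pattern printed above can be reproduced by hand. The sketch below joins the six grouping columns with dots and collects the `member_id` values that share a key, which is roughly how several catalog rows end up in one aggregated Dataset. It is a simplified illustration under our own assumptions, not Intake-ESM's implementation; `dataset_key` is a hypothetical helper, and splitting a key back into fields assumes that no field value contains a dot.

```python
from collections import defaultdict

# Columns that make up a dataset key, in the order printed above
GROUPBY_ATTRS = ["activity_id", "institution_id", "source_id",
                 "experiment_id", "table_id", "grid_label"]


def dataset_key(row):
    """Join the grouping columns of one catalog row into a dotted key."""
    return ".".join(row[name] for name in GROUPBY_ATTRS)


# Three rows taken from the search results, with only the columns needed here
columns = GROUPBY_ATTRS + ["member_id"]
raw = [
    ("CMIP", "IPSL", "IPSL-CM6A-LR", "historical", "Oyr", "gn", "r8i1p1f1"),
    ("CMIP", "IPSL", "IPSL-CM6A-LR", "historical", "Oyr", "gn", "r5i1p1f1"),
    ("ScenarioMIP", "CMCC", "CMCC-ESM2", "ssp585", "Oyr", "gn", "r1i1p1f1"),
]
rows = [dict(zip(columns, values)) for values in raw]

# Rows with the same key belong to the same aggregated dataset
members_by_key = defaultdict(list)
for row in rows:
    members_by_key[dataset_key(row)].append(row["member_id"])

# Going the other way: recover the fields from a key
parsed = dict(zip(GROUPBY_ATTRS, "ScenarioMIP.CMCC.CMCC-ESM2.ssp585.Oyr.gn".split(".")))

for key, members in members_by_key.items():
    print(key, members)
print(parsed["source_id"], parsed["experiment_id"])
```

Parsing keys this way makes it easy to pair each model's historical Dataset with its ssp585 counterpart when looping over `dset_dict`.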
ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds
<xarray.Dataset> Size: 109GB
Dimensions:             (member_id: 35, dcpp_init_year: 1, time: 165, lev: 45, j: 291, i: 360, bnds: 2, vertices: 4)
Coordinates:
  * i                   (i) int32 1kB 0 1 2 3 4 5 6 ... 354 355 356 357 358 359
  * j                   (j) int32 1kB 0 1 2 3 4 5 6 ... 285 286 287 288 289 290
    latitude            (j, i) float64 838kB dask.array<chunksize=(291, 360), meta=np.ndarray>
  * lev                 (lev) float64 360B 3.047 9.454 ... 5.375e+03 5.625e+03
    lev_bnds            (lev, bnds) float64 720B dask.array<chunksize=(45, 2), meta=np.ndarray>
    longitude           (j, i) float64 838kB dask.array<chunksize=(291, 360), meta=np.ndarray>
  * time                (time) object 1kB 1850-07-02 12:00:00 ... 2014-07-02 ...
    time_bnds           (time, bnds) object 3kB dask.array<chunksize=(165, 2), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 3MB dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
    vertices_longitude  (j, i, vertices) float64 3MB dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
  * member_id           (member_id) object 280B 'r10i1p1f1' ... 'r9i1p2f1'
  * dcpp_init_year      (dcpp_init_year) float64 8B nan
Dimensions without coordinates: bnds, vertices
Data variables:
    o2                  (member_id, dcpp_init_year, time, lev, j, i) float32 109GB dask.array<chunksize=(1, 1, 12, 45, 291, 360), meta=np.ndarray>
Attributes: (12/52)
    Conventions:                      CF-1.7 CMIP-6.2
    YMDH_branch_time_in_child:        1850:01:01:00
    activity_id:                      CMIP
    branch_method:                    Spin-up documentation
    branch_time_in_child:             0.0
    cmor_version:                     3.4.0
    ...                               ...
    intake_esm_attrs:table_id:        Oyr
    intake_esm_attrs:variable_id:     o2
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:version:         20190429
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           CMIP.CCCma.CanESM5.historical.Oyr.gn
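The sizes reported in the repr above can be checked with a little arithmetic. The sketch below takes the dimension lengths and chunk shape of `o2` from the output, assumes 4 bytes per `float32` value, and computes the total size, the size of one full chunk, and the number of chunks. Nothing is read from the store; the numbers follow from the shapes alone.

```python
import math

# Dimension lengths and chunk shape of `o2`, copied from the repr above
dims = {"member_id": 35, "dcpp_init_year": 1, "time": 165, "lev": 45, "j": 291, "i": 360}
chunks = {"member_id": 1, "dcpp_init_year": 1, "time": 12, "lev": 45, "j": 291, "i": 360}

BYTES_PER_FLOAT32 = 4

# Total size of the variable if it were fully loaded into memory
total_bytes = math.prod(dims.values()) * BYTES_PER_FLOAT32

# Size of one full chunk, the unit that Dask reads at a time
chunk_bytes = math.prod(chunks.values()) * BYTES_PER_FLOAT32

# Number of chunks: round up along each dimension, then multiply
n_chunks = math.prod(math.ceil(dims[name] / chunks[name]) for name in dims)

print(f"total: {total_bytes / 1e9:.1f} GB")  # total: 108.9 GB
print(f"chunk: {chunk_bytes / 1e6:.1f} MB")  # chunk: 226.3 MB
print(f"chunks: {n_chunks}")                 # chunks: 490
```

The total agrees with the 109GB shown in the repr. Because 165 is not a multiple of 12, the last chunk along `time` in each member holds only 9 time steps, so it is smaller than the others. Selecting a subset first, for example a single member with `ds.isel(member_id=0)`, keeps the amount of data actually read far below the full size.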
Summary
In this notebook, we used Intake-ESM to open an Xarray Dataset for one particular model and experiment.
What’s next?
We will see an example of downloading a dataset with fsspec and zarr.
Resources and references
Original notebook in the Pangeo Gallery by Henri Drake and Ryan Abernathey