Reading in CESM output


Overview

Output from a single run of CESM is the main dataset we'll be working with in this cookbook, so let's learn how to read it in. Note that this is just one of the forms CESM output can take. This run has been post-processed, so the data are in the form of "time-series" files, where each file stores one variable across the full timespan of the run. Before this processing, CESM actually writes its output as "history" files, where each file contains all variables over a shorter time slice. We won't dive into the specifics of CESM data processing here, but this Jupyter book from the CESM tutorial has some more info!
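To make the distinction concrete, here is a sketch of the two naming patterns in the usual CESM/POP style (these filenames are illustrative only, not the actual files from this run):

# History files: all variables for one short time slice (here, one month) per file
#   casename.pop.h.0001-01.nc
#   casename.pop.h.0001-02.nc
# Time-series files: one variable (e.g. TEMP) across the full run per file
#   casename.pop.h.TEMP.000101-006112.nc
#   casename.pop.h.SALT.000101-006112.nc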

Prerequisites

Concepts          Importance    Notes
Intro to Xarray   Necessary

  • Time to learn: 5 min


Imports

import xarray as xr
import glob
import s3fs
import netCDF4

Loading our data into xarray

Our data is stored in the cloud on Jetstream2. We list the file paths with s3fs, open each one as a file-like object, then use xarray’s open_mfdataset() function to combine them all into a single xarray Dataset, dropping a few variables whose coordinates don’t line up with the rest of the dataset.

jetstream_url = 'https://js2.jetstream-cloud.org:8001/'

s3 = s3fs.S3FileSystem(anon=True, client_kwargs=dict(endpoint_url=jetstream_url))

# Generate a list of all files in CESM folder
s3path = 's3://pythia/ocean-bgc/cesm/g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch/ocn/proc/tseries/month_1/*'
remote_files = s3.glob(s3path)

# Open all files from folder
fileset = [s3.open(file) for file in remote_files]

# Open with xarray
ds = xr.open_mfdataset(fileset, data_vars="minimal", coords='minimal', compat="override", parallel=True,
                       drop_variables=["transport_components", "transport_regions", 'moc_components'], decode_times=True)
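
If open_mfdataset() can’t guess an IO backend for these file-like objects, you can name one explicitly via the engine argument. The sketch below assumes the time-series files are netCDF4/HDF5, which is what the h5netcdf backend reads; otherwise it is the same call as above.

# Same call as above, but with the backend named explicitly rather than guessed
ds = xr.open_mfdataset(
    fileset,
    engine="h5netcdf",  # assumption: the files are netCDF4/HDF5 format
    data_vars="minimal",
    coords="minimal",
    compat="override",
    parallel=True,
    drop_variables=["transport_components", "transport_regions", "moc_components"],
    decode_times=True,
)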
ds

Looks good!
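
As an extra sanity check, you can poke around the Dataset with standard xarray tools. Nothing below assumes particular variable names, only that the monthly output has a time coordinate:

# Quick look at what we loaded
print(list(ds.data_vars)[:10])                       # first ten variable names
print(ds.sizes)                                      # dimension sizes
print(ds.time.values[0], "->", ds.time.values[-1])   # span of the time axis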


Summary

You’ve learned how to read in CESM output, which we’ll be using for all the following notebooks in this cookbook.

Resources and references