Make Diagnostic Plots for NA-CORDEX Zarr Stores
The NA-CORDEX data archive contains output from regional climate models (RCMs) run over a domain covering most of North America, using boundary conditions from global climate model (GCM) simulations in the CMIP5 archive. It differs from many climate datasets in that it combines runs from many independently derived simulations. Using Zarr, Xarray, and Intake-ESM helps simplify the problem of examining these disparate runs.
See the README file for more pointers and information on the NA-CORDEX data archive.
Overview
This cookbook provides useful methods for summarizing data values in large climate datasets. It clearly shows where data have extreme values and where data are missing, which can be useful for verifying that the data were gathered and stored correctly. While this cookbook is designed to examine any portion of the NA-CORDEX dataset via an Intake-ESM catalog, the code can be adapted straightforwardly to other datasets that can be loaded with xarray.
Topics covered in this notebook include:
Different Ways to Create and Connect to Dask Distributed Cluster
How to Find and Obtain Data Using an Intake-ESM Catalog
How to Create Various Statistical Summary Plots with matplotlib
Prerequisites
To more fully understand how the code in this notebook works, users are encouraged to explore the following introductory articles.
| Concepts | Importance | Notes |
| --- | --- | --- |
|  | Helpful |  |
|  | Helpful |  |
|  | Helpful | Familiarity with metadata structure |
|  | Helpful | Familiarity with plot elements in matplotlib |
Time to learn: 45 minutes
Note: This notebook contains optional choices for running on NCAR supercomputers. Users running outside the NCAR domain can ignore them.
Imports
The main Python packages used are xarray, intake-esm, dask, and matplotlib.
Because some of the CORDEX data have been interpolated from a Lambert Conformal grid onto a rectangular grid for ease of use, there can be many missing values on the edges of the interpolated grid. For this reason, we also turn off warnings complaining about missing values in the data.
import xarray as xr
import numpy as np
import intake
import ast
import matplotlib.pyplot as plt
# Ignore dask-numpy warnings about finding missing values in the data
%env PYTHONWARNINGS=ignore
env: PYTHONWARNINGS=ignore
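The same effect can also be achieved from inside Python with the standard warnings module. This is an optional sketch, not part of the workflow above; note that, unlike the environment variable, it only affects the current process rather than any spawned Dask workers.
# Optional alternative (sketch): suppress warnings from within Python rather
# than via the PYTHONWARNINGS environment variable. This affects only the
# current process; Dask worker processes inherit the environment variable instead.
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)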
Select User Options
Users may choose between:
Cloud storage provider (Amazon AWS or NCAR),
Whether to truncate the data if running on a resource-limited computer,
Whether to obtain a Dask cluster from a PBS Scheduler or via the Dask Gateway package.
Note: Using the NCAR cloud storage system requires a login account on an NCAR HPC computer.
Choose Cloud Storage (Amazon AWS or NCAR Cloud)
# If True, use NCAR Cloud Storage. Requires an NCAR user account.
# If False, use AWS Cloud Storage.
USE_NCAR_CLOUD = False
Choose whether to truncate data for resource-limited computers
When running this notebook where compute resources are limited or data transfer rates are slow, set TRUNCATE_DATA to True. This limits the time range processed by the statistical plotting functions to the first year of valid data in the chosen NA-CORDEX scenario.
TRUNCATE_DATA = True
Choose whether to use a PBS Scheduler
If running on an HPC system with a PBS scheduler, set this to True. Otherwise, set it to False.
USE_PBS_SCHEDULER = False
Choose whether to use Dask Gateway
If running on a Jupyter server with Dask Gateway configured, set this to True. Otherwise, set it to False.
USE_DASK_GATEWAY = False
Create and Connect to Dask Distributed Cluster
def get_gateway_cluster():
""" Create cluster through dask_gateway
"""
from dask_gateway import Gateway
gateway = Gateway()
cluster = gateway.new_cluster()
cluster.adapt(minimum=2, maximum=20)
return cluster
def get_pbs_cluster():
""" Create cluster through dask_jobqueue
"""
from dask_jobqueue import PBSCluster
if TRUNCATE_DATA:
num_jobs = 4
walltime = '0:10:00'
memory = '2GB'
else:
num_jobs = 20
walltime = '0:20:00'
memory = '10GB'
cluster = PBSCluster(cores=1, processes=1, walltime=walltime, memory=memory, queue='casper',
resource_spec=f"select=1:ncpus=1:mem={memory}",)
cluster.scale(jobs=num_jobs)
return cluster
def get_local_cluster():
""" Create cluster using the Jupyter server's resources
"""
from distributed import LocalCluster
cluster = LocalCluster()
if TRUNCATE_DATA:
cluster.scale(4)
else:
cluster.scale(20)
return cluster
# Obtain dask cluster in one of three ways
if USE_PBS_SCHEDULER:
cluster = get_pbs_cluster()
elif USE_DASK_GATEWAY:
cluster = get_gateway_cluster()
else:
cluster = get_local_cluster()
# Connect to cluster
from distributed import Client
client = Client(cluster)
# Display cluster dashboard URL
cluster
LocalCluster: 4 workers, 1 thread per worker, 15.61 GiB total memory
Dashboard: http://127.0.0.1:8787/status
☝️ Link to Dask dashboard will appear above.
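As an optional sanity check (a minimal sketch, not required by the workflow), the dashboard address and worker readiness can also be queried programmatically from the client:
# Print the dashboard URL and block until at least one worker has connected
# before submitting any work to the cluster.
print("Dashboard:", client.dashboard_link)
client.wait_for_workers(n_workers=1)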
Find and Obtain Data Using an Intake Catalog
Define the Intake Catalog URL and Storage Access Options
if USE_NCAR_CLOUD:
catalog_url = "https://stratus.ucar.edu/ncar-na-cordex/catalogs/aws-na-cordex.json"
storage_options={"anon": True, 'client_kwargs':{"endpoint_url":"https://stratus.ucar.edu/"}}
else:
catalog_url = "https://ncar-na-cordex.s3-us-west-2.amazonaws.com/catalogs/aws-na-cordex.json"
storage_options={"anon": True}
Open catalog and produce a content summary
# Have the catalog interpret the "na-cordex-models" column as a list of values, as opposed to single values.
col = intake.open_esm_datastore(catalog_url, read_csv_kwargs={"converters": {"na-cordex-models": ast.literal_eval}},)
col
aws-na-cordex catalog with 330 dataset(s) from 330 asset(s):
|  | unique |
| --- | --- |
| variable | 15 |
| standard_name | 10 |
| long_name | 18 |
| units | 10 |
| spatial_domain | 1 |
| grid | 2 |
| spatial_resolution | 2 |
| scenario | 6 |
| start_time | 3 |
| end_time | 4 |
| frequency | 1 |
| vertical_levels | 1 |
| bias_correction | 3 |
| na-cordex-models | 26 |
| path | 330 |
| derived_variable | 0 |
# Show the first few lines of the catalog
col.df.head(10)
|  | variable | standard_name | long_name | units | spatial_domain | grid | spatial_resolution | scenario | start_time | end_time | frequency | vertical_levels | bias_correction | na-cordex-models | path |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
0 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | eval | 1979-01-01T12:00:00 | 2014-12-31T12:00:00 | day | 1 | raw | [ERA-Int.CRCM5-UQAM, ERA-Int.CRCM5-OUR, ERA-In... | s3://ncar-na-cordex/day/hurs.eval.day.NAM-22i.... |
1 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-44i | 0.50 deg | eval | 1979-01-01T12:00:00 | 2015-12-31T12:00:00 | day | 1 | raw | [ERA-Int.CRCM5-UQAM, ERA-Int.RegCM4, ERA-Int.H... | s3://ncar-na-cordex/day/hurs.eval.day.NAM-44i.... |
2 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | hist-rcp45 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-Daymet | [CanESM2.CanRCM4] | s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA... |
3 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | hist-rcp45 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-gridMET | [CanESM2.CanRCM4] | s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA... |
4 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | hist-rcp45 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | raw | [GFDL-ESM2M.CRCM5-OUR, CanESM2.CRCM5-OUR, CanE... | s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA... |
5 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-44i | 0.50 deg | hist-rcp45 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-Daymet | [MPI-ESM-LR.CRCM5-UQAM, CanESM2.CRCM5-UQAM, EC... | s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA... |
6 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-44i | 0.50 deg | hist-rcp45 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-gridMET | [MPI-ESM-LR.CRCM5-UQAM, CanESM2.CRCM5-UQAM, EC... | s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA... |
7 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-44i | 0.50 deg | hist-rcp45 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | raw | [MPI-ESM-LR.CRCM5-UQAM, CanESM2.CRCM5-UQAM, EC... | s3://ncar-na-cordex/day/hurs.hist-rcp45.day.NA... |
8 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | hist-rcp85 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-Daymet | [MPI-ESM-MR.CRCM5-UQAM, GEMatm-Can.CRCM5-UQAM,... | s3://ncar-na-cordex/day/hurs.hist-rcp85.day.NA... |
9 | hurs | relative_humidity | Near-Surface Relative Humidity | % | north_america | NAM-22i | 0.25 deg | hist-rcp85 | 1949-01-01T12:00:00 | 2100-12-31T12:00:00 | day | 1 | mbcn-gridMET | [MPI-ESM-MR.CRCM5-UQAM, GEMatm-Can.CRCM5-UQAM,... | s3://ncar-na-cordex/day/hurs.hist-rcp85.day.NA... |
# Produce a catalog content summary.
uniques = col.unique()
columns = ["variable", "scenario", "grid", "na-cordex-models", "bias_correction"]
for column in columns:
print(f'{column}: {uniques[column]}\n')
variable: ['hurs', 'huss', 'pr', 'prec', 'ps', 'rsds', 'sfcWind', 'tas', 'tasmax', 'tasmin', 'temp', 'tmax', 'tmin', 'uas', 'vas']
scenario: ['eval', 'hist-rcp45', 'hist-rcp85', 'hist', 'rcp45', 'rcp85']
grid: ['NAM-22i', 'NAM-44i']
na-cordex-models: ['ERA-Int.CRCM5-UQAM', 'ERA-Int.CRCM5-OUR', 'ERA-Int.RegCM4', 'ERA-Int.CanRCM4', 'ERA-Int.WRF', 'ERA-Int.HIRHAM5', 'ERA-Int.RCA4', 'CanESM2.CanRCM4', 'GFDL-ESM2M.CRCM5-OUR', 'CanESM2.CRCM5-OUR', 'MPI-ESM-LR.CRCM5-UQAM', 'CanESM2.CRCM5-UQAM', 'EC-EARTH.HIRHAM5', 'EC-EARTH.RCA4', 'CanESM2.RCA4', 'MPI-ESM-MR.CRCM5-UQAM', 'GEMatm-Can.CRCM5-UQAM', 'GEMatm-MPI.CRCM5-UQAM', 'HadGEM2-ES.RegCM4', 'GFDL-ESM2M.RegCM4', 'MPI-ESM-LR.RegCM4', 'HadGEM2-ES.WRF', 'GFDL-ESM2M.WRF', 'MPI-ESM-LR.WRF', 'CNRM-CM5.CRCM5-OUR', 'MPI-ESM-LR.CRCM5-OUR']
bias_correction: ['raw', 'mbcn-Daymet', 'mbcn-gridMET']
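The unique() summary above pools values across all catalog entries. To see which model combinations exist for one particular variable and scenario, you can search the catalog and inspect the list-valued 'na-cordex-models' column; the following sketch uses 'tas' and 'rcp85' purely as example choices.
# Sketch: list the GCM.RCM combinations available for a single variable and
# scenario; each row's 'na-cordex-models' entry is a list of model names.
sample_search = col.search(variable='tas', scenario='rcp85', bias_correction='raw')
for _, row in sample_search.df.iterrows():
    print(row['grid'], sorted(row['na-cordex-models']))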
Load data into xarray using the catalog
Choose any combination of variable, grid, scenario, and bias correction listed in the catalog.
Below we choose the variable 'tmax' (Maximum Daily Surface Temperature) as a default because it is easy to interpret. For this example, we also choose the lower-resolution grid (NAM-44i) and the scenario with the shortest time range ('eval', 1979–2015) to reduce the default data processing requirements.
data_var = 'tmax'
col_subset = col.search(
variable=data_var,
grid="NAM-44i",
scenario="eval",
bias_correction="raw",
)
col_subset
aws-na-cordex catalog with 1 dataset(s) from 1 asset(s):
|  | unique |
| --- | --- |
| variable | 1 |
| standard_name | 1 |
| long_name | 1 |
| units | 1 |
| spatial_domain | 1 |
| grid | 1 |
| spatial_resolution | 1 |
| scenario | 1 |
| start_time | 1 |
| end_time | 1 |
| frequency | 1 |
| vertical_levels | 1 |
| bias_correction | 1 |
| na-cordex-models | 6 |
| path | 1 |
| derived_variable | 0 |
# Verify our query matches at least one entry in the catalog.
col_subset.df
|  | variable | standard_name | long_name | units | spatial_domain | grid | spatial_resolution | scenario | start_time | end_time | frequency | vertical_levels | bias_correction | na-cordex-models | path |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
0 | tmax | air_temperature | Daily Maximum Near-Surface Air Temperature | degC | north_america | NAM-44i | 0.50 deg | eval | 1979-01-01T12:00:00 | 2015-12-31T12:00:00 | day | 1 | raw | [ERA-Int.CRCM5-UQAM, ERA-Int.RegCM4, ERA-Int.H... | s3://ncar-na-cordex/day/tmax.eval.day.NAM-44i.... |
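To see the full object-store path of the matched Zarr store, an optional one-liner (a small sketch) is:
# Optional: print the S3 path of the single matching Zarr store.
print(col_subset.df['path'].iloc[0])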
# Load catalog entries for subset into a dictionary of xarray datasets, and open the first one.
dsets = col_subset.to_dataset_dict(
xarray_open_kwargs={"consolidated": True}, storage_options=storage_options
)
print(f"\nDataset dictionary keys:\n {dsets.keys()}")
# Load the first dataset and display a summary.
dataset_key = list(dsets.keys())[0]
store_name = dataset_key + ".zarr"
ds = dsets[dataset_key]
ds
# Note that the xarray dataset summary includes a 'member_id' coordinate, which is a renaming of the
# 'na-cordex-models' column in the catalog.
--> The keys in the returned dictionary of datasets are constructed as follows:
'variable.frequency.scenario.grid.bias_correction'
Dataset dictionary keys:
dict_keys(['tmax.day.eval.NAM-44i.raw'])
<xarray.Dataset> Size: 13GB
Dimensions:    (lat: 129, lon: 300, member_id: 6, time: 13514, bnds: 2)
Coordinates:
  * lat        (lat) float64 1kB 12.25 12.75 13.25 13.75 ... 75.25 75.75 76.25
  * lon        (lon) float64 2kB -171.8 -171.2 -170.8 ... -23.25 -22.75 -22.25
  * member_id  (member_id) <U18 432B 'ERA-Int.CRCM5-UQAM' ... 'ERA-Int.WRF'
  * time       (time) datetime64[ns] 108kB 1979-01-01T12:00:00 ... 2015-12-31...
    time_bnds  (time, bnds) datetime64[ns] 216kB dask.array<chunksize=(13514, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
    tmax       (member_id, time, lat, lon) float32 13GB dask.array<chunksize=(4, 1000, 65, 150), meta=np.ndarray>
Attributes: (12/42)
    CORDEX_domain:                       NAM-44
    contact:                             {"ERA-Int.CRCM5-UQAM": "Winger.Katj...
    contact_note:                        {"ERA-Int.RegCM4": "Simulations by ...
    creation_date:                       {"ERA-Int.CRCM5-UQAM": "2015-06-18T...
    driving_experiment:                  {"ERA-Int.CRCM5-UQAM": "ECMWF-ERAIN...
    driving_experiment_name:             evaluation
    ...                                  ...
    intake_esm_attrs:vertical_levels:    1
    intake_esm_attrs:bias_correction:    raw
    intake_esm_attrs:na-cordex-models:   ERA-Int.CRCM5-UQAM,ERA-Int.RegCM4,E...
    intake_esm_attrs:path:               s3://ncar-na-cordex/day/tmax.eval.d...
    intake_esm_attrs:_data_format_:      zarr
    intake_esm_dataset_key:              tmax.day.eval.NAM-44i.raw
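Because the dictionary keys follow the 'variable.frequency.scenario.grid.bias_correction' pattern reported above, a dataset can also be selected by constructing its key directly. The following is a sketch that assumes the same subset query used above.
# Sketch: build the dictionary key from the naming pattern instead of taking
# the first entry in the dictionary.
key = f"{data_var}.day.eval.NAM-44i.raw"   # matches the search criteria above
ds_by_key = dsets[key]
print(ds_by_key.sizes)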
Functions for Subsetting and Plotting
Helper Function to Create a Single Map Plot
def plotMap(ax, map_slice, date_object=None, member_id=None):
"""Create a map plot on the given axes, with min/max as text"""
ax.imshow(map_slice, origin='lower')
minval = map_slice.min(dim = ['lat', 'lon'])
maxval = map_slice.max(dim = ['lat', 'lon'])
# Format values to have at least 4 digits of precision.
ax.text(0.01, 0.03, "Min: %3g" % minval, transform=ax.transAxes, fontsize=12)
ax.text(0.99, 0.03, "Max: %3g" % maxval, transform=ax.transAxes, fontsize=12, horizontalalignment='right')
ax.set_xticks([])
ax.set_yticks([])
if date_object:
ax.set_title(date_object.values.astype(str)[:10], fontsize=12)
if member_id:
ax.set_ylabel(member_id, fontsize=12)
return ax
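As a quick check of the helper (an illustrative sketch; the cookbook's actual figures are produced by the functions below), plotMap can be applied to a single timestep of a single ensemble member:
# Sketch: draw one timestep for the first ensemble member on standalone axes.
fig, ax = plt.subplots(figsize=(6, 4))
first_member = ds.member_id.values[0]
one_step = ds[data_var].sel(member_id=first_member).isel(time=0)
plotMap(ax, one_step, date_object=ds.time[0], member_id=first_member)
plt.show()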
Helper Function for Finding Dates with Available Data
def getValidDateIndexes(member_slice):
"""Search for the first and last dates with finite values."""
min_values = member_slice.min(dim = ['lat', 'lon'])
is_finite = np.isfinite(min_values)
finite_indexes = np.where(is_finite)
start_index = finite_indexes[0][0]
end_index = finite_indexes[0][-1]
return start_index, end_index
Helper Function for Truncating Data Slices
def truncateData(data_slice, num_years):
"""Remove all but the first num_years of valid data from the data slice."""
start_index, _ = getValidDateIndexes(data_slice)
start_date = data_slice.time[start_index]
start_year = start_date.dt.year.values
year_range = start_year + np.arange(num_years)
    data_slice = data_slice.isel(time=data_slice.time.dt.year.isin(year_range))
return data_slice
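As an illustration (a sketch assuming ds and data_var from the cells above), the helper can be applied to one ensemble member to keep, say, the first three years of valid data; note that it triggers a computation of per-timestep minima in order to locate the valid date range.
# Sketch: keep only the first 3 years of valid data for one ensemble member.
sample_slice = ds[data_var].sel(member_id=ds.member_id.values[0])
sample_3yr = truncateData(sample_slice, num_years=3)
print(sample_3yr.time.dt.year.values[[0, -1]])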
Function Producing Maps of First, Middle, and Final Timesteps
def plot_first_mid_last(ds, data_var, store_name):
"""Plot the first, middle, and final time steps for several climate runs."""
num_members_to_plot = 4
member_names = ds.coords['member_id'].values[0:num_members_to_plot]
figWidth = 18
figHeight = 12
numPlotColumns = 3
fig, axs = plt.subplots(num_members_to_plot, numPlotColumns, figsize=(figWidth, figHeight), constrained_layout=True)
for index in np.arange(num_members_to_plot):
mem_id = member_names[index]
data_slice = ds[data_var].sel(member_id=mem_id)
start_index, end_index = getValidDateIndexes(data_slice)
midDateIndex = np.floor(len(ds.time) / 2).astype(int)
startDate = ds.time[start_index]
first_step = data_slice.sel(time=startDate)
ax = axs[index, 0]
plotMap(ax, first_step, startDate, mem_id)
midDate = ds.time[midDateIndex]
mid_step = data_slice.sel(time=midDate)
ax = axs[index, 1]
plotMap(ax, mid_step, midDate)
endDate = ds.time[end_index]
last_step = data_slice.sel(time=endDate)
ax = axs[index, 2]
plotMap(ax, last_step, endDate)
plt.suptitle(f'First, Middle, and Last Timesteps for Selected Runs in "{store_name}"', fontsize=20)
return fig
Function Producing Statistical Map Plots
def plot_stat_maps(ds, data_var, store_name, truncate_data):
"""Plot the mean, min, max, and standard deviation values for several climate runs, aggregated over time."""
num_members_to_plot = 4
member_names = ds.coords['member_id'].values[0:num_members_to_plot]
figWidth = 25
figHeight = 12
numPlotColumns = 4
fig, axs = plt.subplots(num_members_to_plot, numPlotColumns, figsize=(figWidth, figHeight), constrained_layout=True)
for index in np.arange(num_members_to_plot):
mem_id = member_names[index]
data_slice = ds[data_var].sel(member_id=mem_id)
if truncate_data:
# Limit time range to one year.
data_slice = truncateData(data_slice, 1)
# Save slice in memory to prevent repeated disk loads
data_slice = data_slice.persist()
data_agg = data_slice.min(dim='time')
plotMap(axs[index, 0], data_agg, member_id=mem_id)
data_agg = data_slice.max(dim='time')
plotMap(axs[index, 1], data_agg)
data_agg = data_slice.mean(dim='time')
plotMap(axs[index, 2], data_agg)
data_agg = data_slice.std(dim='time')
plotMap(axs[index, 3], data_agg)
axs[0, 0].set_title(f'min({data_var})', fontsize=15)
axs[0, 1].set_title(f'max({data_var})', fontsize=15)
axs[0, 2].set_title(f'mean({data_var})', fontsize=15)
axs[0, 3].set_title(f'std({data_var})', fontsize=15)
plt.suptitle(f'Spatial Statistics for Selected Runs in "{store_name}"', fontsize=20)
return fig
Function Producing Time Series Plots
Also show which dates have no available data values, as a rug plot.
def plot_timeseries(ds, data_var, store_name, truncate_data):
"""Plot the mean, min, max, and standard deviation values for several climate runs,
aggregated over lat/lon dimensions."""
num_members_to_plot = 4
member_names = ds.coords['member_id'].values[0:num_members_to_plot]
figWidth = 25
figHeight = 20
linewidth = 0.5
numPlotColumns = 1
fig, axs = plt.subplots(num_members_to_plot, numPlotColumns, figsize=(figWidth, figHeight))
for index in np.arange(num_members_to_plot):
mem_id = member_names[index]
data_slice = ds[data_var].sel(member_id=mem_id)
if truncate_data:
# Limit time range to one year.
data_slice = truncateData(data_slice, 1)
# Save slice in memory to prevent repeated disk loads
data_slice = data_slice.persist()
unit_string = ds[data_var].attrs['units']
min_vals = data_slice.min(dim = ['lat', 'lon'])
max_vals = data_slice.max(dim = ['lat', 'lon'])
mean_vals = data_slice.mean(dim = ['lat', 'lon'])
std_vals = data_slice.std(dim = ['lat', 'lon'])
missing_indexes = np.isnan(min_vals).compute()
missing_times = data_slice.time[missing_indexes]
axs[index].plot(data_slice.time, max_vals, linewidth=linewidth, label='max', color='red')
axs[index].plot(data_slice.time, mean_vals, linewidth=linewidth, label='mean', color='black')
axs[index].fill_between(data_slice.time, (mean_vals - std_vals), (mean_vals + std_vals),
color='grey', linewidth=0, label='std', alpha=0.5)
axs[index].plot(data_slice.time, min_vals, linewidth=linewidth, label='min', color='blue')
# Produce a rug plot along the bottom of the figure for missing data.
ymin, ymax = axs[index].get_ylim()
rug_y = ymin + 0.01*(ymax-ymin)
axs[index].plot(missing_times, [rug_y]*len(missing_times), '|', color='m', label='missing')
axs[index].set_title(mem_id, fontsize=20)
axs[index].legend(loc='upper right')
axs[index].set_ylabel(unit_string)
plt.tight_layout(pad=10.2, w_pad=3.5, h_pad=3.5)
plt.suptitle(f'Temporal Statistics for Selected Runs in "{store_name}"', fontsize=20)
return fig
Produce Diagnostic Plots
Plot First, Middle, and Final Timesteps for Several Output Runs (less compute intensive)
%%time
# Plot using the Zarr Store obtained from an earlier step in the notebook.
figure = plot_first_mid_last(ds, data_var, store_name)
plt.show()
CPU times: user 6.17 s, sys: 1.39 s, total: 7.55 s
Wall time: 1min 37s
Optional: Save figure to a PNG file
Change the value of SAVE_PLOT to True to produce a PNG file of the plot. The file will be saved in the same folder as this notebook.
Then use Jupyter’s file browser to locate the file and right-click the file to download it.
SAVE_PLOT = False
if SAVE_PLOT:
plotfile = f'./{dataset_key}_FML.png'
figure.savefig(plotfile, dpi=100)
Create Statistical Map Plots for Several Output Runs (more compute intensive)
%%time
# Plot using the Zarr Store obtained from an earlier step in the notebook.
figure = plot_stat_maps(ds, data_var, store_name, TRUNCATE_DATA)
plt.show()
CPU times: user 5.24 s, sys: 1.25 s, total: 6.49 s
Wall time: 1min
Optional: Save figure to a PNG file
Change the value of SAVE_PLOT to True to produce a PNG file of the plot. The file will be saved in the same folder as this notebook.
Then use Jupyter’s file browser to locate the file and right-click the file to download it.
SAVE_PLOT = False
if SAVE_PLOT:
plotfile = f'./{dataset_key}_MAPS.png'
figure.savefig(plotfile, dpi=100)
Plot Time Series for Several Output Runs (more compute intensive)
%%time
# Plot using the Zarr Store obtained from an earlier step in the notebook.
figure = plot_timeseries(ds, data_var, store_name, TRUNCATE_DATA)
plt.show()
CPU times: user 4.62 s, sys: 1.31 s, total: 5.92 s
Wall time: 1min 5s
Optional: Save figure to a PNG file
Change the value of SAVE_PLOT to True to produce a PNG file of the plot. The file will be saved in the same folder as this notebook.
Then use Jupyter’s file browser to locate the file and right-click the file to download it.
SAVE_PLOT = False
if SAVE_PLOT:
plotfile = f'./{dataset_key}_TS.png'
figure.savefig(plotfile, dpi=100)
Release the workers.
!date
Wed Jun 5 19:32:47 UTC 2024
cluster.close()
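Optionally (not part of the original cell), the client connection can be closed as well:
# Optional tidy-up: also close the client connection.
client.close()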
Show which python package versions were used
%load_ext watermark
%watermark -iv
numpy : 1.26.4
intake : 0.7.0
matplotlib: 3.8.4
xarray : 2024.5.0