
Compare Data from ESGF and ARM

Overview

This notebook details how to compare CMIP6 data hosted through the Earth System Grid Federation (ESGF) to observations collected and hosted through the Department of Energy’s Atmospheric Radiation Measurement (ARM) user facility.

The measurement of focus is 2-meter air temperature, collected at the Southern Great Plains (SGP) site in northern Oklahoma. This climate observatory has collected state-of-the-art observations since 1993.

Prerequisites

Concepts | Importance | Notes
Intro to Xarray | Necessary |
Search and Load CMIP6 Data via ESGF/OPeNDAP | Necessary | Familiarity with data access patterns
Understanding of NetCDF | Helpful | Familiarity with metadata structure
Dask Arrays with Xarray | Helpful | Familiarity with lazy-loading

  • Time to learn: 25 minutes

Imports

import os
import warnings

import act
from distributed import Client
import holoviews as hv
import hvplot.xarray
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import cf_xarray
import metpy
from pyesgf.search import SearchConnection
import xarray as xr

xr.set_options(display_style='html')
warnings.filterwarnings("ignore")
hv.extension('bokeh')

Spin up a Dask Cluster

We will use a Dask local cluster to compute in parallel and distribute our data, enabling us to work with these large datasets.

client = Client()
client

Client

Client-4b621a6d-87ed-11ee-8506-4eeb28ce1cac

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status


Access Data

Our first step is to access data from the ESGF data servers and from the Atmospheric Radiation Measurement (ARM) user facility, which has a long-term site in northern Oklahoma.

Access ESGF Data

A tutorial on how to access ESGF-hosted CMIP6 data is included in the Foundations section of this cookbook.

We use the following block of code to search for a single Earth system model simulation from the Energy Exascale Earth System Model (E3SM), the Department of Energy’s flagship coupled Earth system model.

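# Connect to the LLNL ESGF search node (distrib=False searches only this node)
conn = SearchConnection('https://esgf-node.llnl.gov/esg-search',
                        distrib=False)

# Narrow the search to monthly near-surface air temperature (tas) from the
# E3SM-1-0 historical simulation
ctx = conn.new_context(
    facets='project,experiment_id',
    project='CMIP6',
    table_id='Amon',
    institution_id='E3SM-Project',
    experiment_id='historical',
    source_id='E3SM-1-0',
    variable='tas',
    variant_label='r1i1p1f1',
)

# Take the second dataset returned by the search and list its files
result = ctx.search()[1]
files = result.file_context().search()

# Open all files lazily over OPeNDAP as one dask-backed dataset
opendap_urls = [file.opendap_url for file in files]
esgf_ds = xr.open_mfdataset(opendap_urls,
                            combine='by_coords',
                            chunks={'time': 480})
esgf_ds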
<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2, lat: 180, lon: 360)
Coordinates:
  * time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
    height     float64 2.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object dask.array<chunksize=(300, 2), meta=np.ndarray>
    lat_bnds   (time, lat, bnds) float64 dask.array<chunksize=(300, 180, 2), meta=np.ndarray>
    lon_bnds   (time, lon, bnds) float64 dask.array<chunksize=(300, 360, 2), meta=np.ndarray>
    tas        (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.ndarray>
Attributes: (12/54)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           36500.0
    contact:                         Dave Bader (bader2@llnl.gov)
    ...                              ...
    e3sm_source_code_reference:      https://github.com/E3SM-Project/E3SM/rel...
    doe_acknowledgement:             This research was supported as part of t...
    computational_acknowledgement:   The data were produced using resources o...
    ncclimo_generation_command:      ncclimo --var=${var} -7 --dfl_lvl=1 --no...
    ncclimo_version:                 4.8.1-alpha04
    DODS_EXTRA.Unlimited_Dimension:  time

Clean up the dataset

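We need to convert the longitudes from the 0 to 360 degree convention to -180 to 180 degrees so they match the ARM site coordinates. We can do this generically by looking up the longitude coordinate through the Climate and Forecast (CF) conventions, then sorting so the axis is increasing again.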

lon_coord = esgf_ds.cf['X'].name
esgf_ds[lon_coord] = (esgf_ds[lon_coord] + 180) % 360 - 180
esgf_ds = esgf_ds.sortby(lon_coord)
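To see what the modular arithmetic above does, here is a minimal sketch with plain NumPy. The four longitude values are made up for illustration; only the formula and the sort step mirror the notebook cell.

```python
import numpy as np

# Illustrative longitudes on a 0 to 360 degree grid (values chosen for this sketch)
lon_0_360 = np.array([0.5, 179.5, 180.5, 359.5])

# Shift by 180, wrap into [0, 360), then shift back to land in [-180, 180)
lon_180 = (lon_0_360 + 180) % 360 - 180

# After wrapping, the axis is no longer increasing, so sort it
# (this is the role of sortby in the notebook cell)
lon_sorted = np.sort(lon_180)

print(lon_180)     # [   0.5  179.5 -179.5   -0.5]
print(lon_sorted)  # [-179.5   -0.5    0.5  179.5]
```

Values in the eastern hemisphere are unchanged, while values past 180 degrees become negative, which is why the sort is needed before selecting by coordinate.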

Access ARM Data

We access the data through the ARM data API, which is included in the Atmospheric Data Community Toolkit (ACT).
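The download cell itself is not shown here, so the following is only a sketch of what an ACT download might look like. The datastream name, the dates, and the environment variable names are assumptions made for illustration, not values taken from this notebook. The call `act.discovery.download_arm_data` is the name used in recent ACT releases; older releases may expose it under a different name, so check your installed version.

```python
import os


def download_sgp_met(start_date, end_date, datastream="sgpmetE13.b1"):
    """Download ARM files with ACT and return the list of local file paths.

    The default datastream and the environment variable names are
    assumptions for this sketch, not values confirmed by the notebook.
    """
    username = os.getenv("ARM_USERNAME")  # hypothetical variable name
    token = os.getenv("ARM_TOKEN")        # hypothetical variable name
    if not username or not token:
        raise RuntimeError("Set ARM_USERNAME and ARM_TOKEN before downloading.")

    # Imported lazily so the credential check above runs without ACT installed
    import act

    # Assumption: recent ACT releases provide download_arm_data
    return act.discovery.download_arm_data(
        username, token, datastream, start_date, end_date
    )


# Placeholder dates, for illustration only
start_date = "2019-01-01"
end_date = "2019-12-31"

# Only attempt the download when credentials are available
if os.getenv("ARM_USERNAME") and os.getenv("ARM_TOKEN"):
    files = download_sgp_met(start_date, end_date)
```

Reading credentials from the environment keeps the ARM username and token out of the notebook itself.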

Load the Data Using Xarray

# Here `files` is the list of local ARM file paths from the download step
arm_ds = xr.open_mfdataset(files,
                           combine='nested',
                           concat_dim='time',
                           chunks={'time': 86400})

Subset and Prepare Data to be Compared

We need to subset the climate model output to the grid point nearest the SGP site. First, we read the site coordinates from the ARM dataset.

lat = arm_ds.lat.values[0]
lon = arm_ds.lon.values[0]
lat, lon
(36.605, -97.485)

Xarray offers this subsetting functionality, and we specify that we want the nearest grid point to the site.

cmip6_nearest = esgf_ds.cf.sel(lat=lat,
                               lon=lon,
                               method='nearest')
cmip6_nearest
<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2)
Coordinates:
  * time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
    lat        float64 36.5
    lon        float64 -97.5
    height     float64 2.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object dask.array<chunksize=(300, 2), meta=np.ndarray>
    lat_bnds   (time, bnds) float64 dask.array<chunksize=(300, 2), meta=np.ndarray>
    lon_bnds   (time, bnds) float64 dask.array<chunksize=(300, 2), meta=np.ndarray>
    tas        (time) float32 dask.array<chunksize=(300,), meta=np.ndarray>
Attributes: (12/54)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           36500.0
    contact:                         Dave Bader (bader2@llnl.gov)
    ...                              ...
    e3sm_source_code_reference:      https://github.com/E3SM-Project/E3SM/rel...
    doe_acknowledgement:             This research was supported as part of t...
    computational_acknowledgement:   The data were produced using resources o...
    ncclimo_generation_command:      ncclimo --var=${var} -7 --dfl_lvl=1 --no...
    ncclimo_version:                 4.8.1-alpha04
    DODS_EXTRA.Unlimited_Dimension:  time
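The nearest-neighbor selection can be reproduced by hand to check the result. This is a minimal NumPy sketch, assuming a regular 1-degree grid with the same cell centers as the dataset shown above; it is not how Xarray implements the lookup internally.

```python
import numpy as np

# Cell-center coordinates of a regular 1-degree grid (assumed to match the
# model grid after the longitude conversion)
lat_grid = np.arange(-89.5, 90.0, 1.0)
lon_grid = np.arange(-179.5, 180.0, 1.0)

# SGP site coordinates from the ARM dataset
site_lat, site_lon = 36.605, -97.485

# Pick the grid value with the smallest absolute distance along each axis
nearest_lat = lat_grid[np.abs(lat_grid - site_lat).argmin()]
nearest_lon = lon_grid[np.abs(lon_grid - site_lon).argmin()]

print(nearest_lat, nearest_lon)  # 36.5 -97.5
```

The result matches the `lat` and `lon` values in the subset dataset, so the selected model cell is centered about 0.1 degrees from the site.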

The model time axis is stored as cftime objects, so we convert it to a pandas DatetimeIndex to make it easier to compare with the ARM timestamps.

cmip6_nearest['time'] = cmip6_nearest.indexes['time'].to_datetimeindex()

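Next, we select the period for which we have data from the SGP site, using the start and end dates specified earlier in the notebook. Resampling to monthly frequency labels each value with the end of its month, matching the time axis of the ARM monthly means computed below.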

cmip6_nearest = cmip6_nearest.sel(time=slice(start_date,
                                             end_date)).resample(time='1M').mean()

Calculate Monthly Mean Temperature at SGP

We can calculate the monthly average temperature at the SGP site using the resample method in Xarray.

arm_ds = arm_ds.sortby('time')
sgp_monthly_mean_temperature = arm_ds.temp_mean.resample(time='1M').mean().compute().rename('tas (ARM)')
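To illustrate what a monthly resample does, here is a small pandas sketch on synthetic daily data. The values are made up, and the `'MS'` frequency labels each month by its first day, whereas the `'1M'` used in the notebook labels it by its last day; the averaging itself is the same.

```python
import numpy as np
import pandas as pd

# Synthetic daily temperatures: 1.0 degC in January, 3.0 degC in February
times = pd.date_range("2019-01-01", "2019-02-28", freq="D")
values = np.where(times.month == 1, 1.0, 3.0)
temps = pd.Series(values, index=times, name="temp_mean")

# Group the samples by calendar month and average each group
monthly = temps.resample("MS").mean()

print(monthly)
```

Each output value is the mean of all samples that fall within that calendar month, regardless of how many samples the month contains.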

We need to apply some data cleaning here too. The CMIP6 temperatures are reported in kelvin, so we convert them to degrees Celsius to match the ARM observations.

cmip6_monthly_mean_temperature = cmip6_nearest.tas.compute().metpy.quantify()
cmip6_monthly_mean_temperature = cmip6_monthly_mean_temperature.metpy.convert_units('degC').rename("tas (CMIP6)")
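MetPy handles the conversion through unit-aware arrays, but the arithmetic behind it is a constant offset. A minimal sketch with plain NumPy, using made-up sample values:

```python
import numpy as np

# Offset between the kelvin and Celsius scales
KELVIN_OFFSET = 273.15

# Illustrative temperatures in kelvin (values chosen for this sketch)
tas_kelvin = np.array([273.15, 283.15, 300.0])

# Celsius = kelvin - 273.15
tas_degc = tas_kelvin - KELVIN_OFFSET

print(np.round(tas_degc, 2))
```

Using MetPy instead of a hand-written offset has the advantage that the units attribute read from the file drives the conversion, so a dataset already in degrees Celsius would not be shifted twice.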

Visualize the Output

Once we have our comparisons ready, we can visualize using hvPlot, which produces an interactive visualization!

esgf_plot = cmip6_monthly_mean_temperature.hvplot.bar(title='Average Surface Temperature \n near the Southern Great Plains Field Site',
                                                       xlabel='Time')
arm_plot = sgp_monthly_mean_temperature.hvplot.bar(ylabel='Average Temperature (degC)',
                                                    xlabel='Time')

esgf_plot * arm_plot

Summary

In this notebook, we searched for and opened a CMIP6 E3SM dataset using the ESGF API and OPeNDAP, and compared it with an ARM dataset collected at the Southern Great Plains climate observatory.

What’s next?

We will see some more advanced examples of using the CMIP6 and observational data.

Resources and references