Compare Data from ESGF and ARM
Overview
This notebook details how to compare CMIP6 data hosted through the Earth System Grid Federation (ESGF) to observations collected and hosted through the Department of Energy’s Atmospheric Radiation Measurement (ARM) user facility.
The measurement of interest is 2-meter air temperature, collected at the Southern Great Plains (SGP) site in northern Oklahoma. This climate observatory has collected state-of-the-art observations since 1993.
Prerequisites
| Concepts | Importance | Notes |
| --- | --- | --- |
| | Necessary | |
| | Necessary | Familiarity with data access patterns |
| | Helpful | Familiarity with metadata structure |
| | Helpful | Familiarity with lazy-loading |
Time to learn: 25 minutes
Imports
import os
import warnings
import act
from distributed import Client
import holoviews as hv
import hvplot.xarray
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import cf_xarray
import metpy
from pyesgf.search import SearchConnection
import xarray as xr
xr.set_options(display_style='html')
warnings.filterwarnings("ignore")
hv.extension('bokeh')
Spin up a Dask Cluster
We will use a Dask local cluster to compute in parallel and distribute our data, enabling us to work with these large datasets.
client = Client()
client
Client
Client-4b621a6d-87ed-11ee-8506-4eeb28ce1cac
Connection method: Cluster object | Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 8 | Total threads: 32 | Total memory: 122.83 GiB
Per worker: 4 threads | 15.35 GiB memory
Status: running | Using processes: True
Access Data
Our first step is to access data from the ESGF data servers and from the Atmospheric Radiation Measurement (ARM) user facility, which has a long-term site in northern Oklahoma.
Access ESGF Data
A tutorial on how to access ESGF-hosted CMIP6 data is included in the Foundations section of this cookbook:
We use the following block of code to search for a single Earth system model simulation from the Energy Exascale Earth System Model (E3SM), which is the Department of Energy’s flagship coupled Earth system model.
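# Connect to the LLNL ESGF index node; distrib=False searches only that node
conn = SearchConnection('https://esgf-node.llnl.gov/esg-search',
                        distrib=False)

# Narrow the search to monthly near-surface air temperature (tas) from the
# E3SM-1-0 historical simulation
ctx = conn.new_context(
    facets='project,experiment_id',
    project='CMIP6',
    table_id='Amon',
    institution_id='E3SM-Project',
    experiment_id='historical',
    source_id='E3SM-1-0',
    variable='tas',
    variant_label='r1i1p1f1',
)

# Use the second search result and collect the OPeNDAP URL of each of its files
result = ctx.search()[1]
files = result.file_context().search()
opendap_urls = [file.opendap_url for file in files]

# Open all files lazily as a single dataset, chunked along time
esgf_ds = xr.open_mfdataset(opendap_urls,
                            combine='by_coords',
                            chunks={'time': 480})
esgf_ds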
<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2, lat: 180, lon: 360)
Coordinates:
  * time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
    height     float64 2.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object dask.array<chunksize=(300, 2), meta=np.ndarray>
    lat_bnds   (time, lat, bnds) float64 dask.array<chunksize=(300, 180, 2), meta=np.ndarray>
    lon_bnds   (time, lon, bnds) float64 dask.array<chunksize=(300, 360, 2), meta=np.ndarray>
    tas        (time, lat, lon) float32 dask.array<chunksize=(300, 180, 360), meta=np.ndarray>
Attributes: (12/54)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           36500.0
    contact:                         Dave Bader (bader2@llnl.gov)
    ...                              ...
    e3sm_source_code_reference:      https://github.com/E3SM-Project/E3SM/rel...
    doe_acknowledgement:             This research was supported as part of t...
    computational_acknowledgement:   The data were produced using resources o...
    ncclimo_generation_command:      ncclimo --var=${var} -7 --dfl_lvl=1 --no...
    ncclimo_version:                 4.8.1-alpha04
    DODS_EXTRA.Unlimited_Dimension:  time
Access ARM Data
We use the ARM data API, which is included in the Atmospheric Data Community Toolkit (ACT), to access the data.
Set Up the Search
Before downloading our data, we need to make sure we have an ARM data account and an ARM Live token. Both of these can be found using this link:
Once you sign up, you will see your token. Use your username and token in place of arm_username and arm_password below. The code reads them from the ARM_USERNAME and ARM_PASSWORD environment variables.
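If either environment variable is unset, `os.getenv` silently returns `None` and the download fails later with a less obvious error. The sketch below checks for that up front. The helper name `get_arm_credentials` is hypothetical and not part of ACT; only the two environment variable names come from this notebook.

```python
import os


def get_arm_credentials(user_var="ARM_USERNAME", token_var="ARM_PASSWORD"):
    """Return (username, token) from the environment, or raise a clear error.

    Hypothetical helper, not part of ACT. The default variable names are the
    ones this notebook reads with os.getenv.
    """
    username = os.getenv(user_var)
    token = os.getenv(token_var)
    # Collect the names of any variables that are unset or empty
    missing = [name for name, value in ((user_var, username), (token_var, token))
               if not value]
    if missing:
        raise RuntimeError(
            "Missing ARM credentials; set environment variable(s): "
            + ", ".join(missing)
        )
    return username, token
```

Calling `arm_username, arm_password = get_arm_credentials()` in place of the two `os.getenv` lines would stop the notebook early with a readable message when the credentials are missing.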
arm_username = os.getenv("ARM_USERNAME")
arm_password = os.getenv("ARM_PASSWORD")

# Meteorological observations at the Southern Great Plains site
datastream = "sgpmetE13.b1"
start_date = "2013-01-01"
end_date = "2013-02-28"

files = act.discovery.download_data(arm_username,
                                    arm_password,
                                    datastream,
                                    start_date,
                                    end_date
                                    )
[DOWNLOADING] sgpmetE13.b1.20130101.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130102.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130103.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130104.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130105.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130106.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130107.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130108.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130109.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130110.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130111.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130112.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130113.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130114.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130115.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130116.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130117.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130118.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130119.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130120.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130121.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130122.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130123.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130124.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130125.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130126.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130127.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130128.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130129.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130130.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130131.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130201.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130202.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130203.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130204.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130205.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130206.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130207.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130208.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130209.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130210.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130211.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130212.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130213.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130214.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130215.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130216.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130217.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130218.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130218.170700.cdf
[DOWNLOADING] sgpmetE13.b1.20130219.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130220.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130221.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130222.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130223.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130224.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130225.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130226.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130227.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130228.000000.cdf
If you use these data to prepare a publication, please cite:
Kyrouac, J., Shi, Y., & Tuftedal, M. Surface Meteorological Instrumentation
(MET). Atmospheric Radiation Measurement (ARM) User Facility.
https://doi.org/10.5439/1786358
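The cells that follow use a dataset named `arm_ds`, but the step that opens the downloaded files is not shown above. The sketch below is one way to fill that gap with plain xarray. It is an assumption, not necessarily what the original notebook ran: ACT ships its own reader for ARM files, but its name differs between ACT releases, so it is not used here. The helper name `open_arm_files` is hypothetical.

```python
import xarray as xr


def open_arm_files(files):
    """Open a list of downloaded ARM NetCDF files as one xarray Dataset.

    Hypothetical helper. Assumes every file carries a `time` coordinate so
    that xarray can combine the daily files along it.
    """
    if not files:
        raise ValueError(
            "No ARM files to open; check credentials, datastream, and dates."
        )
    # File names embed the date and time, so sorting them gives a
    # chronological order before xarray combines them by coordinate.
    return xr.open_mfdataset(sorted(files), combine="by_coords")


# Usage in this notebook (assumes `files` is the list returned by
# act.discovery.download_data):
# arm_ds = open_arm_files(files)
```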
Subset and Prepare Data to be Compared
We need to subset the climate model output to the grid point nearest the SGP site.
lat = arm_ds.lat.values[0]
lon = arm_ds.lon.values[0]
lat, lon
(36.605, -97.485)
Xarray offers this subsetting functionality, and we specify that we want the grid point nearest to the site.
cmip6_nearest = esgf_ds.cf.sel(lat=lat,
                               lon=lon,
                               method='nearest')
cmip6_nearest
<xarray.Dataset>
Dimensions:    (time: 1980, bnds: 2)
Coordinates:
  * time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
    lat        float64 36.5
    lon        float64 -97.5
    height     float64 2.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object dask.array<chunksize=(300, 2), meta=np.ndarray>
    lat_bnds   (time, bnds) float64 dask.array<chunksize=(300, 2), meta=np.ndarray>
    lon_bnds   (time, bnds) float64 dask.array<chunksize=(300, 2), meta=np.ndarray>
    tas        (time) float32 dask.array<chunksize=(300,), meta=np.ndarray>
Attributes: (12/54)
    Conventions:                     CF-1.7 CMIP-6.2
    activity_id:                     CMIP
    branch_method:                   standard
    branch_time_in_child:            0.0
    branch_time_in_parent:           36500.0
    contact:                         Dave Bader (bader2@llnl.gov)
    ...                              ...
    e3sm_source_code_reference:      https://github.com/E3SM-Project/E3SM/rel...
    doe_acknowledgement:             This research was supported as part of t...
    computational_acknowledgement:   The data were produced using resources o...
    ncclimo_generation_command:      ncclimo --var=${var} -7 --dfl_lvl=1 --no...
    ncclimo_version:                 4.8.1-alpha04
    DODS_EXTRA.Unlimited_Dimension:  time
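The dataset summary printed for `esgf_ds` lists model longitudes from 0.5 to 359.5, while the site longitude is negative (-97.485). When the two use different conventions, a nearest-neighbour lookup on the raw value lands on the wrong side of the globe. The plain-Python sketch below illustrates the pitfall and a common fix, wrapping the site longitude into the 0 to 360 range before the lookup. The function names are hypothetical, and the 1-degree grids are rebuilt from the coordinate ranges shown earlier rather than read from the dataset.

```python
def to_0_360(lon_deg):
    """Map a longitude in degrees onto the [0, 360) convention."""
    return lon_deg % 360.0


def nearest_value(grid, target):
    """Return the grid value closest to target (plain linear scan)."""
    return min(grid, key=lambda value: abs(value - target))


# 1-degree cell centers matching the coordinate ranges printed for esgf_ds
lat_grid = [-89.5 + i for i in range(180)]
lon_grid = [0.5 + i for i in range(360)]

site_lat, site_lon = 36.605, -97.485  # SGP coordinates printed above

nearest_lat = nearest_value(lat_grid, site_lat)
nearest_lon = nearest_value(lon_grid, to_0_360(site_lon))
print(nearest_lat, nearest_lon)  # 36.5 262.5

# Without the wrap, the lookup falls back to the first grid cell
print(nearest_value(lon_grid, site_lon))  # 0.5
```

A longitude of 262.5 on the 0 to 360 grid is the same meridian as -97.5, the value shown in the selection output above. It is worth confirming that the selected `lon` matches the site after running the selection on a new dataset.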
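The model's time axis is stored as calendar-aware objects (the object dtype in the output above), so we convert it to a pandas datetime index to make it easier to compare with the observations.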
cmip6_nearest['time'] = cmip6_nearest.indexes['time'].to_datetimeindex()
Next, we select the period for which we have data from the SGP site, using the start and end dates specified earlier in the notebook.
cmip6_nearest = cmip6_nearest.sel(time=slice(start_date,
                                             end_date)).resample(time='1M').mean()
Calculate Monthly Mean Temperature at SGP
We can calculate the monthly average temperature at the SGP site using the resample method in Xarray.
arm_ds = arm_ds.sortby('time')
sgp_monthly_mean_temperature = arm_ds.temp_mean.resample(time='1M').mean().compute().rename('tas (ARM)')
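To make the resampling step concrete, the sketch below reproduces the same idea in plain Python: group the samples by calendar month, then average each group. It is an illustration only, not how Xarray implements `resample`. The helper name `monthly_means` is hypothetical, and the input values are placeholders rather than real observations.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_means(samples):
    """Average (timestamp, value) pairs by calendar month.

    Returns a dict mapping (year, month) to the mean of that month's values.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[(timestamp.year, timestamp.month)].append(value)
    return {month: sum(values) / len(values)
            for month, values in sorted(buckets.items())}


# Placeholder data, not real observations: one value per day,
# 1.0 throughout January 2013 and 3.0 throughout February 2013
start = datetime(2013, 1, 1)
samples = []
for day in range(59):  # 31 days of January + 28 days of February
    timestamp = start + timedelta(days=day)
    samples.append((timestamp, 1.0 if timestamp.month == 1 else 3.0))

print(monthly_means(samples))  # {(2013, 1): 1.0, (2013, 2): 3.0}
```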
We need to apply some data cleaning here too: the CMIP6 temperatures are converted to degrees Celsius so that both datasets use the same units.
cmip6_monthly_mean_temperature = cmip6_nearest.tas.compute().metpy.quantify()
cmip6_monthly_mean_temperature = cmip6_monthly_mean_temperature.metpy.convert_units('degC').rename("tas (CMIP6)")
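CMIP6 reports `tas` in kelvin, so the conversion to degrees Celsius is a fixed offset of 273.15. The sketch below spells that arithmetic out in plain Python as a quick sanity check on converted values. The function name `kelvin_to_celsius` is hypothetical and is not a MetPy call.

```python
def kelvin_to_celsius(temp_k):
    """Convert a temperature from kelvin to degrees Celsius."""
    return temp_k - 273.15


# The freezing point of water, 273.15 K, should map to 0 degC
print(kelvin_to_celsius(273.15))  # 0.0
```

Going through MetPy, as the notebook does, has the advantage that the units are read from the variable's metadata rather than assumed.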
Visualize the Output
Once we have our comparisons ready, we can visualize them using hvPlot, which produces an interactive visualization!
esgf_plot = cmip6_monthly_mean_temperature.hvplot.bar(
    title='Average Surface Temperature \n near the Southern Great Plains Field Site',
    xlabel='Time')
arm_plot = sgp_monthly_mean_temperature.hvplot.bar(
    ylabel='Average Temperature (degC)',
    xlabel='Time')
esgf_plot * arm_plot
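Alongside the overlaid bars, it can help to put a number on the comparison. The sketch below computes the model-minus-observation difference for each month present in both series. It uses plain dictionaries keyed by month so that it stands alone. The helper name `monthly_bias` is hypothetical, and the values are placeholders, not results from this notebook.

```python
def monthly_bias(model, obs):
    """Return model minus observation for every month present in both inputs."""
    shared_months = sorted(set(model) & set(obs))
    return {month: model[month] - obs[month] for month in shared_months}


# Placeholder monthly means in degC, not real results
model = {"2013-01": 2.0, "2013-02": 4.5}
obs = {"2013-01": 1.5, "2013-02": 5.0, "2013-03": 9.0}

print(monthly_bias(model, obs))  # {'2013-01': 0.5, '2013-02': -0.5}
```

A positive value means the model is warmer than the observations for that month. Months present in only one series are skipped rather than compared against a missing value.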
Summary
In this notebook, we searched for and opened a CMIP6 E3SM dataset using the ESGF API and OPeNDAP, and compared it with an ARM dataset collected at the Southern Great Plains climate observatory.