Simulation vs Observational Data of Shallow Cumulus Clouds over the Southern Great Plains on April 4th, 2019
Import¶
# LASSO simulation data
from datetime import datetime
from distributed import Client
import numpy as np
import xarray as xr
import xwrf
import s3fs
import fsspec
import matplotlib.pyplot as plt
Spin up a Dask Cluster¶
We will use Dask here to access the data in a parallel/distributed manner.
client = Client()
client
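Client() with no arguments starts a local cluster sized to your machine. If you want explicit control over the workers, the sizing can be passed directly; a minimal sketch, with placeholder numbers to adjust for your hardware:
# Hypothetical explicit sizing -- tune these numbers for your machine
# client = Client(n_workers=4, threads_per_worker=2, memory_limit='4GB')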
Access LASSO SGP Data from the NSF Jetstream Cloud¶
A subset of the LASSO Shallow Cumulus Experiment over the Southern Great Plains site has been made available on a cloud bucket, hosted through Project Pythia. These datasets were originally accessed through the LASSO bundle browser, untarred, then uploaded to the cloud bucket. We focus exclusively on the April 4, 2019 case.
Data were obtained from the Atmospheric Radiation Measurement (ARM) Program sponsored by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Climate and Environmental Sciences Division.
Set the URL to access the data¶
The data are stored in a bucket, a web-accessible location from which we can stream the data remotely without downloading it first. The bucket is hosted on the NSF Jetstream cloud, as shown below. We then use fsspec to easily list the directories and load in the data.
Below we set the URL, then list (glob) the directories in the bucket.
# Set the URL and path for the cloud
URL = 'https://js2.jetstream-cloud.org:8001/'
path = 'pythia/lasso-sgp'
fs = fsspec.filesystem("s3", anon=True, client_kwargs=dict(endpoint_url=URL))
fs.glob(f"{path}/*")
['pythia/lasso-sgp/high_res_obs',
'pythia/lasso-sgp/metrics',
'pythia/lasso-sgp/sim0001',
'pythia/lasso-sgp/sim0002',
'pythia/lasso-sgp/sim0003',
'pythia/lasso-sgp/sim0004',
'pythia/lasso-sgp/sim0005',
'pythia/lasso-sgp/sim0006',
'pythia/lasso-sgp/sim0007',
'pythia/lasso-sgp/sim0008']
We notice that there are 8 simulations, as well as observations, in the bucket. We are going to start with the fourth simulation, setting a path to the actual model output, which lives under /raw_model/.
case_date = datetime(2019, 4, 4)
sim_id = 4
# Pattern for the wrfstat files
wrfstat_pattern = f's3://{path}/sim000{sim_id}/raw_model/wrfstat*'
# Pattern for the wrfout files
wrfout_pattern = f's3://{path}/sim000{sim_id}/raw_model/wrfout*'
wrfstat_files = sorted(fs.glob(wrfstat_pattern))
wrfout_files = sorted(fs.glob(wrfout_pattern))
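A quick check that the patterns matched what we expect:
# Count the files matched by each pattern
print(len(wrfstat_files), 'wrfstat files;', len(wrfout_files), 'wrfout files')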
Now that we have lists of files, we open file-like objects to pass to xarray, since the reads need the bucket connection information as well.
wrfstat_file_list = [fs.open(file) for file in wrfstat_files]
wrfout_file_list = [fs.open(file) for file in wrfout_files]
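Before handing these objects to xarray, an optional sanity check is to confirm the remote files really are netCDF4/HDF5; the first eight bytes should be the HDF5 magic number:
# Optional sanity check: netCDF4/HDF5 files begin with the 8-byte HDF5 signature
with fs.open(wrfstat_files[0]) as f:
    print(f.read(8))  # expect b'\x89HDF\r\n\x1a\n'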
Load Data Using Xarray and View Variables¶
The wrfstat output is a single file, which we can load into xarray and then postprocess with xwrf.
ds_stat = xr.open_mfdataset(wrfstat_file_list, engine='h5netcdf').xwrf.postprocess()
ds_stat
ClientError: An error occurred (PreconditionFailed) when calling the GetObject operation
...
TypeError: argument of type 'NoneType' is not iterable
(full traceback truncated)
# Fallback: if the remote read fails, the same wrfstat file can be read from a local copy of the LASSO bundle
# path_shcu_root = "/gpfs/wolf2/arm/atm124/world-shared/arm-summer-school-2024/lasso_tutorial/ShCu/untar/"  # on cumulus
path_shcu_root = "/data/project/ARM_Summer_School_2024_Data/lasso_tutorial/ShCu/untar"  # on Jupyter
case_date = datetime(2019, 4, 4)
sim_id = 4
ds_stat = xr.open_dataset(f"{path_shcu_root}/{case_date:%Y%m%d}/sim{sim_id:04d}/raw_model/wrfstat_d01_{case_date:%Y-%m-%d_12:00:00}.nc")
ds_stat
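The wrfstat statistics are grouped by prefix, as the plots below will show: CST_ variables are domain-mean time series, CSP_ are profiles, CSS_ are x-y slices, and CSV_ are full 3-D volumes. A quick sketch to survey what is available:
# Group the wrfstat variables by prefix to survey the available statistics
from collections import defaultdict
groups = defaultdict(list)
for name in ds_stat.data_vars:
    groups[name.split('_')[0]].append(name)
for prefix, names in sorted(groups.items()):
    print(f'{prefix}: {len(names)} variables, e.g., {names[:3]}')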
Plot Variables and Modify as Desired¶
xwrf automatically corrected the time for us! So now we can focus on subsetting to a given hour. In this case, we are interested in 1700 UTC.
ds_stat.CSV_LWC
hour_to_plot = 17
# Time series:
ds_stat["CST_LWP"].plot()
plt.show()
# Profile at a selected time (plots sideways, though, since we are being lazy):
ds_stat["CSP_LWC"].sel(Time=f"{case_date:%Y-%m-%d} {hour_to_plot}:00").plot()
plt.show()
# X-Y slice for a selected time:
ds_stat["CSS_LWP"].sel(Time=f"{case_date:%Y-%m-%d} {hour_to_plot}:00").plot()
plt.show()
# A vertical slice from the volume at a selected time:
# We'll assign the vertical coordinate values for this one and hide the cloud-free upper atmosphere.
plot_data = ds_stat["CSV_LWC"].assign_coords(height=(ds_stat["CSP_Z"]))
plot_data.sel(Time=f"{case_date:%Y-%m-%d} {hour_to_plot}:00", y=-12450).plot(y="height", ylim=[0, 1500])
plt.show()
# Add lines and modify the variables above to plot other fields of interest.
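For example, the profile statistics can be combined into a single time-height view; a sketch reusing the variables above (the height limit is an arbitrary choice):
# Time-height view of the liquid water content profiles (sketch)
plot_data = ds_stat['CSP_LWC'].assign_coords(height=ds_stat['CSP_Z'])
plot_data.plot(x='Time', y='height', ylim=[0, 3000])
plt.show()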
Plot wind speed¶
Let’s start with the first 10 files from the WRF simulation.
# Note the extra details required by open_mfdataset to connect the files together in time.
ds_wrf = xr.open_mfdataset(wrfout_file_list[:10],
combine="nested",
concat_dim="Time")
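open_mfdataset already returns dask-backed arrays. If memory becomes an issue, explicit chunking and parallel opening can also be requested; a sketch with placeholder chunk sizes:
# Hypothetical alternative: chunk along Time and open the files in parallel
# ds_wrf = xr.open_mfdataset(wrfout_file_list[:10],
#                            combine="nested",
#                            concat_dim="Time",
#                            parallel=True,
#                            chunks={"Time": 1})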
Since we did not use xwrf this time, let’s fix the times ourselves.
ds_wrf["Time"] = ds_wrf["XTIME"]
Visualize the Wind Vectors and Expose Destaggering¶
Next, we plot wind vectors at a selected level to demonstrate how to destagger the wind components to cell-center values with xarray. The destaggering is something that we can automatically handle with xwrf, but we explain the process below to be more transparent, especially with winds that are not in the cell-centers.
Destagger and Rename Dimensions¶
We need to:
- destagger to cell centers
- rename the staggered dimension back to the non-staggered name to avoid dimension conflicts
- (re)name the unstaggered wind for convenience
Then, we are able to put these new DataArrays back into the ds_wrf Dataset.
# Plot wind vectors at a selected level to demonstrate how to destagger the wind components to cell-center values with xarray...
plot_level = 12 # index of level to plot
skip_xy = 10 # Sampling interval for the vector thinning
nt, nz, ny, nx = ds_wrf["T"].shape
ds_wrf["UA"] = 0.5*( ds_wrf["U"].isel(west_east_stag=slice(0, nx)) +
ds_wrf["U"].shift(west_east_stag=-1).isel(west_east_stag=slice(0, nx)) ).\
rename("UA").rename(west_east_stag="west_east")
ds_wrf["VA"] = 0.5*( ds_wrf["V"].isel(south_north_stag=slice(0, ny)) +
ds_wrf["V"].shift(south_north_stag=-1).isel(south_north_stag=slice(0, ny)) ).\
rename("VA").rename(south_north_stag="south_north")
ds_wrf["SPD"] = np.sqrt(ds_wrf["UA"]**2 + ds_wrf["VA"]**2).rename("wind speed").\
assign_attrs(units="m s-1", description="wind speed")
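The same averaging pattern repeats for both wind components; if preferred, it can be factored into a small helper. This is only a sketch, not part of the original workflow:
# Hypothetical helper that averages a staggered field to cell centers
def destagger(da, stag_dim, center_dim, n_center, name):
    centered = 0.5 * (da.isel({stag_dim: slice(0, n_center)})
                      + da.shift({stag_dim: -1}).isel({stag_dim: slice(0, n_center)}))
    return centered.rename(name).rename({stag_dim: center_dim})

# Equivalent to the assignments above:
# ds_wrf["UA"] = destagger(ds_wrf["U"], "west_east_stag", "west_east", nx, "UA")
# ds_wrf["VA"] = destagger(ds_wrf["V"], "south_north_stag", "south_north", ny, "VA")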
Visualize the Speed and Wind Vectors¶
Now, we can proceed to more plotting-specific data manipulation. We need to add spatial variables for the idealized domain (since XLAT and XLONG are constant in the file). This is needed by the xarray quiver routine.
Then, we thin the grid to reduce the number of arrows.
ds_wrf["west_east"] = xr.DataArray(data=np.arange(nx)*ds_wrf.attrs["DX"], dims="west_east", name="west_east", attrs={"units": "m"})
ds_wrf["south_north"] = xr.DataArray(data=np.arange(ny)*ds_wrf.attrs["DX"], dims="south_north", name="south_north", attrs={"units": "m"})
ds_wrf_thinned = ds_wrf.\
isel(west_east=slice(0, nx, skip_xy), south_north=slice(0, ny, skip_xy), bottom_top=plot_level).\
sel(Time=f"{case_date:%Y-%m-%d} {hour_to_plot}:00")
fig, ax = plt.subplots(ncols=1)
ds_wrf_thinned["SPD"].plot(ax=ax, x="west_east", y="south_north")
ds_wrf_thinned.plot.quiver(ax=ax, x="west_east", y="south_north", u="UA", v="VA",
scale=100)
plt.show()
Compare with Observational Data from ARM¶
Now that we have plotted the simulation data from WRF, let’s take a look at the observations.
fs.glob(f"{path}/*")
fs.glob('pythia/lasso-sgp/high_res_obs/sgp*')
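These high-resolution observation files can be streamed from the bucket just like the model output; a sketch, assuming they are netCDF4 files readable by h5netcdf:
# Stream one of the high-resolution observation files from the bucket (sketch)
obs_files = sorted(fs.glob('pythia/lasso-sgp/high_res_obs/sgp*'))
ds_obs = xr.open_dataset(fs.open(obs_files[0]), engine='h5netcdf')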
# Compare with ARM Observational Data
import os
from arm_test_data import DATASETS
import matplotlib.pyplot as plt
import act
# Read your ARM username and token from the environment
username = os.getenv('ARM_USERNAME')
token = os.getenv('ARM_PASSWORD')
# If the username and token are not set, use the existing sample file
if username is None or token is None or len(username) == 0 or len(token) == 0:
filename_ceil = DATASETS.fetch('sgpceilC1.b1.20190101.000000.nc')
ceil_ds = act.io.arm.read_arm_netcdf(filename_ceil, engine='netcdf4')
else:
# Example to show how easy it is to download ARM data if a username/token are set
results = act.discovery.download_arm_data(
username, token, 'sgpceilC1.b1', '2022-01-14', '2022-01-19'
)
ceil_ds = act.io.arm.read_arm_netcdf(results)
# Adjust ceilometer data for plotting
ceil_ds = act.corrections.ceil.correct_ceil(ceil_ds, -9999.0)
# Plot up ceilometer backscatter using the HomeyerRainbow CVD-friendly colormap
# The same could be done with the keyword cmap='HomeyerRainbow'
display = act.plotting.TimeSeriesDisplay(ceil_ds, subplot_shape=(1,), figsize=(15, 5))
display.plot('backscatter', subplot_index=(0,), cvd_friendly=True)
plt.show()
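The sample ceilometer file above is from a different date than our case. To line the observations up with the LASSO run, the case dates would be requested instead; a sketch requiring valid ARM credentials:
# Request the ceilometer for the April 4, 2019 case (sketch; needs credentials)
# results = act.discovery.download_arm_data(username, token, 'sgpceilC1.b1', '2019-04-04', '2019-04-05')
# ceil_ds = act.io.arm.read_arm_netcdf(results)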
Plot Desired Variables¶
# Set your username and token here!
username = '***'
token = '***'
# Set the datastream and start/enddates
datastream = 'sgpaosmetE13.a1'
startdate = '2019-04-04'
enddate = '2019-04-05'
# Use ACT to easily download the data. Watch for the data citation! Show some support
# for ARM's instrument experts and cite their data if you use it in a publication
result = act.discovery.download_arm_data(username, token, datastream, startdate, enddate)
# Let's read in the data using ACT and check out the data
ds_mpl = act.io.read_arm_netcdf(result)
ds_mpl
Change variable (as desired), Apply QC, and Plot Again¶
# Let's take a look at the quality control information associated with a variable in this datastream
variable = 'temperature_ambient'
# First, for many of the ACT QC features, we need to get the dataset closer to the CF standard, which
# involves cleaning up some of the attributes and the ways that ARM has historically handled QC
ds_mpl.clean.cleanup()
# Next, let's visualize the variable itself
# Create a plotting display object with a single panel
display = act.plotting.TimeSeriesDisplay(ds_mpl, figsize=(10, 5), subplot_shape=(1,))
# Plot up the variable in the first panel
display.plot(variable, subplot_index=(0,))
plt.show()
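To see the embedded quality-control flags alongside the data, the display can be given a second panel; a sketch, assuming the datastream carries the corresponding qc_ variable:
# Plot the variable and its QC flags in two stacked panels (sketch)
display = act.plotting.TimeSeriesDisplay(ds_mpl, figsize=(10, 8), subplot_shape=(2,))
display.plot(variable, subplot_index=(0,))
display.qc_flag_block_plot(variable, subplot_index=(1,))
plt.show()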