
DYAMOND LLC2160 Ocean Dataset Notebook


The DYnamics of the Atmospheric general circulation Modeled On Non-hydrostatic Domains (DYAMOND) initiative provides high-resolution ocean circulation simulations, offering unprecedented detail. This dataset comprises a C1440 configuration of the Goddard Earth Observing System (GEOS) atmospheric model, with 7-km horizontal grid spacing and 72 vertical layers, coupled to an LLC2160 configuration of the Massachusetts Institute of Technology general circulation model (MITgcm) with 2-4-km grid spacing and 90 vertical levels. The C1440-LLC2160 simulation was integrated for 14 months, starting from prescribed initial conditions on January 20, 2020.


Overview

This notebook demonstrates how to access and visualize high-resolution ocean data from the LLC2160 dataset using OpenVisus. The data is hosted in OSDF and served using Pelican Platform and OpenVisus. You’ll learn how to read metadata from the cloud, interactively select variables, and explore regional and depth-based slices of the data.

  1. Read the metadata file from cloud
  2. Data Subset
  3. Visualize the data
  4. Explore multi-resolution data for a specific region and depth

By the end of this notebook, you will understand how to:

  • Stream and query oceanographic data using PelicanFS
  • Use metadata to inform data exploration
  • Visualize regional and depth-specific ocean data using Panel and Bokeh

Prerequisites

This section was inspired by a template from the wonderful The Turing Way Jupyter Book.

Concepts | Importance | Notes
OpenVisus | Helpful | Required for multiresolution data access and streaming
Oceanographic data formats and interpretation | Helpful | Understanding of gridded climate/ocean data such as LLC2160
PelicanFS | Helpful | Used for high-performance data access from cloud storage
  • Time to learn: 30 minutes
  • System requirements:
    • Python packages: panel, bokeh, xmltodict, colorcet, boto3, basemap, pelicanfs, OpenVisus, openvisuspy
    • Recommended: Python ≥ 3.8, internet access for cloud-hosted data
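Before running the notebook, you can check that the required packages are importable. This is a minimal sketch; the list below mirrors a subset of the requirements above, and any entries reported missing can be installed with pip.

```python
import importlib.util

# A subset of the packages used in this notebook (see the prerequisites above)
required = ["numpy", "matplotlib", "openvisuspy", "pelicanfs"]

missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```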

Step 1: Importing the libraries

import numpy as np
import openvisuspy as ovp
import matplotlib.pyplot as plt

The section below shows the different LLC2160 fields available in the cloud. Each field is over 200 TB.

variable = 'salt'  # options are: u, v, w, salt, theta
base_url = "pelican://osg-htc.org/nasa/nsdf/climate3/dyamond/"
if variable == "theta" or variable == "w":
    base_dir = f"mit_output/llc2160_{variable}/llc2160_{variable}.idx"
elif variable == "u":
    base_dir = "mit_output/llc2160_arco/visus.idx"
else:
    base_dir = f"mit_output/llc2160_{variable}/{variable}_llc2160_x_y_depth.idx"
var_url = base_url + base_dir
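The branching above can be wrapped in a small helper, so the URL for any variable is built in one call. This is a hypothetical convenience function for illustration, not part of the dataset API; it reproduces exactly the string logic of the cell above.

```python
BASE_URL = "pelican://osg-htc.org/nasa/nsdf/climate3/dyamond/"

def llc2160_url(variable):
    """Build the OpenVisus .idx URL for an LLC2160 field (u, v, w, salt, theta)."""
    if variable in ("theta", "w"):
        base_dir = f"mit_output/llc2160_{variable}/llc2160_{variable}.idx"
    elif variable == "u":
        base_dir = "mit_output/llc2160_arco/visus.idx"
    else:  # v, salt
        base_dir = f"mit_output/llc2160_{variable}/{variable}_llc2160_x_y_depth.idx"
    return BASE_URL + base_dir

print(llc2160_url("salt"))
# pelican://osg-htc.org/nasa/nsdf/climate3/dyamond/mit_output/llc2160_salt/salt_llc2160_x_y_depth.idx
```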

Step 2: Reading the metadata file from cloud

In this section, you can select any of the variables declared in the cells above and pass its URL to LoadDataset. We are only reading the dataset's metadata here.

db=ovp.LoadDataset(var_url)
print(f'Dimensions: {db.getLogicBox()[1][0]}*{db.getLogicBox()[1][1]}*{db.getLogicBox()[1][2]}')
print(f'Total Timesteps: {len(db.getTimesteps())}')
print(f'Field: {db.getField().name}')
print('Data Type: float32')
Dimensions: 8640*6480*90
Total Timesteps: 10366
Field: salt
Data Type: float32

Step 3: Data Selection

This section shows you how to load the data you want. You can select any timestep and any region (x, y, z), and set the quality (resolution) of the data. Higher quality means finer (more) data. Omitting time returns the first available timestep, and omitting quality returns the full-resolution data, which takes a while to load because of the large file size. Since each timestep is >30 GB, we select only 1 of the 90 vertical levels here.

data = db.db.read(time=0, z=[0,1], quality=-4)  # read only 1 of the 90 vertical levels
data.shape
(1, 3240, 4320)
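The shape above is smaller than the full 8640 x 6480 grid because of the quality argument. A plausible rule of thumb, inferred only from the shapes observed in this notebook and not an official OpenVisus formula: each step of -1 halves one axis in turn, so over a 3D dataset every 3 quality steps halve all axes once. A small sketch of that heuristic:

```python
def approx_read_shape(nx, ny, quality, ndim=3):
    """Approximate the (y, x) shape returned for a horizontal slice.

    Assumption: each -1 of quality halves one axis in turn, so every
    `ndim` steps halve all axes once. This heuristic is inferred from
    the shapes observed in this notebook, not an official formula.
    """
    factor = 2 ** (abs(quality) // ndim)
    return (ny // factor, nx // factor)

print(approx_read_shape(8640, 6480, quality=-4))  # (3240, 4320), matching the shape above
```

The same rule predicts the (625, 500) shape seen later for the 2000 x 2500 regional read at quality=-6.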

Step 4: Visualize the data

We are using simple matplotlib plots here, but since the data is a NumPy array, it can be loaded by any Python module that supports NumPy. Feel free to set vmin and vmax appropriately.

fig,axes=plt.subplots(1,1,figsize=(12,8))
im= axes.imshow(data[0,:,:], aspect='auto',origin='lower',cmap='turbo')
cbar = plt.colorbar(im, ax=axes)
cbar.set_label('Salinity (PSU)')  # the 'salt' field is salinity, not temperature
plt.show()
<Figure size 1200x800 with 2 Axes>

But what if you want to see the full data for a certain region at a certain depth?

Just set the appropriate x, y, z when reading the data: x and y define the bounding box, and z selects the depth level.

data1=db.db.read(time=1,z=[0,1],quality=-6,x=[500,2500],y=[2500,5000])
plt.imshow(data1[0,:,:], origin='lower',cmap='turbo')
plt.colorbar()
<Figure size 640x480 with 2 Axes>
data1.shape
(1, 625, 500)

Step 5: Save the data for the region locally

You can save the data locally as you want. For example, here we are only saving the region shown above as a numpy array.

np.save('test_region2.npy', data1)
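Beyond a bare .npy file, you may want to keep the selection parameters alongside the array so the region can be reproduced later. A minimal sketch using np.savez; the file name and keys are arbitrary choices, and the zero-filled array stands in for the data1 region read above.

```python
import numpy as np

# Stand-in array with the same shape as the region read above;
# in the notebook you would pass data1 directly.
region = np.zeros((1, 625, 500), dtype=np.float32)

# Store the array together with the query parameters used to read it
np.savez("test_region2_meta.npz", data=region,
         x=[500, 2500], y=[2500, 5000], time=1, quality=-6)

with np.load("test_region2_meta.npz") as f:
    print(f["data"].shape, f["x"].tolist())  # (1, 625, 500) [500, 2500]
```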

Step 6: Load the locally saved region and visualize using matplotlib

local_data=np.load('test_region2.npy')
plt.imshow(local_data[0,:,:], origin='lower',cmap='turbo')
plt.colorbar()
<Figure size 640x480 with 2 Axes>

Step 7: Vertical Slicing

Fixing a single y index and omitting z reads all 90 depth levels, producing a vertical (depth vs. x) cross-section of the field.

data1=db.db.read(time=1,x=[500,2500],y=[5100,5101])
data1.shape
(90, 1, 2000)
plt.figure(figsize=(14,8))
plt.imshow(data1[:,0,:],cmap='turbo')
# plt.colorbar()
<Figure size 1400x800 with 1 Axes>

Reference

This data is a product of the DYAMOND initiative. Please see the project's website for more information.

Please reach out to Aashish Panta, Giorgio Scorzelli, or Valerio Pascucci with any concerns about the notebook. Thank you!