
Along Track Altimetry Analysis


Overview

  1. Using CNES altimetry data

  2. Visualizing data using hvplot

  3. Using xhistogram to plot multidimensional data

Prerequisites

Concepts           Importance   Notes
Intro to Pandas    Helpful
Using hvplot       Helpful      Matplotlib knowledge also helpful
Dask               Helpful
xhistogram         Helpful

  • Time to learn: 15 minutes

Imports


import fsspec
import xarray as xr
import numpy as np
import hvplot
import hvplot.dask
import hvplot.pandas
import hvplot.xarray
from xhistogram.xarray import histogram
from intake import open_catalog

Load Data

The analysis-ready along-track altimetry data were prepared by CNES. They are cataloged in the Pangeo Cloud Data Catalog: https://catalog.pangeo.io/browse/master/ocean/altimetry/

We will work with data from Jason-3 (catalog entry 'j3').

cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/altimetry.yaml")
print(list(cat))
ds = cat['j3'].to_dask()
ds
['al', 'alg', 'c2', 'e1', 'e1g', 'e2', 'en', 'enn', 'g2', 'h2', 'j1', 'j1g', 'j1n', 'j2', 'j2g', 'j2n', 'j3', 's3a', 's3b', 'tp', 'tpn']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 3
      1 cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/altimetry.yaml")
      2 print(list(cat))
----> 3 ds = cat['j3'].to_dask()
      4 ds

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/base.py:472, in Catalog.__getitem__(self, key)
    463 """Return a catalog entry by name.
    464 
    465 Can also use attribute syntax, like ``cat.entry_name``, or
   (...)
    468 cat['name1', 'name2']
    469 """
    470 if not isinstance(key, list) and key in self:
    471     # triggers reload_on_change
--> 472     s = self._get_entry(key)
    473     if s.container == "catalog":
    474         s.name = key

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/utils.py:43, in reload_on_change.<locals>.wrapper(self, *args, **kwargs)
     40 @functools.wraps(f)
     41 def wrapper(self, *args, **kwargs):
     42     self.reload()
---> 43     return f(self, *args, **kwargs)

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/base.py:355, in Catalog._get_entry(self, name)
    353 ups = [up for name, up in self.user_parameters.items() if name not in up_names]
    354 entry._user_parameters = ups + (entry._user_parameters or [])
--> 355 return entry()

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/entry.py:60, in CatalogEntry.__call__(self, persist, **kwargs)
     58 def __call__(self, persist=None, **kwargs):
     59     """Instantiate DataSource with given user arguments"""
---> 60     s = self.get(**kwargs)
     61     s._entry = self
     62     s._passed_kwargs = list(kwargs)

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/local.py:313, in LocalCatalogEntry.get(self, **user_parameters)
    310     return self._default_source
    312 plugin, open_args = self._create_open_args(user_parameters)
--> 313 data_source = plugin(**open_args)
    314 data_source.catalog_object = self._catalog
    315 data_source.name = self.name

TypeError: ZarrArraySource.__init__() got an unexpected keyword argument 'consolidated'
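The traceback above shows that the 'j3' entry was handed to intake's ZarrArraySource, which does not accept the 'consolidated' argument supplied by the catalog. One possible workaround, sketched here under stated assumptions and not taken from the original notebook, is to bypass the catalog and open the Zarr store directly with xarray. The helper name open_alongtrack_store is hypothetical, and urlpath is a placeholder to be copied from the entry's 'urlpath' argument in the catalog YAML file. Defaulting consolidated to True is also an assumption.

```python
def open_alongtrack_store(urlpath, consolidated=True, **storage_options):
    """Open a Zarr store directly as an xarray Dataset (illustrative sketch).

    urlpath: location of the store; an assumption here, to be copied from
        the 'urlpath' argument of the chosen entry in the catalog YAML.
    consolidated: whether to read consolidated Zarr metadata (assumed True).
    storage_options: extra keyword arguments forwarded to the fsspec
        filesystem (for example, credentials settings).
    """
    # Imported inside the function so the helper can be defined even
    # where these packages are not installed.
    import fsspec
    import xarray as xr

    # Map the remote location to a key-value store that Zarr can read.
    mapper = fsspec.get_mapper(urlpath, **storage_options)
    return xr.open_zarr(mapper, consolidated=consolidated)


# Usage (requires network access and a real store location):
# ds = open_alongtrack_store(urlpath)
```

If this works in your environment, the resulting ds can be used in the cells that follow in place of the catalog call.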

Load some data into memory:

# Select latitude, longitude, and sea level anomaly
ds_ll = ds[['latitude', 'longitude', 'sla_filtered']].reset_coords().astype('f4').load()
ds_ll
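The cell above casts the selected variables to 'f4' (32-bit floats) before loading them. A likely reason, stated here as an assumption, is to halve the memory footprint compared with 64-bit floats, at the cost of keeping only about seven significant decimal digits. A small NumPy sketch on a synthetic array (not the altimetry data) illustrates the size difference:

```python
import numpy as np

# Synthetic stand-in for one variable of the along-track record.
n = 1_000_000
as_f8 = np.zeros(n, dtype="f8")   # 64-bit floats, 8 bytes per value
as_f4 = as_f8.astype("f4")        # 32-bit floats, 4 bytes per value

print(as_f8.nbytes / 1e6, "MB as float64")
print(as_f4.nbytes / 1e6, "MB as float32")
```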

Convert to pandas dataframe:

df = ds_ll.to_dataframe()
df

Visualize with hvplot

df.hvplot.scatter(x='longitude', y='latitude', datashade=True)

Bin using xhistogram

lon_bins = np.arange(0, 361, 2)
lat_bins = np.arange(-70, 71, 2)

# chunk along time (about 5 MB per chunk) to help with memory management
ds_ll_chunked = ds_ll.chunk({'time': '5MB'})

# sum of squared anomalies in each lon/lat box; NaNs are filled with 0
# so that missing values add nothing to the sum
sla_variance = histogram(ds_ll_chunked.longitude, ds_ll_chunked.latitude,
                         bins=[lon_bins, lat_bins],
                         weights=ds_ll_chunked.sla_filtered.fillna(0.)**2)

# number of points in each lon/lat box
norm = histogram(ds_ll_chunked.longitude, ds_ll_chunked.latitude,
                 bins=[lon_bins, lat_bins])

# a box needs more than 200 points to be unmasked
thresh = 200
sla_variance = sla_variance / norm.where(norm > thresh)
sla_variance
sla_variance.load()
# plot the sea level anomaly variance
sla_variance.plot(x='longitude_bin', figsize=(12, 6), vmax=0.2)
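The binning step can be mimicked with plain NumPy on synthetic data. The following is a minimal sketch, not the notebook's method: it swaps xhistogram for numpy.histogram2d, uses random points in place of the Jason-3 record, and uses coarser 20-degree boxes (all assumptions made for illustration). It also makes explicit that the quotient is the mean of the squared anomalies, which equals the variance only where the mean anomaly in a box is close to zero.

```python
import numpy as np

# Synthetic along-track samples (assumption: random positions, and
# zero-mean anomalies with a standard deviation of 0.1).
rng = np.random.default_rng(0)
n = 50_000
lon = rng.uniform(0, 360, n)
lat = rng.uniform(-70, 50, n)     # no samples north of 50 degrees
sla = rng.normal(0.0, 0.1, n)

# Coarse 20-degree boxes so that each box holds enough points.
lon_bins = np.arange(0, 361, 20)
lat_bins = np.arange(-70, 71, 20)

# Sum of squared anomalies per box, and number of points per box.
sum_sq, _, _ = np.histogram2d(lon, lat, bins=[lon_bins, lat_bins],
                              weights=sla**2)
count, _, _ = np.histogram2d(lon, lat, bins=[lon_bins, lat_bins])

# Mask boxes with 200 points or fewer, then divide to get the mean square.
thresh = 200
masked_count = np.where(count > thresh, count, np.nan)
mean_sq = sum_sq / masked_count

print(mean_sq.shape)              # (18, 7): longitude boxes x latitude boxes
print(np.nanmean(mean_sq))        # close to 0.1**2 for this synthetic input
```

The northernmost band of boxes contains no samples, so it is masked (NaN) in the result, just as sparsely sampled boxes are masked by the threshold in the xhistogram version.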

Summary


In this example we used hvplot to visualize along-track altimetry data. Then we used xhistogram to bin the data and to calculate and plot the variance of the sea level anomaly.

What’s next?

Other examples will look at other datasets to visualize sea surface temperatures, ocean depth, and currents.

Resources and references

  • This notebook is based on the Pangeo physical oceanography gallery example: https://gallery.pangeo.io/repos/pangeo-gallery/physical-oceanography/02_along_track.html