Sea Surface Altimetry Data Analysis

For this example we will use gridded sea-surface altimetry data from The Copernicus Marine Environment. This is a widely used dataset in physical oceanography and climate.

The dataset has been extracted from Copernicus and stored in google cloud storage in xarray-zarr format. It is catalogues in the Pangeo Cloud Catalog at https://catalog.pangeo.io/browse/master/ocean/sea_surface_height/

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import hvplot.xarray
plt.rcParams['figure.figsize'] = (15,10)
%matplotlib inline

Initialize Dataset

Here we load the dataset from the zarr store. Note that this very large dataset initializes nearly instantly, and we can see the full list of variables and coordinates, including metadata for each variable.

from intake import open_catalog
cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean.yaml")
ds  = cat["sea_surface_height"].to_dask()
ds

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 3
      1 from intake import open_catalog
      2 cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean.yaml")
----> 3 ds  = cat["sea_surface_height"].to_dask()
      4 ds

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/base.py:472, in Catalog.__getitem__(self, key)
    463 """Return a catalog entry by name.
    464 
    465 Can also use attribute syntax, like ``cat.entry_name``, or
   (...)
    468 cat['name1', 'name2']
    469 """
    470 if not isinstance(key, list) and key in self:
    471     # triggers reload_on_change
--> 472     s = self._get_entry(key)
    473     if s.container == "catalog":
    474         s.name = key

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/utils.py:43, in reload_on_change.<locals>.wrapper(self, *args, **kwargs)
     40 @functools.wraps(f)
     41 def wrapper(self, *args, **kwargs):
     42     self.reload()
---> 43     return f(self, *args, **kwargs)

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/base.py:355, in Catalog._get_entry(self, name)
    353 ups = [up for name, up in self.user_parameters.items() if name not in up_names]
    354 entry._user_parameters = ups + (entry._user_parameters or [])
--> 355 return entry()

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/entry.py:60, in CatalogEntry.__call__(self, persist, **kwargs)
     58 def __call__(self, persist=None, **kwargs):
     59     """Instantiate DataSource with given user arguments"""
---> 60     s = self.get(**kwargs)
     61     s._entry = self
     62     s._passed_kwargs = list(kwargs)

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/intake/catalog/local.py:313, in LocalCatalogEntry.get(self, **user_parameters)
    310     return self._default_source
    312 plugin, open_args = self._create_open_args(user_parameters)
--> 313 data_source = plugin(**open_args)
    314 data_source.catalog_object = self._catalog
    315 data_source.name = self.name

TypeError: ZarrArraySource.__init__() got an unexpected keyword argument 'consolidated'

Visually Examine Some of the Data

Let’s do a sanity check that the data looks reasonable. Here we use the hvplot interactive plotting library.

ds.sla.hvplot.image('longitude', 'latitude',
                    rasterize=True, dynamic=True, width=800, height=450, 
                    widget_type='scrubber', widget_location='bottom', cmap='RdBu_r')

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 ds.sla.hvplot.image('longitude', 'latitude',
      2                     rasterize=True, dynamic=True, width=800, height=450, 
      3                     widget_type='scrubber', widget_location='bottom', cmap='RdBu_r')

NameError: name 'ds' is not defined

Create and Connect to Dask Distributed Cluster

from dask_gateway import Gateway
from dask.distributed import Client

gateway = Gateway()
cluster = gateway.new_cluster()
cluster.adapt(minimum=1, maximum=20)
cluster

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 4
      1 from dask_gateway import Gateway
      2 from dask.distributed import Client
----> 4 gateway = Gateway()
      5 cluster = gateway.new_cluster()
      6 cluster.adapt(minimum=1, maximum=20)

File ~/miniconda3/envs/po-cookbook-dev/lib/python3.10/site-packages/dask_gateway/client.py:282, in Gateway.__init__(self, address, proxy_address, public_address, auth, asynchronous, loop)
    280     address = format_template(dask.config.get("gateway.address"))
    281 if address is None:
--> 282     raise ValueError(
    283         "No dask-gateway address provided or found in configuration"
    284     )
    285 address = address.rstrip("/")
    287 if public_address is None:

ValueError: No dask-gateway address provided or found in configuration

** ☝️ Don’t forget to click the link above to view the scheduler dashboard! **

client = Client(cluster)
client

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 client = Client(cluster)
      2 client

NameError: name 'cluster' is not defined

Timeseries of Global Mean Sea Level

Here we make a simple yet fundamental calculation: the rate of increase of global mean sea level over the observational period.

# the number of GB involved in the reduction
ds.sla.nbytes/1e9

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[6], line 2
      1 # the number of GB involved in the reduction
----> 2 ds.sla.nbytes/1e9

NameError: name 'ds' is not defined

# the computationally intensive step
sla_timeseries = ds.sla.mean(dim=('latitude', 'longitude')).load()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[7], line 2
      1 # the computationally intensive step
----> 2 sla_timeseries = ds.sla.mean(dim=('latitude', 'longitude')).load()

NameError: name 'ds' is not defined

sla_timeseries.plot(label='full data')
sla_timeseries.rolling(time=365, center=True).mean().plot(label='rolling annual mean')
plt.ylabel('Sea Level Anomaly [m]')
plt.title('Global Mean Sea Level')
plt.legend()
plt.grid()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 sla_timeseries.plot(label='full data')
      2 sla_timeseries.rolling(time=365, center=True).mean().plot(label='rolling annual mean')
      3 plt.ylabel('Sea Level Anomaly [m]')

NameError: name 'sla_timeseries' is not defined

In order to understand how the sea level rise is distributed in latitude, we can make a sort of Hovmöller diagram.

sla_hov = ds.sla.mean(dim='longitude').load()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 sla_hov = ds.sla.mean(dim='longitude').load()

NameError: name 'ds' is not defined

fig, ax = plt.subplots(figsize=(12, 4))
sla_hov.name = 'Sea Level Anomaly [m]'
sla_hov.transpose().plot(vmax=0.2, ax=ax)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[10], line 2
      1 fig, ax = plt.subplots(figsize=(12, 4))
----> 2 sla_hov.name = 'Sea Level Anomaly [m]'
      3 sla_hov.transpose().plot(vmax=0.2, ax=ax)

NameError: name 'sla_hov' is not defined

../_images/e97d0647e265dae80e7e338d2511e96780a4a9b4a7222392c38cbd19d42f566e.png

We can see that most sea level rise is actually in the Southern Hemisphere.

Sea Level Variability

We can examine the natural variability in sea level by looking at its standard deviation in time.

sla_std = ds.sla.std(dim='time').load()
sla_std.name = 'Sea Level Variability [m]'

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 sla_std = ds.sla.std(dim='time').load()
      2 sla_std.name = 'Sea Level Variability [m]'

NameError: name 'ds' is not defined

ax = sla_std.plot()
_ = plt.title('Sea Level Variability')

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 ax = sla_std.plot()
      2 _ = plt.title('Sea Level Variability')

NameError: name 'sla_std' is not defined