Use xrefcoord to Generate Coordinates

When Kerchunk generates reference datasets for GeoTIFFs, only the dimensions are preserved; the coordinate values are lost. xrefcoord is a small utility that generates coordinates for these reference datasets from the geospatial metadata. Like other accessor add-on libraries for Xarray, such as rioxarray and xwrf, xrefcoord provides an accessor for Xarray datasets: importing xrefcoord makes the .xref accessor available, which exposes additional methods.
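GeoTIFFs typically store their georeferencing as an affine mapping from pixel (column, row) positions to map (x, y) positions, so 1-D coordinate arrays can be rebuilt from an origin and a pixel size. The sketch below shows one common way to do this for a north-up raster with no rotation, placing coordinates at pixel centers. It is not xrefcoord's implementation; the `GeoTransform` class, the `pixel_center_coords` function, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransform:
    """North-up affine georeferencing (no rotation terms), GDAL-style.

    Assumption: (x_origin, y_origin) is the outer top-left corner of the
    top-left pixel, as for "PixelIsArea" rasters.
    """

    x_origin: float
    pixel_width: float
    y_origin: float
    pixel_height: float  # negative for north-up rasters (rows run southward)


def pixel_center_coords(gt: GeoTransform, width: int, height: int) -> tuple[list[float], list[float]]:
    """Return 1-D x and y coordinate values at the center of each pixel."""
    xs = [gt.x_origin + (col + 0.5) * gt.pixel_width for col in range(width)]
    ys = [gt.y_origin + (row + 0.5) * gt.pixel_height for row in range(height)]
    return xs, ys


# Hypothetical 4 x 3 raster with 1000 m pixels (not taken from RADAR.json).
gt = GeoTransform(x_origin=0.0, pixel_width=1000.0, y_origin=3000.0, pixel_height=-1000.0)
x, y = pixel_center_coords(gt, width=4, height=3)
print(x)  # [500.0, 1500.0, 2500.0, 3500.0]
print(y)  # [2500.0, 1500.0, 500.0]
```

Using pixel centers rather than corners matches how gridded data are usually labeled, but whether a given tool uses centers or corners is worth checking before comparing coordinates.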

In this tutorial we will use the generate_coords method to build coordinates for the Xarray dataset. xrefcoord is very experimental and makes assumptions about the underlying data, for example that every variable shares the same dimensions. Use with caution!
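Because generate_coords assumes that every variable shares the same dimensions, it may be worth checking that assumption before calling it. A minimal sketch follows; `shared_dims` is a hypothetical helper (not part of xrefcoord) that works on a plain mapping of variable name to dimension names, which you could build from an opened dataset with `{name: da.dims for name, da in ds.data_vars.items()}`.

```python
from collections.abc import Mapping


def shared_dims(var_dims: Mapping[str, tuple[str, ...]]) -> tuple[str, ...]:
    """Return the dimensions shared by every variable, or raise ValueError.

    `var_dims` maps each variable name to its tuple of dimension names.
    """
    if not var_dims:
        raise ValueError("no data variables to check")
    unique = set(var_dims.values())
    if len(unique) != 1:
        raise ValueError(f"variables do not share the same dimensions: {dict(var_dims)}")
    return unique.pop()


# Hypothetical dimension layouts, for illustration only.
print(shared_dims({"0": ("time", "Y", "X")}))  # ('time', 'Y', 'X')
try:
    shared_dims({"a": ("time", "Y", "X"), "b": ("Y", "X")})
except ValueError as err:
    print("assumption violated:", err)
```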

Overview

Within this notebook, we will cover:

  1. How to load a Kerchunk reference dataset created from a collection of GeoTIFFs
  2. How to use xrefcoord to generate coordinates from a GeoTIFF reference dataset

Prerequisites

Concepts | Importance | Notes
--- | --- | ---
Kerchunk Basics | Required | Core
Xarray Tutorial | Required | Core
  • Time to learn: 45 minutes

import xarray as xr

# Imported for its side effect: makes the .xref accessor available on datasets
import xrefcoord  # noqa

# Options passed to fsspec. The Kerchunk xarray backend forwards these directly
# to refs_as_store(), which rejects "skip_instance_cache" with a TypeError, so
# that option is omitted here.
storage_options = {
    "remote_protocol": "s3",
    "remote_options": {"anon": True},
}
open_dataset_options = {"chunks": {}}  # options passed to xarray

ds = xr.open_dataset(
    "references/RADAR.json",
    engine="kerchunk",
    storage_options=storage_options,
    open_dataset_options=open_dataset_options,
)
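To see what "only the dimensions are preserved" means in practice, it can help to look inside a reference set. Kerchunk's version-1 JSON layout maps Zarr keys either to inline metadata strings or to `[url, offset, length]` byte ranges in the original files. The sketch below summarizes a hypothetical, heavily trimmed reference set (the bucket, file names, and byte ranges are invented, and this is not the content of `references/RADAR.json`). The variable carries dimension names in `_ARRAY_DIMENSIONS`, but there are no coordinate variables for those dimensions. The `summarize` helper is likewise hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical, heavily trimmed reference set in Kerchunk's version-1 layout.
# Bucket, file names, offsets, and lengths are invented for illustration.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "0/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["time", "Y", "X"]}),
        "0/0.0.0": ["s3://example-bucket/day1.tif", 8, 40000],
        "0/1.0.0": ["s3://example-bucket/day2.tif", 8, 40000],
    },
}


def summarize(refs: dict) -> dict[str, dict]:
    """Group a reference set by variable: dimension names and chunk references."""
    summary: dict[str, dict] = defaultdict(lambda: {"dims": None, "chunks": []})
    for key, value in refs["refs"].items():
        if "/" not in key:
            continue  # group-level metadata such as .zgroup
        var, leaf = key.split("/", 1)
        if leaf == ".zattrs":
            summary[var]["dims"] = json.loads(value).get("_ARRAY_DIMENSIONS")
        elif not leaf.startswith("."):
            # Chunk key such as "0.0.0"; value is inline data or [url, offset, length]
            summary[var]["chunks"].append((leaf, value))
    return dict(summary)


info = summarize(refs)
print(info["0"]["dims"])         # ['time', 'Y', 'X']
print(len(info["0"]["chunks"]))  # 2
print(list(info))                # ['0']  (no X, Y, or time coordinate variables)
```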
# Generate coordinates from the reference dataset
ref_ds = ds.xref.generate_coords(time_dim_name="time", x_dim_name="X", y_dim_name="Y")
# Rename variable "0" to rr24h (rainfall accumulated over 24 hours)
ref_ds = ref_ds.rename({"0": "rr24h"})

Create a Map

Here we use Xarray to mask out values of 60000 and above, select a single time slice, and plot a map of 24-hour accumulated rainfall.

ref_ds["rr24h"].where(ref_ds.rr24h < 60000).isel(time=0).plot(robust=True)
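The map cell combines two steps: `.where(ref_ds.rr24h < 60000)` keeps values below 60000 and replaces everything else with NaN, and `robust=True` asks xarray to set the color limits from the 2nd and 98th percentiles rather than the minimum and maximum. Both can be sketched in plain Python on a flat list of values. The helper names are hypothetical, the sample values are invented, and reading values at or above 60000 as fill or no-data values is an assumption; this is not xarray's implementation.

```python
import math


def mask_at_or_above(values: list[float], threshold: float) -> list[float]:
    """Mimic da.where(da < threshold): keep values below threshold, NaN elsewhere."""
    return [v if v < threshold else math.nan for v in values]


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile that skips NaN (like numpy.nanpercentile's default)."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return math.nan
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def robust_limits(values: list[float]) -> tuple[float, float]:
    """Color limits from the 2nd and 98th percentiles, as robust=True describes."""
    return percentile(values, 2), percentile(values, 98)


# Invented sample: rainfall values 0..50 plus one assumed fill value of 65535.
sample = [float(v) for v in range(51)] + [65535.0]
masked = mask_at_or_above(sample, 60000)
print(robust_limits(masked))  # (1.0, 49.0)
```

Masking first matters because a large fill value would otherwise be counted as data when the percentiles are computed.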

Create a Time-Series

Next, we plot accumulated rainfall as a function of time at a single grid point (positional index 700 along each spatial dimension).

ref_ds["rr24h"][:, 700, 700].plot()
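The time series above picks its grid cell by position. Once coordinates exist, you may prefer to choose a point by coordinate value instead, which xarray supports with `.sel(..., method="nearest")`. The lookup behind that idea can be sketched in plain Python for a 1-D monotonic coordinate. The `nearest_index` helper and the coordinate values are hypothetical, not the coordinates of this dataset.

```python
from bisect import bisect_left


def nearest_index(coords: list[float], target: float) -> int:
    """Index of the coordinate value closest to `target`.

    Assumes strictly monotonic coordinates, ascending or descending
    (raster Y coordinates usually descend from north to south).
    """
    ascending = coords[0] <= coords[-1]
    # Negating a descending sequence makes it ascending without changing indices.
    keys = coords if ascending else [-c for c in coords]
    t = target if ascending else -target
    i = bisect_left(keys, t)
    if i == 0:
        return 0
    if i == len(keys):
        return len(keys) - 1
    # Pick the closer of the two neighbors around the insertion point.
    return i if keys[i] - t < t - keys[i - 1] else i - 1


# Hypothetical pixel-center coordinates in meters.
x_coords = [500.0, 1500.0, 2500.0, 3500.0]
y_coords = [2500.0, 1500.0, 500.0]
print(nearest_index(x_coords, 1600.0))  # 1
print(nearest_index(y_coords, 600.0))   # 2
```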