Overview
This notebook demonstrates how to use Kerchunk with hvPlot and Datashader to lazily visualize a reference dataset in a streaming fashion.
We will build on the references generated in the Pangeo_Forge notebook, so we encourage you to go through that notebook first.
Prerequisites
| Concepts | Importance | Notes | 
|---|---|---|
| Kerchunk Basics | Required | Core | 
| Introduction to Xarray | Required | IO | 
| Introduction to hvPlot | Required | Data Visualization | 
| Introduction to Datashader | Required | Big Data Visualization | 
Time to learn: 10 minutes
Motivation
Using Kerchunk, we don’t have to create a copy of the data. Instead, we create a collection of reference files, so that the original data files can be read as if they were Zarr.
This enables visualization on the fly: simply pass in the URL to the dataset and use hvPlot.
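To make the idea of a reference file concrete, here is a minimal sketch of how a Kerchunk-style reference maps a Zarr chunk key to a byte range inside an original file. It uses only the standard library and is an illustration of the concept, not Kerchunk's or fsspec's implementation. The helper `read_reference`, the key `burn_index/0.0.0`, and the temporary file are all hypothetical, and the sketch ignores details such as base64-encoded inline entries and remote URLs.

```python
import json
import os
import tempfile


def read_reference(refs, key):
    """Resolve one key of a Kerchunk-style reference mapping to raw bytes."""
    entry = refs["refs"][key]
    if isinstance(entry, str):
        # Small pieces (e.g. Zarr metadata) are stored inline as text.
        return entry.encode()
    # Otherwise the entry points into the original file: [path, offset, length].
    path, offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Hypothetical "original data file": 16 bytes standing in for a NetCDF file.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
tmp.write(bytes(range(16)))
tmp.close()

# A tiny reference set: one inline metadata entry and one chunk reference.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "burn_index/0.0.0": [tmp.name, 4, 8],  # hypothetical chunk key
    },
}

chunk = read_reference(refs, "burn_index/0.0.0")
print(len(chunk), "bytes read without copying the original file")

os.remove(tmp.name)
```

Only the small reference mapping is new; the chunk bytes are read directly from the original file at the recorded offset, which is why no copy of the data is needed.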
Getting to Know The Data
gridMET is a high-resolution daily meteorological dataset covering CONUS from 1979-2023. It is produced by the Climatology Lab at UC Merced. In this example, we are going to create a virtual Zarr dataset of a derived variable, Burn Index.
Imports
import hvplot.xarray  # noqa
import xarray as xr
Opening the Kerchunk Dataset
Now, it’s a matter of opening the Kerchunk dataset and calling hvplot with the rasterize=True keyword argument.
If you’re running this notebook locally, try zooming around the map by hovering over the plot and scrolling; it should update fairly quickly. Note, it will not update if you’re viewing this on the docs page online as there is no backend server, but don’t fret because there’s a demo GIF below!
%%timeit -r 1 -n 1
storage_options = {
    "remote_protocol": "http",
    "skip_instance_cache": True,
}  # options passed to fsspec
open_dataset_options = {"chunks": {}, "decode_coords": "all"}  # options passed to xarray
ds_kerchunk = xr.open_dataset(
    "references/Pangeo_Forge/reference.json",
    engine="kerchunk",
    storage_options=storage_options,
    **open_dataset_options,
)
display(ds_kerchunk.hvplot("lon", "lat", rasterize=True))  # noqa
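As a rough intuition for what `rasterize=True` does in the cell above, the sketch below averages a large grid down to a fixed number of output pixels. This is a conceptual, pure-Python illustration under simplifying assumptions, not Datashader's actual implementation; `rasterize_mean` is a hypothetical helper, and it assumes the output size does not exceed the input size.

```python
def rasterize_mean(grid, out_rows, out_cols):
    """Average a 2D list of numbers down to out_rows x out_cols pixels.

    Assumes out_rows <= len(grid) and out_cols <= len(grid[0]),
    so every output pixel covers at least one input cell.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    image = []
    for r in range(out_rows):
        # Input rows covered by this output pixel row.
        r0, r1 = r * n_rows // out_rows, (r + 1) * n_rows // out_rows
        row = []
        for c in range(out_cols):
            # Input columns covered by this output pixel column.
            c0, c1 = c * n_cols // out_cols, (c + 1) * n_cols // out_cols
            cells = [grid[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        image.append(row)
    return image


# A small 4 x 8 grid standing in for a much larger lat/lon field.
grid = [[float(i * 8 + j) for j in range(8)] for i in range(4)]
image = rasterize_mean(grid, 2, 4)
print(image)
```

Because the output size is tied to the plot's pixel dimensions rather than to the dataset size, only an image-sized array has to reach the browser, and the aggregation can be recomputed for the visible region each time you zoom or pan.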
Comparing Against THREDDS
Now we will repeat the previous cell, but read the data through THREDDS instead.
Note how the initial load is longer.
If you’re running the notebook locally (or watching the demo GIF below), zooming in and out also takes longer to finish buffering.
%%timeit -r 1 -n 1
def url_gen(year):
    return (
        f"http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/bi/bi_{year}.nc"
    )
years = list(range(1979, 1980))
urls_list = [url_gen(year) for year in years]
netcdf_ds = xr.open_mfdataset(urls_list, engine="netcdf4")
display(netcdf_ds.hvplot("lon", "lat", rasterize=True))  # noqa
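The `%%timeit -r 1 -n 1` magic used in the cells above runs a cell once and reports the elapsed time, but it is only available in IPython and Jupyter. If you want to compare the two access paths from a plain Python script, a small timing helper along these lines may be enough. This is a sketch: `timed` and `results` are hypothetical names, and the workload shown is a placeholder for the actual `xr.open_dataset` or `xr.open_mfdataset` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results = {}

# Placeholder workload; swap in the Kerchunk or THREDDS open call here.
with timed("placeholder", results):
    total = sum(range(100_000))

for label, seconds in results.items():
    print(f"{label}: {seconds:.4f} s")
```

A single run like this is noisy, especially over a network, so treat the numbers as a rough comparison rather than a benchmark.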