HEALPix logo

UXarray for Advanced HEALPix Analysis & Visualization

In this section, you’ll learn:

  • Using the uxarray package to perform advanced analysis operators over HEALPix data such as non-conservative zonal means, etc.

Prerequisites

Concepts

Importance

Notes

UXarray

Necessary

HEALPix overview

Necessary

Time to learn: 30 minutes


import cartopy.crs as ccrs
import intake
import uxarray as ux

Open data catalog

Tip

We assume that you have already gone over the previous section, UXarray for Basic HEALPix Statistics & Visualization. If not and if you need to learn about data catalogs in general and the data we will use throughout this notebook, we recommend to check that section first.

Let us open the online catalog from the WCRP’s Digital Earths Global Hackathon 2025 catalog repository using intake and read the output of the ICON run d3hp003, which is stored in the HEALPix format:

cat_url = "https://digital-earths-global-hackathon.github.io/catalog/catalog.yaml"
cat = intake.open_catalog(cat_url)
model_run = cat.online.icon_d3hp003

We can look into the highest possible resolution level allowed in this dataset at zoom level = 9 as Xarray.Dataset:

ds = model_run(zoom=9, time="P1D").to_dask()
/home/runner/miniconda3/envs/healpix-cookbook-dev/lib/python3.10/site-packages/intake_xarray/base.py:21: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  'dims': dict(self._ds.dims),

Create UXarray Datasets from HEALPix

We can use UXarray’s from_healpix API as follows to open a HEALPix grid from xarray.Dataset:

uxds = ux.UxDataset.from_healpix(ds)
uxds
<xarray.UxDataset> Size: 2TB
Dimensions:        (time: 425, n_face: 3145728, crs: 1, pressure: 30,
                    soil_level: 5, pressure_rva: 3)
Coordinates:
  * crs            (crs) float32 4B nan
  * pressure       (pressure) int64 240B 5 10 20 50 ... 92500 95000 97500 100000
  * pressure_rva   (pressure_rva) int64 24B 16 18 23
  * soil_level     (soil_level) int64 40B 0 0 0 2 6
  * time           (time) datetime64[ns] 3kB 2020-01-02 ... 2021-03-01
Dimensions without coordinates: n_face
Data variables: (12/58)
    clivi          (time, n_face) float32 5GB ...
    clt            (time, n_face) float32 5GB ...
    clwvi          (time, n_face) float32 5GB ...
    egpvi          (time, n_face) float32 5GB ...
    einvi          (time, n_face) float32 5GB ...
    ekhvi          (time, n_face) float32 5GB ...
    ...             ...
    ua             (time, pressure, n_face) float32 160GB ...
    uas            (time, n_face) float32 5GB ...
    va             (time, pressure, n_face) float32 160GB ...
    vas            (time, n_face) float32 5GB ...
    wa             (time, pressure, n_face) float32 160GB ...
    zg             (time, pressure, n_face) float32 160GB ...

Data variable of interest

Then let us pick a variable, the surface temperature, from the dataset, which will give us an uxarray.UxDataArray:

uxda = uxds["ts"]
uxda
<xarray.UxDataArray 'ts' (time: 425, n_face: 3145728)> Size: 5GB
[1336934400 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 3kB 2020-01-02 2020-01-03 ... 2021-03-01
Dimensions without coordinates: n_face
Attributes:
    grid_mapping:        crs
    hiopy::enable:       True
    hiopy::nnn:          4
    hiopy::time_method:  mean
    long_name:           surface temperature
    short_name:          
    standard_name:       surface_temperature
    units:               K

Global mean and plot

Computing the global surface temperature mean (at the first timestep) and also having a quick plot of it would be a good idea to have as references to compare the upcoming analyses & visualizations to them:

print(
    "Global surface temperature average on ", uxda.time[0].values, ": ", uxda.isel(time=0).mean().values, " K"
)
Global surface temperature average on  2020-01-02T00:00:00.000000000 :  286.99234  K
%%time

projection = ccrs.Robinson()

uxda.isel(time=0).plot(
    projection=projection,
    cmap="inferno",
    features=["borders", "coastline"],
    title="Global surface temperature (Polygon raster)",
    width=700,
)
CPU times: user 16.9 s, sys: 935 ms, total: 17.8 s
Wall time: 17.1 s
WARNING:param.GeoOverlayPlot00477: Due to internal constraints, when aspect and width/height is set, the bokeh backend uses those values as frame_width/frame_height instead. This ensures the aspect is respected, but means that the plot might be slightly larger than anticipated. Set the frame_width/frame_height explicitly to suppress this warning.

Rasterized point plots

When working with a higher-resolution dataset at a global scale, it’s not always practical to render each cell as a polygon. Instead, we can rasterize the center of each pixel.

%%time
projection = ccrs.Robinson()

# Controls the size of each pixel (smaller value leads to larger pixels)
pixel_ratio = 0.5

uxda.isel(time=0).plot.points(
    projection=projection,
    rasterize=True,
    dynamic=False,
    width=1000,
    height=500,
    pixel_ratio=pixel_ratio,
    cmap="inferno",
    title=f"Global surface temperature (Point raster), pixel_ratio={pixel_ratio}",
)
CPU times: user 910 ms, sys: 230 ms, total: 1.14 s
Wall time: 3.46 s

If we decrease the size of each pixel (by setting the pixel ratio to a higher value), we can start to see missing values, which is due to a lower density of points near the poles, leading to some pixels not containing any of our original points.

Because of this, it’s useful to try a few pixel_ratio values and see which one works best for your given resolution.

projection = ccrs.Robinson()

# Controls the size of each pixel (smaller value leads to larger pixels)
pixel_ratio = 2.0

uxda.isel(time=0).plot.points(
    projection=projection,
    rasterize=True,
    dynamic=False,
    width=1000,
    height=500,
    pixel_ratio=pixel_ratio,
    cmap="inferno",
    title=f"Global surface temperature (Point raster) with a bad pixel size selection, pixel_ratio={pixel_ratio}",
)

Cross-sections

We can look at constant latitude/longitude cross-sections of an uxarray.UxDataArray:

boulder_lat = 40.0190


# With fine resolutions like zoom level of 9, it is visually hard to observe the cross-sections,
# so we will use a zoom level of 4 for a better visualization
uxda_coarse = ux.UxDataset.from_healpix(model_run(zoom=4, time="P1D").to_dask())["ts"]
uxda_coarse.uxgrid.face_node_connectivity

uxda_lat = uxda_coarse.cross_section.constant_latitude(boulder_lat)
uxda_lat
/home/runner/miniconda3/envs/healpix-cookbook-dev/lib/python3.10/site-packages/intake_xarray/base.py:21: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  'dims': dict(self._ds.dims),
<xarray.UxDataArray 'ts' (time: 425, n_face: 128)> Size: 218kB
[54400 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 3kB 2020-01-02 2020-01-03 ... 2021-03-01
Dimensions without coordinates: n_face
Attributes:
    grid_mapping:        crs
    hiopy::enable:       True
    hiopy::nnn:          4
    hiopy::time_method:  mean
    long_name:           surface temperature
    short_name:          
    standard_name:       surface_temperature
    units:               K
import geoviews.feature as gf

uxda_lat.isel(time=0).plot(
    rasterize=False,
    projection=projection,
    global_extent=True,
    cmap="inferno",
    clim=(220, 310),
    features=["coastline"],
    title=f"Global surface temperature cross-section at {boulder_lat} degrees latitude",
    width=700,
) * gf.grid(projection=projection)
/home/runner/miniconda3/envs/healpix-cookbook-dev/lib/python3.10/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naturalearth.s3.amazonaws.com/110m_physical/ne_110m_graticules_30.zip
  warnings.warn(f'Downloading: {url}', DownloadWarning)
WARNING:param.GeoOverlayPlot01107: Due to internal constraints, when aspect and width/height is set, the bokeh backend uses those values as frame_width/frame_height instead. This ensures the aspect is respected, but means that the plot might be slightly larger than anticipated. Set the frame_width/frame_height explicitly to suppress this warning.

Let’s also look at the mean of the cross-section:

print(
    f"Mean at {boulder_lat} degrees lat (Boulder, CO, USA): {uxda_lat.mean().values} K"
)
Mean at 40.019 degrees lat (Boulder, CO, USA): 286.452880859375 K

Latitude interval

uxda_lat_interval = uxda_coarse.cross_section.constant_latitude_interval(
    [boulder_lat - 15, boulder_lat + 15]
)
uxda_lat_interval.isel(time=0).plot(
    rasterize=False,
    projection=projection,
    global_extent=True,
    cmap="inferno",
    clim=(220, 310),
    features=["coastline"],
    title=f"Global surface temperature cross-section at the latitude interval [{boulder_lat-5},{boulder_lat+5}] degrees",
    width=700,
) * gf.grid(projection=projection)
WARNING:param.GeoOverlayPlot01331: Due to internal constraints, when aspect and width/height is set, the bokeh backend uses those values as frame_width/frame_height instead. This ensures the aspect is respected, but means that the plot might be slightly larger than anticipated. Set the frame_width/frame_height explicitly to suppress this warning.
print(
    f"Mean at the latitude interval of [{boulder_lat-5},{boulder_lat+5}] degrees (-/+15 degrees Boulder, CO, USA): {uxda_lat_interval.mean().values} K"
)
Mean at the latitude interval of [35.019,45.019] degrees (-/+15 degrees Boulder, CO, USA): 286.20574951171875 K

Non-conservative zonal mean

Calculating the zonal mean is easy by providing the latitude range between -90 and 90 degrees with a step size in degrees:

%%time
zonal_mean_ts = uxda.isel(time=0).zonal_mean(lat=(-90, 90, 5))
CPU times: user 1min 4s, sys: 459 ms, total: 1min 5s
Wall time: 27.2 s
(
    uxda.isel(time=0).plot(
        cmap="inferno",
        # periodic_elements="split",
        height=300,
        width=600,
        colorbar=False,
        ylim=(-90, 90),
    )
    + zonal_mean_ts.plot.line(
        x="ts_zonal_mean",
        y="latitudes",
        height=300,
        width=180,
        ylabel="",
        ylim=(-90, 90),
        xlim=(220, 310),
        # xticks=[220, 250, 280, 310],
        yticks=[-90, -45, 0, 45, 90],
        grid=True,
    )
).opts(title="Temperature and its Zonal means at every 5 degree latitude")

Remapping

Now, we will be looking into one of many possible use cases where remapping would be helpful.

The data set we have been using in this section so far belongs to the newer ICON simulation, icon_d3hp003, while there is an older simulation as well, icon_ngc4008, in the same catalog. They are both stored in the HEALPix format in this case, but for most of the model intercomparison workflows in general, they might not be even so. UXarray would still be helpful to remap of those model outputs to other and then make comparisons since it can support several most commonly used unstructrued grid formats.

In this particular case, we still have some use for UXarray’s remapping such that the newer simulation has the zoom = 9 as the maximum available resolution, while the older one has zoom = 10 available. Unfortunately at zoom = 10 though, there is no actual data simulated for ts, the surface temperature. If there was, we could remap the newer simulation’s output to that one, so we could have both of them at zoom = 10, and then we could look into the difference between them for instance. Let’s pretend the highest zoom-level in the newer data is zoom = 8 then, and remap that one into the older simulation’s grid.

Let’s start with opening the older simulation run first:

model_run_older = cat.online.icon_ngc4008

ds_older = model_run_older(zoom=8, time="P1D").to_dask()

uxds_older = ux.UxDataset.from_healpix(ds_older)
uxds_older
/home/runner/miniconda3/envs/healpix-cookbook-dev/lib/python3.10/site-packages/intake_xarray/base.py:21: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  'dims': dict(self._ds.dims),
<xarray.UxDataset> Size: 58TB
Dimensions:                              (time: 10958, depth_half: 73,
                                          n_face: 786432, level_full: 90,
                                          crs: 1, depth_full: 72,
                                          soil_depth_water_level: 5,
                                          level_half: 91,
                                          soil_depth_energy_level: 5)
Coordinates:
  * crs                                  (crs) float32 4B nan
  * depth_full                           (depth_full) float32 288B 1.0 ... 5....
  * depth_half                           (depth_half) float32 292B 0.0 ... 5....
  * level_full                           (level_full) int32 360B 1 2 3 ... 89 90
  * level_half                           (level_half) int32 364B 1 2 3 ... 90 91
  * soil_depth_energy_level              (soil_depth_energy_level) float32 20B ...
  * soil_depth_water_level               (soil_depth_water_level) float32 20B ...
  * time                                 (time) datetime64[ns] 88kB 2020-01-0...
Dimensions without coordinates: n_face
Data variables: (12/103)
    A_tracer_v_to                        (time, depth_half, n_face) float32 3TB ...
    FrshFlux_IceSalt                     (time, n_face) float32 34GB ...
    FrshFlux_TotalIce                    (time, n_face) float32 34GB ...
    Qbot                                 (time, n_face) float32 34GB ...
    Qtop                                 (time, n_face) float32 34GB ...
    Wind_Speed_10m                       (time, n_face) float32 34GB ...
    ...                                   ...
    vas                                  (time, n_face) float32 34GB ...
    w                                    (time, depth_half, n_face) float32 3TB ...
    wa_phy                               (time, level_half, n_face) float32 3TB ...
    zg                                   (level_full, n_face) float32 283MB ...
    zghalf                               (level_half, n_face) float32 286MB ...
    zos                                  (time, n_face) float32 34GB ...

Plot the older simulation for reference

Let’s have a quick look at how the global surface temperature looks like in the older simulation’s output:

uxds_older["ts"].isel(time=0).plot(
    projection=projection,
    cmap="inferno",
    features=["borders", "coastline"],
    title="Global surface temperature (zoom = 8) - Older Simulation",
    width=700,
)
WARNING:param.GeoOverlayPlot01861: Due to internal constraints, when aspect and width/height is set, the bokeh backend uses those values as frame_width/frame_height instead. This ensures the aspect is respected, but means that the plot might be slightly larger than anticipated. Set the frame_width/frame_height explicitly to suppress this warning.

Visually there does not seem to be a huge difference between this and the newer simulation’s output we had plotted in the very beginning.

Remap the old simulation output to the newer one

Let’s start remapping! For that, we will use the uxgrid of the newer simulation output as the destination grid and use an inverse distance weighted implementation:

%%time
uxda_older_remapped = uxds_older["ts"].isel(time=0).remap.inverse_distance_weighted(
    uxds.uxgrid, k=3, remap_to="face centers", coord_type="cartesian"
)
/home/runner/miniconda3/envs/healpix-cookbook-dev/lib/python3.10/site-packages/uxarray/grid/coordinates.py:254: UserWarning: This cannot be guaranteed to work correctly on concave polygons
  warnings.warn("This cannot be guaranteed to work correctly on concave polygons")
CPU times: user 10.8 s, sys: 217 ms, total: 11 s
Wall time: 14.8 s

Plot the difference between the old and newer simulations

Now that we have the older and newer model outputs on the same grid, let’s look at the surface temperature differences between the two:

(uxda.isel(time=0) - uxda_older_remapped).plot(
    projection=projection,
    cmap="RdBu_r",
    features=["borders", "coastline"],
    title="Global surface temperature difference between older and newer simulations (zoom = 9)",
    clim=(-25,25),
    width=700,
)
WARNING:param.GeoOverlayPlot02186: Due to internal constraints, when aspect and width/height is set, the bokeh backend uses those values as frame_width/frame_height instead. This ensures the aspect is respected, but means that the plot might be slightly larger than anticipated. Set the frame_width/frame_height explicitly to suppress this warning.