Visualization at Scale


When working with large datasets, it’s crucial to consider the performance of each visualization method. This notebook investigates the performance and data fidelity of the previously covered methods using high-resolution grids.

Benchmarked Dataset

The datasets used for the timings and examples in this notebook were provided courtesy of the DYAMOND Initiative.

There are four datasets, each from the same experiment, but with different grid resolutions. The table below summarizes the scale of these datasets:

Element / Resolution | 30km | 15km | 7.5km | 3.75km
Faces | 655,362 | 2,621,442 | 10,485,762 | 41,943,042
Nodes | 1,310,720 | 5,242,880 | 20,971,520 | 83,886,080

Data Processing Timings

Timings were taken on a single NCAR Derecho node. All results are in seconds.
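As a rough sketch of how timings like these could be collected, the helper below repeats a call several times and reports the mean and standard deviation of the wall time. This is an assumption for illustration: the notebook does not state how its measurements were taken or what the values in parentheses represent. `time_call` and `build_polygons` are hypothetical names, and `build_polygons` is only a stand-in for a real data processing step.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Run func() `repeats` times and return (mean, stdev) of wall time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


def build_polygons(n_faces=100_000):
    """Hypothetical stand-in for a data processing step; not a UXarray call."""
    return [i * 0.5 for i in range(n_faces)]


mean_s, std_s = time_call(build_polygons)
print(f"{mean_s:.2f} ({std_s:.2f})")
```

Note that a loop like this measures repeated calls, so any caching inside the timed function would make later repeats faster than the first one.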

Initial Run

Visualization Method / Grid Resolution | 30km | 15km | 7.5km | 3.75km
Polygon Raster (Including Antimeridian) | 28.5 | 122.2 | 463 | 1990
Polygon Raster (Excluding Antimeridian) | 1.69 (0.23) | 5.96 (0.09) | 23.1 (0.52) | 93 (1.01)
Point Raster | 0.13 (0.03) | 0.16 (0.01) | 0.35 (0.00) | 1.08 (0.07)

We can see that Point Rasters are the quickest on the initial run: compared with Polygon Rasters (Excluding Antimeridian Polygons), they are about 13 times faster at 30km and about 86 times faster at 3.75km.
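The speedup of Point Rasters over Polygon Rasters depends on the resolution, which can be checked with a few lines of arithmetic. The sketch below only divides the initial-run timings from the table above; the variable names are illustrative.

```python
# Initial-run timings in seconds, copied from the table above.
resolutions = ["30km", "15km", "7.5km", "3.75km"]
polygon_excl = [1.69, 5.96, 23.1, 93.0]  # Polygon Raster (Excluding Antimeridian)
point = [0.13, 0.16, 0.35, 1.08]         # Point Raster

# Speedup of the point method relative to the polygon method, per resolution.
speedups = {
    res: poly / pt for res, poly, pt in zip(resolutions, polygon_excl, point)
}

for res, factor in speedups.items():
    print(f"{res}: Point Raster is {factor:.1f}x faster")
```

The gap widens with resolution because the point timings grow much more slowly than the polygon timings.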

Both polygon methods scale roughly linearly with the number of faces. Each doubling of the resolution leads to about a 4x increase in the number of faces (i.e., polygons), and the timings grow by about the same factor.
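The scaling can be checked against the tables above by comparing the growth in face count with the growth in timing between successive resolutions. This is a small arithmetic sketch; `step_ratios` is a helper introduced here for illustration.

```python
# Face counts and initial-run timings (seconds) from the tables above,
# ordered from 30km to 3.75km.
faces = [655_362, 2_621_442, 10_485_762, 41_943_042]
polygon_incl = [28.5, 122.2, 463.0, 1990.0]  # Including Antimeridian
polygon_excl = [1.69, 5.96, 23.1, 93.0]      # Excluding Antimeridian


def step_ratios(values):
    """Ratio between each value and the one before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


face_growth = step_ratios(faces)
incl_growth = step_ratios(polygon_incl)
excl_growth = step_ratios(polygon_excl)

for label, ratios in [
    ("faces", face_growth),
    ("polygon (incl.)", incl_growth),
    ("polygon (excl.)", excl_growth),
]:
    print(label, [round(r, 2) for r in ratios])
```

If the timings were exactly linear in the number of faces, the timing ratios would match the face-count ratios at every step; here they stay within roughly 3.5x to 4.3x.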

Including antimeridian polygons leads to about a 20x slowdown across all resolutions, so it’s suggested to keep exclude_antimeridian=True when working with larger datasets.

Subsequent Runs

Visualization Method / Grid Resolution | 30km | 15km | 7.5km | 3.75km
Polygon Raster (Including Antimeridian) | 0.31 (0.00) | 1.32 (0.31) | 3.85 (0.06) | 14.36 (0.13)
Polygon Raster (Excluding Antimeridian) | 0.30 (0.00) | 1.02 (0.36) | 3.46 (0.01) | 13.60 (0.08)
Point Raster | 0.13 (0.03) | 0.16 (0.01) | 0.35 (0.00) | 1.08 (0.07)

For subsequent runs (i.e., after one plotting call has already computed and cached the necessary data structures), performance for both Polygon methods is comparable, with the variant that includes antimeridian polygons only slightly slower.

There is no caching currently implemented for Point Rasters, so the performance for each run is consistent with the initial run.
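The gap between initial and subsequent runs can be illustrated with a generic caching pattern. This is only a sketch: `GridSketch` and its `polygons` property are hypothetical and do not mirror the library's internals, and `functools.cached_property` stands in for whatever caching mechanism is actually used.

```python
import time
from functools import cached_property


class GridSketch:
    """Hypothetical grid wrapper used only for illustration."""

    def __init__(self, n_faces):
        self.n_faces = n_faces
        self.build_count = 0  # how many times the expensive step actually ran

    @cached_property
    def polygons(self):
        # Stand-in for the expensive conversion of faces into polygons.
        self.build_count += 1
        return [(i, i + 1, i + 2) for i in range(self.n_faces)]


grid = GridSketch(n_faces=200_000)

start = time.perf_counter()
first = grid.polygons   # initial run: builds the polygons and caches them
initial_s = time.perf_counter() - start

start = time.perf_counter()
second = grid.polygons  # subsequent run: returns the cached object
subsequent_s = time.perf_counter() - start

print(f"initial: {initial_s:.4f} s, subsequent: {subsequent_s:.6f} s")
print("expensive step ran", grid.build_count, "time(s)")
```

A method without such a cache repeats its full processing on every call, which matches the Point Raster timings being the same in both tables.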

Important!

The timings above benchmark the data processing time (i.e., the total time needed to transform an unstructured grid into a format that can be rendered). Actual visualization times will vary depending on the parameters chosen.

A detailed benchmark of total visualization times (i.e. data processing and rendering to screen) will be added to this notebook in the future.

Polygon vs Point Rasters

Both the Polygon and Point notebooks showed how these elements can be rasterized.

Global

For certain visualization workflows, one may only be interested in observing the global trends of a data variable.

uxds['relhum_200hPa'][0].plot.rasterize(method='polygon', 
                                        width=1000,
                                        height=500,
                                        exclude_antimeridian=True,
                                        clim=clim, 
                                        title="Global Polygon Raster")

Polygon Raster

uxds['relhum_200hPa'][0].plot.rasterize(method='point', 
                                        width=1000, 
                                        height=500, 
                                        clim=clim, 
                                        title="Global Point Raster")

Point Raster

We can see that both the Polygon and Point Rasters capture the global trend of our data variable.

Regional

However, it’s also common to zoom into a region of interest (e.g., the continental United States or Europe) to observe how a data variable behaves within that smaller region.

uxds['relhum_200hPa'][0].plot.rasterize(method='polygon', 
                                        width=1000, 
                                        height=500, 
                                        exclude_antimeridian=True, 
                                        dynamic=False, 
                                        xlim=(-68, -60), 
                                        ylim=(-71, -66), 
                                        clim=clim, 
                                        title="Regional Polygon Raster")

Polygon Raster

uxds['relhum_200hPa'][0].plot.rasterize(method='point', 
                                        width=1000, 
                                        height=500, 
                                        dynamic=False, 
                                        xlim=(-68, -60), 
                                        ylim=(-71, -66), 
                                        clim=clim, 
                                        title="Regional Point Raster")

Point Raster

Without specifying any additional parameters, both the Polygon and Point rasters look identical.

However, by setting dynamic=True, which re-runs the rasterization as we zoom and pan across a plot, we can start to see the differences between the two types of plots.

uxds['relhum_200hPa'][0].plot.rasterize(method='polygon', 
                                        width=1000, 
                                        height=500, 
                                        exclude_antimeridian=True, 
                                        dynamic=True,
                                        xlim=(-68, -60),
                                        ylim=(-71, -66), 
                                        clim=clim, 
                                        title="Regional Polygon Raster (Dynamic)")

Polygon Raster

uxds['relhum_200hPa'][0].plot.rasterize(method='point', 
                                        width=1000, 
                                        height=500, 
                                        dynamic=True,
                                        xlim=(-68, -60),
                                        ylim=(-71, -66), 
                                        clim=clim, 
                                        title="Regional Point Raster (Dynamic)")

Point Raster

The Polygon Raster can be zoomed in indefinitely because each polygon covers an area of the screen rather than a single location.

However, zooming in on the Point Raster exposes how each point is still simply a pair of latitude and longitude coordinates, without any sense of area. Beyond a certain zoom level, there aren’t enough points to sample into a uniform-looking raster image, and we are left with an approximation of the individual points.
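This effect can be reproduced with a toy point rasterizer. The sketch below uses NumPy for illustration and is not the rasterization code used by the plotting backend; `rasterize_points`, `filled_fraction`, the random sample points, and the pixel counts are all assumptions chosen to keep the example small. It bins points into a fixed-size pixel grid and reports the fraction of pixels that receive at least one point.

```python
import numpy as np


def rasterize_points(lon, lat, xlim, ylim, width, height):
    """Count points per pixel inside the viewport given by xlim and ylim."""
    counts, _, _ = np.histogram2d(
        lon, lat, bins=(width, height), range=(xlim, ylim)
    )
    return counts


def filled_fraction(counts):
    """Fraction of pixels that contain at least one point."""
    return float((counts > 0).mean())


# Synthetic points, uniform in longitude and latitude (not area-uniform
# on the sphere); they only stand in for face-center coordinates.
rng = np.random.default_rng(0)
n_points = 500_000
lon = rng.uniform(-180.0, 180.0, n_points)
lat = rng.uniform(-90.0, 90.0, n_points)

# Global view: many points fall into each of the 100 x 50 pixels.
global_view = rasterize_points(lon, lat, (-180, 180), (-90, 90), 100, 50)

# Zoomed view, using the limits of the regional plots: the viewport now
# holds far fewer points than pixels, so most pixels stay empty.
zoomed_view = rasterize_points(lon, lat, (-68, -60), (-71, -66), 100, 50)

print("global filled fraction:", filled_fraction(global_view))
print("zoomed filled fraction:", filled_fraction(zoomed_view))
```

Re-running the binning for each new viewport is, in spirit, what a dynamic plot does on every zoom or pan. A polygon-based raster avoids the empty pixels because every pixel in the viewport lies inside some face.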

Zooming in even further with the Polygon Raster, we can start to see each individual cell, even at such a high resolution.

Polygon Raster