Landsat8

Preprocessing - Regrid


Overview

The last notebook demonstrated how to access the data with some common alternative approaches to the Planetary Computer platform. The next step is to ensure that the data has been appropriately prepared for whatever analysis we intend to run.

For example, as we may be directly comparing data across time points, it would be very useful if these images were on the same underlying coordinate grid. In this notebook, we will preprocess the data to reshape and align our images for efficient consumption by analysis steps. We will also see how to use interactive visualization to assist in our processing pipeline. Note, if you are adopting the data access approach in the Planetary Computer notebook, you may not need to regrid the data and can proceed to an analysis workflow.

Prerequisites

Concepts

Importance

Notes

Data Ingestion - Intake

Necessary

Coordinate Reference Systems and EPSG

Helpful

GeoViews resampling grids

Helpful

  • Time to learn: 20 minutes.


Imports

import intake
import numpy as np
import xarray as xr
import cartopy.crs as ccrs
import geoviews as gv
import hvplot.xarray
import holoviews as hv

import warnings
warnings.simplefilter('ignore', FutureWarning) # Ignore a warning about the format of epsg codes
/home/runner/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/dask/dataframe/_pyarrow_compat.py:17: FutureWarning: Minimal version of pyarrow will soon be increased to 14.0.1. You are using 12.0.1. Please consider upgrading.
  warnings.warn(
/home/runner/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/geoviews/operation/__init__.py:14: HoloviewsDeprecationWarning: 'ResamplingOperation' is deprecated and will be removed in version 1.18, use 'ResampleOperation2D' instead.
  from holoviews.operation.datashader import (

Loading data

You have already read through the previous tutorial on loading data with Intake, so now we’ll jump right to using Intake to read in chunks of small versions of the Landsat 5 and Landsat 8 data.

cat = intake.open_catalog('./data/catalog.yml')
cat.landsat_8_small.container
'xarray'

We know that our Landsat data will ultimately be an xarray object. Let’s load the metadata with Dask:

landsat_5_da = cat.landsat_5_small.to_dask()
landsat_8_da = cat.landsat_8_small.to_dask()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 landsat_5_da = cat.landsat_5_small.to_dask()
      2 landsat_8_da = cat.landsat_8_small.to_dask()

File ~/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/intake_xarray/base.py:69, in DataSourceMixin.to_dask(self)
     67 def to_dask(self):
     68     """Return xarray object where variables are dask arrays"""
---> 69     return self.read_chunked()

File ~/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/intake_xarray/base.py:44, in DataSourceMixin.read_chunked(self)
     42 def read_chunked(self):
     43     """Return xarray object (which will have chunks)"""
---> 44     self._load_metadata()
     45     return self._ds

File ~/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/intake/source/base.py:84, in DataSourceBase._load_metadata(self)
     82 """load metadata only if needed"""
     83 if self._schema is None:
---> 84     self._schema = self._get_schema()
     85     self.dtype = self._schema.dtype
     86     self.shape = self._schema.shape

File ~/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/intake_xarray/raster.py:102, in RasterIOSource._get_schema(self)
     99 self.urlpath, *_ = self._get_cache(self.urlpath)
    101 if self._ds is None:
--> 102     self._open_dataset()
    104     ds2 = xr.Dataset({'raster': self._ds})
    105     metadata = {
    106         'dims': dict(ds2.dims),
    107         'data_vars': {k: list(ds2[k].coords)
   (...)
    110         'array': 'raster'
    111     }

File ~/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/intake_xarray/raster.py:90, in RasterIOSource._open_dataset(self)
     88     self._ds = self._open_files(files)
     89 else:
---> 90     self._ds = xr.open_rasterio(files, chunks=self.chunks,
     91                                 **self._kwargs)

AttributeError: module 'xarray' has no attribute 'open_rasterio'
landsat_8_da
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 landsat_8_da

NameError: name 'landsat_8_da' is not defined

Calculate the Vegetation Index

To motivate our preprocessing, let’s conduct a simple analysis on the Normalized Difference Vegetation Index NDVI for each of these image sets as follows:

(1)\[\begin{align} {NDVI} & = \frac{NIR-Red}{NIR+Red} \end{align}\]

where Red and NIR stand for the spectral reflectance measurements acquired in the Red (visible) and near-infrared regions, respectively. Note that in Landsat 5 the Red and NIR bands are stored in bands 3 and 4 respectively whereas in Landsat 8 the Red and NIR are bands 4 and 5.

with xr.set_options(keep_attrs=True):
    NDVI_1988 = (landsat_5_da.sel(band=4) - landsat_5_da.sel(band=3)) / (landsat_5_da.sel(band=4) + landsat_5_da.sel(band=3))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[6], line 2
      1 with xr.set_options(keep_attrs=True):
----> 2     NDVI_1988 = (landsat_5_da.sel(band=4) - landsat_5_da.sel(band=3)) / (landsat_5_da.sel(band=4) + landsat_5_da.sel(band=3))

NameError: name 'landsat_5_da' is not defined

Info

Above, we are using a context manager “with xr.set_options(keep_attrs=True):” to retain the array’s attributes through the operations. That is, we want all the metadata like ‘crs’ to stay with our result so we can use ‘geo=True’ in our plotting.

NDVI_1988
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 NDVI_1988

NameError: name 'NDVI_1988' is not defined
NDVI_1988_plot = NDVI_1988.hvplot.image(x='x', y='y', geo=True, clim=(-1,1), title='NDVI 1988', rot=45, cmap='viridis')
NDVI_1988_plot
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 NDVI_1988_plot = NDVI_1988.hvplot.image(x='x', y='y', geo=True, clim=(-1,1), title='NDVI 1988', rot=45, cmap='viridis')
      2 NDVI_1988_plot

NameError: name 'NDVI_1988' is not defined
with xr.set_options(keep_attrs=True):
    NDVI_2017 = (landsat_8_da.sel(band=5) - landsat_8_da.sel(band=4)) / (landsat_8_da.sel(band=5) + landsat_8_da.sel(band=4))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[9], line 2
      1 with xr.set_options(keep_attrs=True):
----> 2     NDVI_2017 = (landsat_8_da.sel(band=5) - landsat_8_da.sel(band=4)) / (landsat_8_da.sel(band=5) + landsat_8_da.sel(band=4))

NameError: name 'landsat_8_da' is not defined
NDVI_2017_plot = NDVI_2017.hvplot.image(x='x', y='y', geo=True, clim=(-1,1), title='NDVI 2017', rot=45, cmap='viridis')
NDVI_2017_plot
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 NDVI_2017_plot = NDVI_2017.hvplot.image(x='x', y='y', geo=True, clim=(-1,1), title='NDVI 2017', rot=45, cmap='viridis')
      2 NDVI_2017_plot

NameError: name 'NDVI_2017' is not defined

We can calculate the difference between these two years by subtracting one from the other.

with xr.set_options(keep_attrs=True):
    NDVI_diff = NDVI_2017 - NDVI_1988
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 2
      1 with xr.set_options(keep_attrs=True):
----> 2     NDVI_diff = NDVI_2017 - NDVI_1988

NameError: name 'NDVI_2017' is not defined
NDVI_diff_plot = NDVI_diff.hvplot.image(x='x', y='y', geo=True, cmap='coolwarm', clim=(-1,1), title='NDVI 2017 - 1988', rot=45)
NDVI_diff_plot
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 NDVI_diff_plot = NDVI_diff.hvplot.image(x='x', y='y', geo=True, cmap='coolwarm', clim=(-1,1), title='NDVI 2017 - 1988', rot=45)
      2 NDVI_diff_plot

NameError: name 'NDVI_diff' is not defined

Notice how pixelated that image looks. What is going on here? To figure it out, let’s take a look at the shape of diff.

NDVI_diff.shape
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[13], line 1
----> 1 NDVI_diff.shape

NameError: name 'NDVI_diff' is not defined

That’s a lot smaller than our NDVI for each year. What is happening is that when we compute the difference on the data we only get values where there are values for each year in the same grid cell. Since the cells are on a different resolution this only happens once every so often. What we’d rather do is interpolate to the same grid and then do our computations on that.

Combine data from overlapping grids

These two sets of Landsat bands cover roughly the same area but were taken in 1988 and 2017. They have different resolutions, different numbers of grid cells per image, and different x and y offsets. We can see the offset by printing the first x value for each year and seeing that they are not equivalent.

print(NDVI_1988.x[0].values)
print(NDVI_2017.x[0].values)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[14], line 1
----> 1 print(NDVI_1988.x[0].values)
      2 print(NDVI_2017.x[0].values)

NameError: name 'NDVI_1988' is not defined

We can also do a quick check of resolution by subtracting the second x value from the first x value for each year.

print((NDVI_1988.x[1] - NDVI_1988.x[0]).values)
print((NDVI_2017.x[1] - NDVI_2017.x[0]).values)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[15], line 1
----> 1 print((NDVI_1988.x[1] - NDVI_1988.x[0]).values)
      2 print((NDVI_2017.x[1] - NDVI_2017.x[0]).values)

NameError: name 'NDVI_1988' is not defined

To align the grid, resolution, and offset of these images, we will select a region of interest, create a new grid within this region, and then interpolate our images onto this common grid.

Select region of interest

The first step to selecting a Region of Interest (ROI) is to define its center point.

There are multiple ways to define this central point, and we will start with a case in which we know the coordinates in latitude, longitude and need to convert it into the CRS of our data. Alternatively, we will see how easy it is to use interactive visualization to estimate the center point without knowing any prior coordinates.

For the first approach, the first step is getting the CRS of our data. This information is stored in the attributes of our original landsat data (either .crs or .rio.crs depending on package versions). Let’s take a look at it:

try:
    print(landsat_8_da.crs)
except:
    print(landsat_8_da.rio.crs)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[16], line 2
      1 try:
----> 2     print(landsat_8_da.crs)
      3 except:

NameError: name 'landsat_8_da' is not defined

During handling of the above exception, another exception occurred:

NameError                                 Traceback (most recent call last)
Cell In[16], line 4
      2     print(landsat_8_da.crs)
      3 except:
----> 4     print(landsat_8_da.rio.crs)

NameError: name 'landsat_8_da' is not defined

This CRS is referenced by an EPSG code. We can see more about this specific code at EPSG.io. Read more about EPSG codes in general in this Coordinate Reference Systems: EPSG codes online book chapter. Note, the +init=<authority>:<code> syntax is being deprecated, and <authority>:<code> is the preferred initialization method.

We can convert this EPSG number into a cartopy.crs object using the cartopy.crs.epsg method. This will allow us to transform any point from latitude, longitude into our data’s CRS so that we can define a center for our ROI.

crs = ccrs.epsg(32611)
crs.__repr__()
'_EPSGProjection(32611)'
crs.area_of_use
AreaOfUse(west=-120.0, south=0.0, east=-114.0, north=84.0, name='Between 120°W and 114°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - Alberta; British Columbia (BC); Northwest Territories (NWT); Nunavut. Mexico. United States (USA).')
crs_box = [(crs.x_limits[0], crs.y_limits[0]), (crs.x_limits[0], crs.y_limits[1]),
           (crs.x_limits[1], crs.y_limits[1]), (crs.x_limits[1], crs.y_limits[0])]

crs_extent = gv.Polygons(crs_box, crs=crs).options(alpha=0.4, color='blue', line_alpha=0)

(gv.feature.coastline * crs_extent).options(title='EPSG:32611', global_extent=True)
/home/runner/miniconda3/envs/cookbook-dev/lib/python3.10/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naturalearth.s3.amazonaws.com/110m_physical/ne_110m_coastline.zip
  warnings.warn(f'Downloading: {url}', DownloadWarning)

Just for comparison, let’s preview the coverage of a totally different EPSG; perhaps one in the southern hemisphere.

crs2 = ccrs.epsg(32735)
crs2.area_of_use
AreaOfUse(west=24.0, south=-80.0, east=30.0, north=0.0, name='Between 24°E and 30°E, southern hemisphere between 80°S and equator, onshore and offshore. Botswana. Burundi. Democratic Republic of the Congo (Zaire). Rwanda. South Africa. Tanzania. Uganda. Zambia. Zimbabwe.')
crs_box2 = [(crs2.x_limits[0], crs2.y_limits[0]), (crs2.x_limits[0], crs2.y_limits[1]),
           (crs2.x_limits[1], crs2.y_limits[1]), (crs2.x_limits[1], crs2.y_limits[0])]

crs_extent2 = gv.Polygons(crs_box, crs=crs2).options(alpha=0.4, color='red', line_alpha=0)

(gv.feature.coastline * crs_extent2).options(title='EPSG:32735', global_extent=True)

Hopefully, this gives you some intuition for the data that we are looking at and that it comes from a small North American region within the blue highlighted area on the ‘EPSG:32611’ plot.

Now, if we know already know our center point in terms of a latitude, longitude point (from a cartopy.crs.PlateCarree() projection), we can transform this into the CRS of our data using the transform_point method:

center={'x':np.nan, 'y':np.nan}
center['x'], center['y'] = crs.transform_point(-118.7081, 38.6942, ccrs.PlateCarree())
center['x']
351453.22378616617

But what if we didn’t happen to know the center point in terms of latitude and longitude? Since we are using interactive plots with hvplot, we can look at one of our previous plots of the lake and hover over the region that we want to use as a center point to reveal the coordinates in latitude, longitude (created by plotting with geo=True).

Alternatively, we could just hover over a plot of our data that is still in our data crs projection (by using the default geo=False) to directly get our x_center and y_center. Try it below - hover over the center of the lake to reveal the x and y points:

NDVI_2017.hvplot.image(x='x', y='y', width=400, clim=(-1,1), rot=45, cmap='viridis')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[25], line 1
----> 1 NDVI_2017.hvplot.image(x='x', y='y', width=400, clim=(-1,1), rot=45, cmap='viridis')

NameError: name 'NDVI_2017' is not defined

A more advanced approach would be to update automatically assign the coordinates to a variable when we click on the image. For this, we will reach to HoloViews, the library that powers hvPlot. For more information about this functionality of HoloViews, check out the HoloViews Tap stream page.

First, we’ll create function to update our center dictionary based on clicks.

def onclick(x,y):
    center['x'] = x
    center['y'] = y

Then we’ll connect a HoloViews Tap stream to our plot

crs_plot = NDVI_2017.hvplot.image(x='x', y='y', width=400, clim=(-1,1), rot=45, cmap='viridis')

tap_stream = hv.streams.Tap(source=crs_plot)

tap_stream.add_subscriber(onclick)

crs_plot
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[27], line 1
----> 1 crs_plot = NDVI_2017.hvplot.image(x='x', y='y', width=400, clim=(-1,1), rot=45, cmap='viridis')
      3 tap_stream = hv.streams.Tap(source=crs_plot)
      5 tap_stream.add_subscriber(onclick)

NameError: name 'NDVI_2017' is not defined

Now our center variable will change each time we click on the plot above. Go ahead and click near the center of the lake on the plot above and then run the cell below to see the update. Every time you click a new spot on the plot, you can rerun the cell to confirm that the value has changed.

center
{'x': 351453.22378616617, 'y': 4284227.00873892}

Don’t worry too much about capturing the exact center of the lake, a rough estimate is sufficient.

Now we just need to define the area that we are interested in around this point. In this case we’ll use a 30 km box around the center point.

buffer = 1.5e4

xmin = center['x'] - buffer
xmax = center['x'] + buffer
ymin = center['y'] - buffer
ymax = center['y'] + buffer
bounding_box = [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin)]

Let’s just check that the bounding box captures the lake:

roi = gv.Polygons(bounding_box, crs=crs)
gv.tile_sources.EsriImagery * roi.options(alpha=0.3)

Regrid

We can use this region to define a new grid onto which we will interpolate our data.

Note, we will be regridding with simple linear interpolation which treats the space as flat. This is fine for our demonstration purposes at a relatively small spatial scale. However, despite what you may have heard, the earth is not flat, and treating it as such can give less accurate results when working with a larger spherical space. For more information about spherical resampling methods, check out the GeoViews resampling grids page, as it describes using different grid types including rectilinear, curvilinear grids and trimeshes. In particular, the conservative regridding approach using the xESMF library has become a common choice.

Let’s set the resolution of our new grid within the bounding box to roughly match the resolution of our original images.

res = 200
x = np.arange(xmin, xmax, res)
y = np.arange(ymin, ymax, res)
x.shape
(150,)

We will use xarray’s linear interpolation to calculate the values for each grid cell.

NDVI_2017_regridded = NDVI_2017.interp(x=x, y=y)
NDVI_1988_regridded = NDVI_1988.interp(x=x, y=y)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[34], line 1
----> 1 NDVI_2017_regridded = NDVI_2017.interp(x=x, y=y)
      2 NDVI_1988_regridded = NDVI_1988.interp(x=x, y=y)

NameError: name 'NDVI_2017' is not defined

Let’s compare our original data to this regridded form.

NDVI_2017_regridded_plot = NDVI_2017_regridded.hvplot.image(x='x', y='y', clim=(-1,1), title='NDVI 2017 regridded', geo=True, rot=45, cmap='viridis')
NDVI_1988_regridded_plot = NDVI_1988_regridded.hvplot.image(x='x', y='y', clim=(-1,1), title='NDVI 1988 regridded', geo=True, rot=45, cmap='viridis')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[35], line 1
----> 1 NDVI_2017_regridded_plot = NDVI_2017_regridded.hvplot.image(x='x', y='y', clim=(-1,1), title='NDVI 2017 regridded', geo=True, rot=45, cmap='viridis')
      2 NDVI_1988_regridded_plot = NDVI_1988_regridded.hvplot.image(x='x', y='y', clim=(-1,1), title='NDVI 1988 regridded', geo=True, rot=45, cmap='viridis')

NameError: name 'NDVI_2017_regridded' is not defined

Note

With hvPlot, you can render multiple plots using the + operator, and stack them into rows by defining the number of columns with .cols. When possible, hvPlot will automatically link the plots so you can zoom and pan on one image and keep them in sync.

Zoom into one of the images below to see the result of interpolating on this new common grid. The regridded plots (bottom row) will have a consistent pixelation compared to the original NDVI data (top row).

(NDVI_1988_plot + NDVI_2017_plot + NDVI_1988_regridded_plot + NDVI_2017_regridded_plot).cols(2)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[36], line 1
----> 1 (NDVI_1988_plot + NDVI_2017_plot + NDVI_1988_regridded_plot + NDVI_2017_regridded_plot).cols(2)

NameError: name 'NDVI_1988_plot' is not defined

Combining the data

Now that we have our data on the same grid we can combine our two years into one xarray object. We will treat the years as names and create an xarray.Dataset - a group of named xarray.DataArrays that share some of the same coordinates.

ds_regridded = xr.Dataset({'NDVI_1988': NDVI_1988_regridded, 'NDVI_2017': NDVI_2017_regridded})
ds_regridded
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[37], line 1
----> 1 ds_regridded = xr.Dataset({'NDVI_1988': NDVI_1988_regridded, 'NDVI_2017': NDVI_2017_regridded})
      2 ds_regridded

NameError: name 'NDVI_1988_regridded' is not defined

Visualizing output

We can now reference each year from the same object, and plot the arrays side by side:

ds_regridded.NDVI_1988.hvplot.image(x='x', y='y', geo=True, clim=(-1, 1), title='NDVI 1988', cmap='viridis')  +\
ds_regridded.NDVI_2017.hvplot.image(x='x', y='y', geo=True, clim=(-1, 1), title='NDVI 2017', cmap='viridis')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[38], line 1
----> 1 ds_regridded.NDVI_1988.hvplot.image(x='x', y='y', geo=True, clim=(-1, 1), title='NDVI 1988', cmap='viridis')  +\
      2 ds_regridded.NDVI_2017.hvplot.image(x='x', y='y', geo=True, clim=(-1, 1), title='NDVI 2017', cmap='viridis')

NameError: name 'ds_regridded' is not defined

Or we can calculate and plot the difference between the two years:

with xr.set_options(keep_attrs=True):
    diff_regridded = ds_regridded['NDVI_2017'] - ds_regridded['NDVI_1988']
diff_regridded
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[39], line 2
      1 with xr.set_options(keep_attrs=True):
----> 2     diff_regridded = ds_regridded['NDVI_2017'] - ds_regridded['NDVI_1988']
      3 diff_regridded

NameError: name 'ds_regridded' is not defined
NDVI_diff_regridded_plot = diff_regridded.hvplot.image(x='x', y='y', geo=True, rot=45, cmap='coolwarm', clim=(-1,1), title='NDVI 2017 - 1988')
NDVI_diff_regridded_plot
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[40], line 1
----> 1 NDVI_diff_regridded_plot = diff_regridded.hvplot.image(x='x', y='y', geo=True, rot=45, cmap='coolwarm', clim=(-1,1), title='NDVI 2017 - 1988')
      2 NDVI_diff_regridded_plot

NameError: name 'diff_regridded' is not defined

As the Vegetation Index will generally give a lower value where water is present, you can clearly see a large positive change along the edge of the lake indicating a reduction in the size of the lake over this time period.

Side-note: Resampling

Depending on your analysis, it may be beneficial to downsample your data. Especially early on during analysis development, running computations on full resolution data may eat into valuable time that you could otherwise use iterating, debugging, and improving your workflow. Luckily, downsampling xarray data is made pretty easy by grouping the values into bins based on the desired resolution and taking some aggregation (like the mean) on each of those bins.

res_1000 = 1e3
x_1000 = np.arange(xmin, xmax, res_1000)
y_1000 = np.arange(ymin, ymax, res_1000)

We’ll use the left edge as the label for now.

diff_res_1000 = (diff_regridded
    .groupby_bins('x', x_1000, labels=x_1000[:-1]).mean(dim='x')
    .groupby_bins('y', y_1000, labels=y_1000[:-1]).mean(dim='y')
    .rename(x_bins='x', y_bins='y')
)
diff_res_1000
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[42], line 1
----> 1 diff_res_1000 = (diff_regridded
      2     .groupby_bins('x', x_1000, labels=x_1000[:-1]).mean(dim='x')
      3     .groupby_bins('y', y_1000, labels=y_1000[:-1]).mean(dim='y')
      4     .rename(x_bins='x', y_bins='y')
      5 )
      6 diff_res_1000

NameError: name 'diff_regridded' is not defined
diff_res_1000.hvplot.image(x='x', y='y', geo=True, rot=45, cmap='coolwarm', clim=(-1,1), title='Downsampled')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[43], line 1
----> 1 diff_res_1000.hvplot.image(x='x', y='y', geo=True, rot=45, cmap='coolwarm', clim=(-1,1), title='Downsampled')

NameError: name 'diff_res_1000' is not defined

And there you go, nicely downsampled data ready for some fast computations.


Summary

With the target of preparing our data for further analysis, we’ve learned how to ensure that our data are on a consistent coordinate grid. Again, this is especially important if we are performing operations across the datasets, like taking the difference between their images. Although preprocessing can sometimes be tedious, it’s a very critical part of the workflow, and often only needs to be done once (or a small number of times) before being able to apply the processed data to a large number of different analyses.

What’s next?

Now that we know how to prepare data, it’s time to proceed to analysis, where we will explore a some simple machine learning approaches.

Resources and references

  • Authored/adapted by Demetris Roumis circa Dec, 2022

  • The Landsat 8 banner image is from NASA)