Skip to article frontmatterSkip to article content

Reverend Bayes updates our Belief in Flood Detection

How an 275 year old idea helps map the extent of floods

Image from wikipedia

Overview

This notebook explains how microwave (σ0\sigma^0) backscattering can be used to map the extent of a flood. We replicate in this exercise the work of Bauer-Marschallinger et al., 2022 on the TU Wien Bayesian-based flood mapping algorithm.

Prerequisites

ConceptsImportanceNotes
Intro to xarrayNecessary
Intro to Harmonic parametersNecessary
Documentation hvPlotHelpfulInteractive plotting
Documentation odc-stacHelpfulData access
  • Time to learn: 10 min

Imports

import datetime

import holoviews as hv
import hvplot.pandas
import hvplot.xarray
import numpy as np
import pandas as pd
import panel as pn
import pystac_client
import rioxarray  # noqa: F401
import xarray as xr
from bokeh.models import FixedTicker
from odc import stac as odc_stac
from scipy.stats import norm

pn.extension()
hv.extension("bokeh")
Loading...

Greece Flooding 2018

In this exercise we will replicate the case study of the above mentioned paper, the February 2018 flooding of the Greek region of Thessaly.

time_range = "2018-02-28T04:00:00Z/2018-02-28T05:00:00Z"
minlon, maxlon = 21.93, 22.23
minlat, maxlat = 39.47, 39.64
bounding_box = [minlon, minlat, maxlon, maxlat]

EODC STAC Catalog

The data required for TU Wien flood mapping algorithm consists of terrain corrected sigma naught backscatter data σ0\sigma^{0}, the projected local incidence angle (PLIA) values of those measurements, and the harmonic parameters (HPAR) of a model fit on the pixel’s backscatter time series. The latter two datasets will needed to calculate the probability density functions over land and water for. We will be getting the required data from the EODC STAC Catalog. Specifically the collections: SENTINEL_SIG0_20M, SENTINEL1_MPLIA and SENTINEL1_HPAR. We use the pystac-client and odc_stac packages to, respectively, discover and fetch the data.

Due to the way the data is acquired and stored, some items include “no data” areas. In our case, no data has the value -9999, but this can vary from data provider to data provider. This information can usually be found in the metadata. Furthermore, to save memory, data is often stored as integer (e.g. 25) and not in float (e.g. 2.5) format. For this reason, the backscatter values are often multiplied by a scale factor. Hence we define the function post_process_eodc_cube to correct for these factors as obtained from the STAC metadata.

Sigma naught

eodc_catalog = pystac_client.Client.open("https://stac.eodc.eu/api/v1")
search = eodc_catalog.search(
    collections="SENTINEL1_SIG0_20M",
    bbox=bounding_box,
    datetime=time_range,
)
items_sig0 = search.item_collection()


def post_process_eodc_cube(dc, items, bands):
    """
    Postprocessing of EODC data cubes.

    Parameters
    ----------
    x : xarray.Dataset
    items: pystac.item_collection.ItemCollection
        STAC items that concern the Xarray Dataset
    bands: array
        Selected bands

    Returns
    -------
    xarray.Dataset
    """
    if not isinstance(bands, tuple):
        bands = tuple([bands])
    for i in bands:
        dc[i] = post_process_eodc_cube_(dc[i], items, i)
    return dc


def post_process_eodc_cube_(dc, items, band):
    fields = items[0].assets[band].extra_fields
    scale = fields.get("raster:bands")[0]["scale"]
    nodata = fields.get("raster:bands")[0]["nodata"]
    return dc.where(dc != nodata) / scale


bands = "VV"
sig0_dc = odc_stac.load(items_sig0, bands=bands, bbox=bounding_box)
sig0_dc = (
    post_process_eodc_cube(sig0_dc, items_sig0, bands)
    .rename_vars({"VV": "sig0"})
    .dropna(dim="time", how="all")
    .median("time")
)

sig0_dc
Aborting load due to failure while reading: https://data.eodc.eu/collections/SENTINEL1_SIG0_20M/V1M1R1/EQUI7_EU020M/E054N006T3/SIG0_20180228T043843__VV_D080_E054N006T3_EU020M_V1M1R1_S1AIWGRDH_TUWIEN.tif:1
---------------------------------------------------------------------------
CPLE_HttpResponseError                    Traceback (most recent call last)
File rasterio/_base.pyx:310, in rasterio._base.DatasetBase.__init__()

File rasterio/_base.pyx:221, in rasterio._base.open_dataset()

File rasterio/_err.pyx:359, in rasterio._err.exc_wrap_pointer()

CPLE_HttpResponseError: CURL error: Failed to connect to data.eodc.eu port 443 after 134090 ms: Could not connect to server

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
Cell In[3], line 41
     37     return dc.where(dc != nodata) / scale
     40 bands = "VV"
---> 41 sig0_dc = odc_stac.load(items_sig0, bands=bands, bbox=bounding_box)
     42 sig0_dc = (
     43     post_process_eodc_cube(sig0_dc, items_sig0, bands)
     44     .rename_vars({"VV": "sig0"})
     45     .dropna(dim="time", how="all")
     46     .median("time")
     47 )
     49 sig0_dc

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/stac/_stac_load.py:457, in load(items, bands, groupby, resampling, dtype, chunks, pool, crs, resolution, anchor, geobox, bbox, lon, lat, x, y, like, geopolygon, intersects, progress, fail_on_error, stac_cfg, patch_url, preserve_original_order, driver, **kw)
    453     return ds
    455 rdr_env = rdr.capture_env()
    456 return _with_debug_info(
--> 457     chunked_load(
    458         load_cfg,
    459         meta,
    460         _parsed,
    461         tyx_bins,
    462         gbt,
    463         tss,
    464         rdr_env,
    465         rdr,
    466         chunks=chunks,
    467         pool=pool,
    468         progress=progress,
    469         dtype=dtype,
    470     )
    471 )

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:624, in chunked_load(load_cfg, template, srcs, tyx_bins, gbt, tss, env, rdr, dtype, chunks, pool, progress)
    622 # pylint: disable=too-many-arguments
    623 if chunks is None:
--> 624     return direct_chunked_load(
    625         load_cfg,
    626         template,
    627         srcs,
    628         tyx_bins,
    629         gbt,
    630         tss,
    631         env,
    632         rdr,
    633         pool=pool,
    634         progress=progress,
    635     )
    636 return dask_chunked_load(
    637     load_cfg,
    638     template,
   (...)    646     dtype=dtype,
    647 )

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:857, in direct_chunked_load(load_cfg, template, srcs, tyx_bins, gbt, tss, env, rdr, pool, progress)
    854 if progress is not None:
    855     _work = progress(SizedIterable(_work, total_tasks))
--> 857 for _ in _work:
    858     pass
    860 rdr.finalise_load(load_state)

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_utils.py:39, in pmap(func, inputs, pool)
     35 """
     36 Wrapper for ThreadPoolExecutor.map
     37 """
     38 if pool is None:
---> 39     yield from map(func, inputs)
     40     return
     42 if isinstance(pool, int):

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:832, in direct_chunked_load.<locals>._do_one(task)
    830     for t_idx, layer in enumerate(layers):
    831         loaders = [rdr.open(src, ctx) for _, src in layer]
--> 832         _ = _fill_nd_slice(
    833             loaders,
    834             task.dst_gbox,
    835             task.cfg,
    836             dst=dst_slice[t_idx],
    837             ydim=ydim,
    838         )
    839 t, y, x = task.idx_tyx
    840 return (task.band, t, y, x)

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:513, in _fill_nd_slice(srcs, dst_gbox, cfg, dst, ydim, selection)
    510     return dst
    512 src, *rest = srcs
--> 513 yx_roi, pix = src.read(cfg, dst_gbox, dst=dst, selection=selection)
    514 assert len(yx_roi) == 2
    515 assert pix.ndim == dst.ndim

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_rio.py:115, in RioReader.read(self, cfg, dst_geobox, dst, selection)
    107 def read(
    108     self,
    109     cfg: RasterLoadParams,
   (...)    113     selection: Optional[ReaderSubsetSelection] = None,
    114 ) -> tuple[tuple[slice, slice], np.ndarray]:
--> 115     return rio_read(self._src, cfg, dst_geobox, dst=dst, selection=selection)

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_rio.py:526, in rio_read(src, cfg, dst_geobox, dst, selection)
    520     if cfg.fail_on_error:
    521         log.error(
    522             "Aborting load due to failure while reading: %s:%d",
    523             src.uri,
    524             src.band,
    525         )
--> 526         raise e
    527 except rasterio.errors.RasterioError as e:
    528     if cfg.fail_on_error:

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_rio.py:512, in rio_read(src, cfg, dst_geobox, dst, selection)
    508     return roi, out.transpose([1, 2, 0])
    510 try:
    511     return fixup_out(
--> 512         _rio_read(src, cfg, dst_geobox, prep_dst(dst), selection=selection)
    513     )
    514 except (
    515     rasterio.errors.RasterioIOError,
    516     rasterio.errors.RasterBlockError,
    517     rasterio.errors.WarpOperationError,
    518     rasterio.errors.WindowEvaluationError,
    519 ) as e:
    520     if cfg.fail_on_error:

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_rio.py:560, in _rio_read(src, cfg, dst_geobox, dst, selection)
    549 def _rio_read(
    550     src: RasterSource,
    551     cfg: RasterLoadParams,
   (...)    556     # if resampling is `nearest` then ignore sub-pixel translation when deciding
    557     # whether we can just paste source into destination
    558     ttol = 0.9 if cfg.nearest else 0.05
--> 560     with rasterio.open(src.uri, "r", sharing=False) as rdr:
    561         assert isinstance(rdr, rasterio.DatasetReader)
    562         ovr_idx: Optional[int] = None

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/rasterio/env.py:463, in ensure_env_with_credentials.<locals>.wrapper(*args, **kwds)
    460     session = DummySession()
    462 with env_ctor(session=session):
--> 463     return f(*args, **kwds)

File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/rasterio/__init__.py:356, in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, opener, **kwargs)
    353     path = _parse_path(raw_dataset_path)
    355 if mode == "r":
--> 356     dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
    357 elif mode == "r+":
    358     dataset = get_writer_for_path(path, driver=driver)(
    359         path, mode, driver=driver, sharing=sharing, **kwargs
    360     )

File rasterio/_base.pyx:312, in rasterio._base.DatasetBase.__init__()

RasterioIOError: CURL error: Failed to connect to data.eodc.eu port 443 after 134090 ms: Could not connect to server

Harmonic Parameters

search = eodc_catalog.search(
    collections="SENTINEL1_HPAR",
    bbox=bounding_box,
    query=["sat:relative_orbit=80"],
)

items_hpar = search.item_collection()
bands = ("C1", "C2", "C3", "M0", "S1", "S2", "S3", "STD")
hpar_dc = odc_stac.load(
    items_hpar,
    bands=bands,
    bbox=bounding_box,
    groupby=None,
)
hpar_dc = post_process_eodc_cube(hpar_dc, items_hpar, bands).median("time")
hpar_dc

Projected Local Incidence Angles

search = eodc_catalog.search(
    collections="SENTINEL1_MPLIA",
    bbox=bounding_box,
    query=["sat:relative_orbit=80"],
)

items_plia = search.item_collection()

bands = "MPLIA"
plia_dc = odc_stac.load(
    items_plia,
    bands=bands,
    bbox=bounding_box,
)

plia_dc = post_process_eodc_cube(plia_dc, items_plia, bands).median("time")
plia_dc

Finally, we merged the datasets as one big dataset and reproject the data in EPSG 4326 for easier visualizing of the data.

flood_dc = xr.merge([sig0_dc, plia_dc, hpar_dc])
flood_dc = flood_dc.rio.reproject("EPSG:4326").rio.write_crs("EPSG:4326")
flood_dc

From Backscattering to Flood Mapping

In the following lines we create a map with microwave backscattering values.

mrs_view = flood_dc.sig0.hvplot.image(
    x="x", y="y", cmap="viridis", geo=True, tiles=True
).opts(frame_height=400)
mrs_view

Figure 1:Area targeted for σ0\sigma^0 backscattering is the Greek region of Thessaly, which experienced a major flood in February of 2018.

Microwave Backscattering over Land and Water

Reverend Bayes was concerned with two events, one (the hypothesis) occurring before the other (the evidence). If we know its cause, it is easy to logically deduce the probability of an effect. However, in this case we want to deduce the probability of a cause from an observed effect, also known as “reversed probability”. In the case of flood mapping, we have σ0\sigma^0 backscatter observations over land (the effect) and we want to deduce the probability of flooding (FF) and non-flooding (NFNF).

In other words, we want to know the probability of flooding P(F)P(F) given a pixel’s σ0\sigma^0:

P(Fσ0)P(F|\sigma^0)

and the probability of a pixel being not flooded P(NF)P(NF) given a certain σ0\sigma^0:

P(NFσ0).P(NF|\sigma^0).

Bayes showed that these can be deduced from the observation that forward and reversed probability are equal, so that:

P(Fσ0)P(σ0)=P(σ0F)P(F)P(F|\sigma^0)P(\sigma^0) = P(\sigma^0|F)P(F)

and

P(NFσ0)P(σ0)=P(σ0NF)P(NF).P(NF|\sigma^0)P(\sigma^0) = P(\sigma^0|NF)P(NF).

The forward probability of σ0\sigma^0 given the occurrence of flooding (P(σ0F)P(\sigma^0|F)) and σ0\sigma^0 given no flooding (P(σ0NF)P(\sigma^0|NF)) can be extracted from past information on backscattering over land and water surfaces. As seen in the sketch below , the characteristics of backscattering over land and water differ considerably.

Schematic backscattering over land and water. Image from Geological Survey Ireland{#fig-sat}

Likelihoods

The so-called likelihoods of P(σ0F)P(\sigma^0|F) and P(σ0NF)P(\sigma^0|NF) can thus be calculated from past backscattering information. In the following code chunk we define the functions calc_water_likelihood and calc_land_likelihood to calculate the water and land likelihood’s of a pixel, based on the Xarray datasets for the PLIA and HPAR, respectively.

def calc_water_likelihood(sigma, x=None, y=None):
    """
    Calculate water likelihoods.

    Parameters
    ----------
    sigma: float|array
        Sigma naught value(s)
    x: float|array
        Longitude
    y: float|array
        Latitude

    Returns
    -------
    numpy array
    """
    point = flood_dc.sel(x=x, y=y, method="nearest")
    wbsc_mean = point.MPLIA * -0.394181 + -4.142015
    wbsc_std = 2.754041
    return norm.pdf(sigma, wbsc_mean.to_numpy(), wbsc_std)


def expected_land_backscatter(data, dtime_str):
    w = np.pi * 2 / 365
    dt = datetime.datetime.strptime(dtime_str, "%Y-%m-%d")
    t = dt.timetuple().tm_yday
    wt = w * t

    M0 = data.M0
    S1 = data.S1
    S2 = data.S2
    S3 = data.S3
    C1 = data.C1
    C2 = data.C2
    C3 = data.C3
    hm_c1 = (M0 + S1 * np.sin(wt)) + (C1 * np.cos(wt))
    hm_c2 = (hm_c1 + S2 * np.sin(2 * wt)) + C2 * np.cos(2 * wt)
    hm_c3 = (hm_c2 + S3 * np.sin(3 * wt)) + C3 * np.cos(3 * wt)
    return hm_c3


def calc_land_likelihood(sigma, x=None, y=None):
    """
    Calculate land likelihoods.

    Parameters
    ----------
    sigma: float|array
        Sigma naught value(s)
    x: float|array
        Longitude
    y: float|array
        Latitude

    Returns
    -------
    numpy array
    """
    point = flood_dc.sel(x=x, y=y, method="nearest")
    lbsc_mean = expected_land_backscatter(point, "2018-02-01")
    lbsc_std = point.STD
    return norm.pdf(sigma, lbsc_mean.to_numpy(), lbsc_std.to_numpy())

Without going into the details of how these likelihoods are calculated, you can hover over a pixel of the map to plot the likelihoods of σ0\sigma^0 being governed by land or water. For reference we model the water and land likelihoods (model_likelihoods) over a range of σ0\sigma^0 values.

def model_likelihoods(sigma=(-30, 0), x=None, y=None):
    """
    Model likelihoods over a range of sigma naught.

    Parameters
    ----------
    sigma: tuple
        Minimum and maximum for range of sigma naught values
    x: float|array
        Longitude
    y: float|array
        Latitude

    Returns
    -------
    Pandas Datafrane
    """
    sigma = np.arange(sigma[0], sigma[1], 0.1)
    land_likelihood = calc_land_likelihood(sigma=sigma, x=x, y=y)
    water_likelihood = calc_water_likelihood(sigma=sigma, x=x, y=y)
    point = flood_dc.sel(x=x, y=y, method="nearest")
    return pd.DataFrame(
        {
            "sigma": sigma,
            "water_likelihood": water_likelihood,
            "land_likelihood": land_likelihood,
            "observed": np.repeat(point.sig0.values, len(land_likelihood)),
        }
    )


pointer = hv.streams.PointerXY(source=mrs_view.get(1), x=22.1, y=39.5)

likelihood_pdi = hvplot.bind(
    model_likelihoods, x=pointer.param.x, y=pointer.param.y
).interactive()

view_likelihoods = (
    likelihood_pdi.hvplot("sigma", "water_likelihood", ylabel="likelihoods").dmap()
    * likelihood_pdi.hvplot("sigma", "land_likelihood").dmap()
    * likelihood_pdi.hvplot("observed", "land_likelihood").dmap()
).opts(frame_height=200, frame_width=300)

view_likelihoods + mrs_view.get(1)

Figure 2:Likelihoods for σ0\sigma^0 being associated with land or water for 1 pixel in the Greek area of Thessaly. Likelihoods are calculated over a range of σ0\sigma^0. The pixel’s observed σ0\sigma^0 is given with a vertical line. Hover on the map to re-calculate and update this figure for another pixel in the study area.

Posteriors

Having calculated the likelihoods, we can now move on to calculate the probability of (non-)flooding given a pixel’s σ0\sigma^0. These so-called posteriors need one more piece of information, as can be seen in the equation above. We need the probability that a pixel is flooded P(F)P(F) or not flooded P(NF)P(NF). Of course, these are the figures we’ve been trying to find this whole time. We don’t actually have them yet, so what can we do? In Bayesian statistics, we can just start with our best guess. These guesses are called our “priors”, because they are the beliefs we hold prior to looking at the data. This subjective prior belief is the foundation Bayesian statistics, and we use the likelihoods we just calculated to update our belief in this particular hypothesis. This updated belief is called the “posterior”.

Let’s say that our best estimate for the chance of flooding versus non-flooding of a pixel is 50-50: a coin flip. We now can also calculate the probability of backscattering P(σ0)P(\sigma^0), as the weighted average of the water and land likelihoods, ensuring that our posteriors range between 0 to 1.

The following code block shows how we calculate the priors.

def calc_posteriors(sigma, x=None, y=None):
    """
    Calculate posterior probability.

    Parameters
    ----------
    sigma: float|array
        Sigma naught value(s)
    x: float|array
        Longitude
    y: float|array
        Latitude

    Returns
    -------
    Tuple of two Numpy arrays
    """
    land_likelihood = calc_land_likelihood(sigma=sigma, x=x, y=y)
    water_likelihood = calc_water_likelihood(sigma=sigma, x=x, y=y)
    evidence = (water_likelihood * 0.5) + (land_likelihood * 0.5)
    water_posterior = (water_likelihood * 0.5) / evidence
    land_posterior = (land_likelihood * 0.5) / evidence
    return water_posterior, land_posterior

We can plot the posterior probabilities of flooding and non-flooding again and compare these to pixel’s measured σ0\sigma^0. For reference we model the flood and non-flood posteriors (model_posteriors) over a range of σ0\sigma^0 values. Hover on a pixel to calculate the posterior probability.

def model_posteriors(sigma=(-30, 0), x=None, y=None):
    """
    Model posterior probabilities over a range of sigma naught.

    Parameters
    ----------
    sigma: tuple
        Minimum and maximum for range of sigma naught values
    x: float|array
        Longitude
    y: float|array
        Latitude

    Returns
    -------
    Pandas Datafrane
    """
    bays_pd = model_likelihoods(sigma=sigma, x=x, y=y)
    sigma = np.arange(sigma[0], sigma[1], 0.1)
    bays_pd["f_post_prob"], bays_pd["nf_post_prob"] = calc_posteriors(
        sigma=sigma, x=x, y=y
    )
    return bays_pd


posterior_pdi = hvplot.bind(
    model_posteriors, x=pointer.param.x, y=pointer.param.y
).interactive()

view_posteriors = (
    posterior_pdi.hvplot("sigma", "f_post_prob", ylabel="posteriors").dmap()
    * posterior_pdi.hvplot("sigma", "nf_post_prob").dmap()
    * posterior_pdi.hvplot("observed", "nf_post_prob").dmap()
).opts(frame_height=200, frame_width=300)

(view_likelihoods + view_posteriors).cols(1) + mrs_view.get(1)

Figure 3:Posterior probabilities for σ0\sigma^0 of 1 pixel being associated with land for water in the Greek area of Thessaly. Hover on the map to re-calculate and update this figure for another pixel in the study area.

Flood Classification

We are now ready to combine all this information and classify the pixels according to the probability of flooding given the backscatter value of each pixel. Here we just look whether the probability of flooding is higher than non-flooding:

def bayesian_flood_decision(sigma, x=None, y=None):
    """
    Bayesian decision.

    Parameters
    ----------
    sigma: float|array
        Sigma naught value(s)
    x: float|array
        Longitude
    y: float|array
        Latitude

    Returns
    -------
    Xarray DataArray
    """
    f_post_prob, nf_post_prob = calc_posteriors(sigma=sigma, x=x, y=y)
    return xr.where(
        np.isnan(f_post_prob) | np.isnan(nf_post_prob),
        np.nan,
        np.greater(f_post_prob, nf_post_prob),
    )

Hover on a point in the below map to see the likelihoods and posterior distributions (in the left-hand subplots).

flood_dc["decision"] = (
    ("y", "x"),
    bayesian_flood_decision(flood_dc.sig0, flood_dc.x, flood_dc.y),
)

colorbar_opts = {
    "major_label_overrides": {
        0: "non-flood",
        1: "flood",
    },
    "ticker": FixedTicker(ticks=[0, 1]),
}
flood_view = flood_dc.decision.hvplot.image(
    x="x", y="y", rasterize=True, geo=True, cmap=["rgba(0, 0, 1, 0.1)", "darkred"]
).opts(frame_height=400, colorbar_opts={**colorbar_opts})
mrs_view.get(0) * flood_view

Figure 4:Flood extent of the Greek region of Thessaly based on Bayesian probabilities are shown on the map superimposed on an open street map. Hover over a pixel to generate the point’s water and land likelihoods as well as the posterior probabilities.

References
  1. Bauer-Marschallinger, B., Cao, S., Tupas, M. E., Roth, F., Navacchi, C., Melzer, T., Freeman, V., & Wagner, W. (2022). Satellite-Based Flood Mapping through Bayesian Inference from a Sentinel-1 SAR Datacube. Remote Sensing, 14(15), 3673. 10.3390/rs14153673