How an 275 year old idea helps map the extent of floods

Overview¶
This notebook explains how microwave () backscattering can be used to map the extent of a flood. We replicate in this exercise the work of Bauer-Marschallinger et al., 2022 on the TU Wien Bayesian-based flood mapping algorithm.
Prerequisites¶
| Concepts | Importance | Notes |
|---|---|---|
| Intro to xarray | Necessary | |
| Intro to Harmonic parameters | Necessary | |
| Documentation hvPlot | Helpful | Interactive plotting |
| Documentation odc-stac | Helpful | Data access |
Time to learn: 10 min
Imports¶
import datetime
import holoviews as hv
import hvplot.pandas
import hvplot.xarray
import numpy as np
import pandas as pd
import panel as pn
import pystac_client
import rioxarray # noqa: F401
import xarray as xr
from bokeh.models import FixedTicker
from odc import stac as odc_stac
from scipy.stats import norm
pn.extension()
hv.extension("bokeh")Greece Flooding 2018¶
In this exercise we will replicate the case study of the above mentioned paper, the February 2018 flooding of the Greek region of Thessaly.
time_range = "2018-02-28T04:00:00Z/2018-02-28T05:00:00Z"
minlon, maxlon = 21.93, 22.23
minlat, maxlat = 39.47, 39.64
bounding_box = [minlon, minlat, maxlon, maxlat]EODC STAC Catalog¶
The data required for TU Wien flood mapping algorithm consists of terrain corrected sigma naught backscatter data , the projected local incidence angle (PLIA) values of those measurements, and the harmonic parameters (HPAR) of a model fit on the pixel’s backscatter time series. The latter two datasets will needed to calculate the probability density functions over land and water for. We will be getting the required data from the EODC STAC Catalog. Specifically the collections: SENTINEL_SIG0_20M, SENTINEL1_MPLIA and SENTINEL1_HPAR. We use the pystac-client and odc_stac packages to, respectively, discover and fetch the data.
Due to the way the data is acquired and stored, some items include “no data” areas. In our case, no data has the value -9999, but this can vary from data provider to data provider. This information can usually be found in the metadata. Furthermore, to save memory, data is often stored as integer (e.g. 25) and not in float (e.g. 2.5) format. For this reason, the backscatter values are often multiplied by a scale factor. Hence we define the function post_process_eodc_cube to correct for these factors as obtained from the STAC metadata.
Sigma naught¶
eodc_catalog = pystac_client.Client.open("https://stac.eodc.eu/api/v1")
search = eodc_catalog.search(
collections="SENTINEL1_SIG0_20M",
bbox=bounding_box,
datetime=time_range,
)
items_sig0 = search.item_collection()
def post_process_eodc_cube(dc, items, bands):
"""
Postprocessing of EODC data cubes.
Parameters
----------
x : xarray.Dataset
items: pystac.item_collection.ItemCollection
STAC items that concern the Xarray Dataset
bands: array
Selected bands
Returns
-------
xarray.Dataset
"""
if not isinstance(bands, tuple):
bands = tuple([bands])
for i in bands:
dc[i] = post_process_eodc_cube_(dc[i], items, i)
return dc
def post_process_eodc_cube_(dc, items, band):
fields = items[0].assets[band].extra_fields
scale = fields.get("raster:bands")[0]["scale"]
nodata = fields.get("raster:bands")[0]["nodata"]
return dc.where(dc != nodata) / scale
bands = "VV"
sig0_dc = odc_stac.load(items_sig0, bands=bands, bbox=bounding_box)
sig0_dc = (
post_process_eodc_cube(sig0_dc, items_sig0, bands)
.rename_vars({"VV": "sig0"})
.dropna(dim="time", how="all")
.median("time")
)
sig0_dcHarmonic Parameters¶
search = eodc_catalog.search(
collections="SENTINEL1_HPAR",
bbox=bounding_box,
query=["sat:relative_orbit=80"],
)
items_hpar = search.item_collection()
bands = ("C1", "C2", "C3", "M0", "S1", "S2", "S3", "STD")
hpar_dc = odc_stac.load(
items_hpar,
bands=bands,
bbox=bounding_box,
groupby=None,
)
hpar_dc = post_process_eodc_cube(hpar_dc, items_hpar, bands).median("time")
hpar_dcAborting load due to failure while reading: https://data.eodc.eu/collections/SENTINEL1_HPAR/EQUI7_EU020M/E054N006T3/SIG0-HPAR-C1_20190101_20211231_VV_D080_E054N006T3_EU020M_V0M2R3_S1IWGRDH_TUWIEN.tif:1
---------------------------------------------------------------------------
CPLE_HttpResponseError Traceback (most recent call last)
File rasterio/_base.pyx:320, in rasterio._base.DatasetBase.__init__()
--> 320 'Could not get source, probably due dynamically evaluated source code.'
File rasterio/_base.pyx:232, in rasterio._base.open_dataset()
--> 232 'Could not get source, probably due dynamically evaluated source code.'
File rasterio/_err.pyx:359, in rasterio._err.exc_wrap_pointer()
--> 359 'Could not get source, probably due dynamically evaluated source code.'
CPLE_HttpResponseError: HTTP response code: 502
During handling of the above exception, another exception occurred:
RasterioIOError Traceback (most recent call last)
Cell In[4], line 9
5 )
6
7 items_hpar = search.item_collection()
8 bands = ("C1", "C2", "C3", "M0", "S1", "S2", "S3", "STD")
----> 9 hpar_dc = odc_stac.load(
10 items_hpar,
11 bands=bands,
12 bbox=bounding_box,
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/stac/_stac_load.py:509, in load(items, bands, groupby, resampling, dtype, chunks, pool, crs, resolution, anchor, geobox, bbox, lon, lat, x, y, like, geopolygon, intersects, progress, fail_on_error, stac_cfg, with_properties, patch_url, preserve_original_order, driver, fuse_func, **kw)
505 return ds
507 rdr_env = rdr.capture_env()
508 return _with_debug_info(
--> 509 chunked_load(
510 load_cfg,
511 meta,
512 srcs,
513 tyx_bins,
514 gbt,
515 tss,
516 rdr_env,
517 rdr,
518 chunks=chunks,
519 pool=pool,
520 progress=progress,
521 dtype=dtype,
522 )
523 )
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:695, in chunked_load(load_cfg, template, srcs, tyx_bins, gbt, tss, env, rdr, dtype, chunks, pool, progress)
693 # pylint: disable=too-many-arguments
694 if chunks is None:
--> 695 return direct_chunked_load(
696 load_cfg,
697 template,
698 srcs,
699 tyx_bins,
700 gbt,
701 tss,
702 env,
703 rdr,
704 pool=pool,
705 progress=progress,
706 )
707 return dask_chunked_load(
708 load_cfg,
709 template,
(...) 717 dtype=dtype,
718 )
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:958, in direct_chunked_load(load_cfg, template, srcs, tyx_bins, gbt, tss, env, rdr, pool, progress)
955 if progress is not None:
956 _work = progress(SizedIterable(_work, total_tasks))
--> 958 for _ in _work:
959 pass
961 if aux_cfg:
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_utils.py:43, in pmap(func, inputs, pool)
39 """
40 Wrapper for ThreadPoolExecutor.map
41 """
42 if pool is None:
---> 43 yield from map(func, inputs)
44 return
46 if isinstance(pool, int):
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:933, in direct_chunked_load.<locals>._do_one(task)
931 for t_idx, layer in enumerate(layers):
932 loaders = [rdr.open(src, ctx) for _, src in layer]
--> 933 _ = _fill_nd_slice(
934 loaders,
935 task.dst_gbox,
936 task.cfg,
937 dst=dst_slice[t_idx],
938 ydim=ydim,
939 )
940 t, y, x = task.idx_tyx
941 return (task.band, t, y, x)
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:583, in _fill_nd_slice(srcs, dst_gbox, cfg, dst, ydim, selection)
579 return dst
581 fuser = resolve_fuser(cfg.fuser_fqn) if cfg.fuser_fqn is not None else None
--> 583 return fuse_nd_slices(
584 (src.read(cfg, dst_gbox, selection=selection) for src in srcs),
585 fill_value,
586 dst,
587 ydim=ydim,
588 prefilled=True,
589 fuser=fuser,
590 )
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:549, in fuse_nd_slices(srcs, fill_value, dst, ydim, prefilled, fuser)
546 if fuser is None:
547 fuser = fuser_for_nodata(fill_value)
--> 549 for yx_roi, pix in srcs:
550 assert len(yx_roi) == 2
551 assert pix.ndim == dst.ndim
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_builder.py:584, in <genexpr>(.0)
579 return dst
581 fuser = resolve_fuser(cfg.fuser_fqn) if cfg.fuser_fqn is not None else None
583 return fuse_nd_slices(
--> 584 (src.read(cfg, dst_gbox, selection=selection) for src in srcs),
585 fill_value,
586 dst,
587 ydim=ydim,
588 prefilled=True,
589 fuser=fuser,
590 )
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_rio.py:140, in RioReader.read(self, cfg, dst_geobox, dst, selection)
132 def read(
133 self,
134 cfg: RasterLoadParams,
(...) 138 selection: Optional[ReaderSubsetSelection] = None,
139 ) -> tuple[tuple[slice, slice], np.ndarray]:
--> 140 return rio_read(self._src, cfg, dst_geobox, dst=dst, selection=selection)
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_rio.py:567, in rio_read(src, cfg, dst_geobox, dst, selection)
561 if cfg.fail_on_error:
562 log.error(
563 "Aborting load due to failure while reading: %s:%d",
564 src.uri,
565 src.band,
566 )
--> 567 raise e
568 except rasterio.errors.RasterioError as e:
569 if cfg.fail_on_error:
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_rio.py:553, in rio_read(src, cfg, dst_geobox, dst, selection)
549 return roi, out.transpose([1, 2, 0])
551 try:
552 return fixup_out(
--> 553 _rio_read(src, cfg, dst_geobox, prep_dst(dst), selection=selection)
554 )
555 except (
556 rasterio.errors.RasterioIOError,
557 rasterio.errors.RasterBlockError,
558 rasterio.errors.WarpOperationError,
559 rasterio.errors.WindowEvaluationError,
560 ) as e:
561 if cfg.fail_on_error:
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/odc/loader/_rio.py:601, in _rio_read(src, cfg, dst_geobox, dst, selection)
590 def _rio_read(
591 src: RasterSource,
592 cfg: RasterLoadParams,
(...) 597 # if resampling is `nearest` then ignore sub-pixel translation when deciding
598 # whether we can just paste source into destination
599 ttol = 0.9 if cfg.nearest else 0.05
--> 601 with rasterio.open(src.uri, "r", sharing=False) as rdr:
602 assert isinstance(rdr, rasterio.DatasetReader)
603 ovr_idx: Optional[int] = None
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/rasterio/env.py:464, in ensure_env_with_credentials.<locals>.wrapper(*args, **kwds)
461 session = DummySession()
463 with env_ctor(session=session):
--> 464 return f(*args, **kwds)
File ~/micromamba/envs/eo-datascience-cookbook/lib/python3.13/site-packages/rasterio/__init__.py:367, in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, thread_safe, opener, **kwargs)
364 path = _parse_path(raw_dataset_path)
366 if mode == "r":
--> 367 dataset = DatasetReader(path, driver=driver, sharing=sharing, thread_safe=thread_safe, **kwargs)
368 elif mode == "r+":
369 dataset = get_writer_for_path(path, driver=driver)(
370 path, mode, driver=driver, sharing=sharing, **kwargs
371 )
File rasterio/_base.pyx:329, in rasterio._base.DatasetBase.__init__()
--> 329 'Could not get source, probably due dynamically evaluated source code.'
RasterioIOError: HTTP response code: 502Projected Local Incidence Angles¶
search = eodc_catalog.search(
collections="SENTINEL1_MPLIA",
bbox=bounding_box,
query=["sat:relative_orbit=80"],
)
items_plia = search.item_collection()
bands = "MPLIA"
plia_dc = odc_stac.load(
items_plia,
bands=bands,
bbox=bounding_box,
)
plia_dc = post_process_eodc_cube(plia_dc, items_plia, bands).median("time")
plia_dcFinally, we merged the datasets as one big dataset and reproject the data in EPSG 4326 for easier visualizing of the data.
flood_dc = xr.merge([sig0_dc, plia_dc, hpar_dc])
flood_dc = flood_dc.rio.reproject("EPSG:4326").rio.write_crs("EPSG:4326")
flood_dcFrom Backscattering to Flood Mapping¶
In the following lines we create a map with microwave backscattering values.
mrs_view = flood_dc.sig0.hvplot.image(
x="x", y="y", cmap="viridis", geo=True, tiles=True
).opts(frame_height=400)
mrs_viewFigure 1:Area targeted for backscattering is the Greek region of Thessaly, which experienced a major flood in February of 2018.
Microwave Backscattering over Land and Water¶
Reverend Bayes was concerned with two events, one (the hypothesis) occurring before the other (the evidence). If we know its cause, it is easy to logically deduce the probability of an effect. However, in this case we want to deduce the probability of a cause from an observed effect, also known as “reversed probability”. In the case of flood mapping, we have backscatter observations over land (the effect) and we want to deduce the probability of flooding () and non-flooding ().
In other words, we want to know the probability of flooding given a pixel’s :
and the probability of a pixel being not flooded given a certain :
Bayes showed that these can be deduced from the observation that forward and reversed probability are equal, so that:
and
The forward probability of given the occurrence of flooding () and given no flooding () can be extracted from past information on backscattering over land and water surfaces. As seen in the sketch below , the characteristics of backscattering over land and water differ considerably.
{#fig-sat}
Likelihoods¶
The so-called likelihoods of and can thus be calculated from past backscattering information. In the following code chunk we define the functions calc_water_likelihood and calc_land_likelihood to calculate the water and land likelihood’s of a pixel, based on the Xarray datasets for the PLIA and HPAR, respectively.
def calc_water_likelihood(sigma, x=None, y=None):
"""
Calculate water likelihoods.
Parameters
----------
sigma: float|array
Sigma naught value(s)
x: float|array
Longitude
y: float|array
Latitude
Returns
-------
numpy array
"""
point = flood_dc.sel(x=x, y=y, method="nearest")
wbsc_mean = point.MPLIA * -0.394181 + -4.142015
wbsc_std = 2.754041
return norm.pdf(sigma, wbsc_mean.to_numpy(), wbsc_std)
def expected_land_backscatter(data, dtime_str):
w = np.pi * 2 / 365
dt = datetime.datetime.strptime(dtime_str, "%Y-%m-%d")
t = dt.timetuple().tm_yday
wt = w * t
M0 = data.M0
S1 = data.S1
S2 = data.S2
S3 = data.S3
C1 = data.C1
C2 = data.C2
C3 = data.C3
hm_c1 = (M0 + S1 * np.sin(wt)) + (C1 * np.cos(wt))
hm_c2 = (hm_c1 + S2 * np.sin(2 * wt)) + C2 * np.cos(2 * wt)
hm_c3 = (hm_c2 + S3 * np.sin(3 * wt)) + C3 * np.cos(3 * wt)
return hm_c3
def calc_land_likelihood(sigma, x=None, y=None):
"""
Calculate land likelihoods.
Parameters
----------
sigma: float|array
Sigma naught value(s)
x: float|array
Longitude
y: float|array
Latitude
Returns
-------
numpy array
"""
point = flood_dc.sel(x=x, y=y, method="nearest")
lbsc_mean = expected_land_backscatter(point, "2018-02-01")
lbsc_std = point.STD
return norm.pdf(sigma, lbsc_mean.to_numpy(), lbsc_std.to_numpy())Without going into the details of how these likelihoods are calculated, you can hover over a pixel of the map to plot the likelihoods of being governed by land or water. For reference we model the water and land likelihoods (model_likelihoods) over a range of values.
def model_likelihoods(sigma=(-30, 0), x=None, y=None):
"""
Model likelihoods over a range of sigma naught.
Parameters
----------
sigma: tuple
Minimum and maximum for range of sigma naught values
x: float|array
Longitude
y: float|array
Latitude
Returns
-------
Pandas Datafrane
"""
sigma = np.arange(sigma[0], sigma[1], 0.1)
land_likelihood = calc_land_likelihood(sigma=sigma, x=x, y=y)
water_likelihood = calc_water_likelihood(sigma=sigma, x=x, y=y)
point = flood_dc.sel(x=x, y=y, method="nearest")
return pd.DataFrame(
{
"sigma": sigma,
"water_likelihood": water_likelihood,
"land_likelihood": land_likelihood,
"observed": np.repeat(point.sig0.values, len(land_likelihood)),
}
)
pointer = hv.streams.PointerXY(source=mrs_view.get(1), x=22.1, y=39.5)
likelihood_pdi = hvplot.bind(
model_likelihoods, x=pointer.param.x, y=pointer.param.y
).interactive()
view_likelihoods = (
likelihood_pdi.hvplot("sigma", "water_likelihood", ylabel="likelihoods").dmap()
* likelihood_pdi.hvplot("sigma", "land_likelihood").dmap()
* likelihood_pdi.hvplot("observed", "land_likelihood").dmap()
).opts(frame_height=200, frame_width=300)
view_likelihoods + mrs_view.get(1)Figure 2:Likelihoods for being associated with land or water for 1 pixel in the Greek area of Thessaly. Likelihoods are calculated over a range of . The pixel’s observed is given with a vertical line. Hover on the map to re-calculate and update this figure for another pixel in the study area.
Posteriors¶
Having calculated the likelihoods, we can now move on to calculate the probability of (non-)flooding given a pixel’s . These so-called posteriors need one more piece of information, as can be seen in the equation above. We need the probability that a pixel is flooded or not flooded . Of course, these are the figures we’ve been trying to find this whole time. We don’t actually have them yet, so what can we do? In Bayesian statistics, we can just start with our best guess. These guesses are called our “priors”, because they are the beliefs we hold prior to looking at the data. This subjective prior belief is the foundation Bayesian statistics, and we use the likelihoods we just calculated to update our belief in this particular hypothesis. This updated belief is called the “posterior”.
Let’s say that our best estimate for the chance of flooding versus non-flooding of a pixel is 50-50: a coin flip. We now can also calculate the probability of backscattering , as the weighted average of the water and land likelihoods, ensuring that our posteriors range between 0 to 1.
The following code block shows how we calculate the priors.
def calc_posteriors(sigma, x=None, y=None):
"""
Calculate posterior probability.
Parameters
----------
sigma: float|array
Sigma naught value(s)
x: float|array
Longitude
y: float|array
Latitude
Returns
-------
Tuple of two Numpy arrays
"""
land_likelihood = calc_land_likelihood(sigma=sigma, x=x, y=y)
water_likelihood = calc_water_likelihood(sigma=sigma, x=x, y=y)
evidence = (water_likelihood * 0.5) + (land_likelihood * 0.5)
water_posterior = (water_likelihood * 0.5) / evidence
land_posterior = (land_likelihood * 0.5) / evidence
return water_posterior, land_posteriorWe can plot the posterior probabilities of flooding and non-flooding again and compare these to pixel’s measured . For reference we model the flood and non-flood posteriors (model_posteriors) over a range of values. Hover on a pixel to calculate the posterior probability.
def model_posteriors(sigma=(-30, 0), x=None, y=None):
"""
Model posterior probabilities over a range of sigma naught.
Parameters
----------
sigma: tuple
Minimum and maximum for range of sigma naught values
x: float|array
Longitude
y: float|array
Latitude
Returns
-------
Pandas Datafrane
"""
bays_pd = model_likelihoods(sigma=sigma, x=x, y=y)
sigma = np.arange(sigma[0], sigma[1], 0.1)
bays_pd["f_post_prob"], bays_pd["nf_post_prob"] = calc_posteriors(
sigma=sigma, x=x, y=y
)
return bays_pd
posterior_pdi = hvplot.bind(
model_posteriors, x=pointer.param.x, y=pointer.param.y
).interactive()
view_posteriors = (
posterior_pdi.hvplot("sigma", "f_post_prob", ylabel="posteriors").dmap()
* posterior_pdi.hvplot("sigma", "nf_post_prob").dmap()
* posterior_pdi.hvplot("observed", "nf_post_prob").dmap()
).opts(frame_height=200, frame_width=300)
(view_likelihoods + view_posteriors).cols(1) + mrs_view.get(1)Figure 3:Posterior probabilities for of 1 pixel being associated with land for water in the Greek area of Thessaly. Hover on the map to re-calculate and update this figure for another pixel in the study area.
Flood Classification¶
We are now ready to combine all this information and classify the pixels according to the probability of flooding given the backscatter value of each pixel. Here we just look whether the probability of flooding is higher than non-flooding:
def bayesian_flood_decision(sigma, x=None, y=None):
"""
Bayesian decision.
Parameters
----------
sigma: float|array
Sigma naught value(s)
x: float|array
Longitude
y: float|array
Latitude
Returns
-------
Xarray DataArray
"""
f_post_prob, nf_post_prob = calc_posteriors(sigma=sigma, x=x, y=y)
return xr.where(
np.isnan(f_post_prob) | np.isnan(nf_post_prob),
np.nan,
np.greater(f_post_prob, nf_post_prob),
)Hover on a point in the below map to see the likelihoods and posterior distributions (in the left-hand subplots).
flood_dc["decision"] = (
("y", "x"),
bayesian_flood_decision(flood_dc.sig0, flood_dc.x, flood_dc.y),
)
colorbar_opts = {
"major_label_overrides": {
0: "non-flood",
1: "flood",
},
"ticker": FixedTicker(ticks=[0, 1]),
}
flood_view = flood_dc.decision.hvplot.image(
x="x", y="y", rasterize=True, geo=True, cmap=["rgba(0, 0, 1, 0.1)", "darkred"]
).opts(frame_height=400, colorbar_opts={**colorbar_opts})
mrs_view.get(0) * flood_viewFigure 4:Flood extent of the Greek region of Thessaly based on Bayesian probabilities are shown on the map superimposed on an open street map. Hover over a pixel to generate the point’s water and land likelihoods as well as the posterior probabilities.
- Bauer-Marschallinger, B., Cao, S., Tupas, M. E., Roth, F., Navacchi, C., Melzer, T., Freeman, V., & Wagner, W. (2022). Satellite-Based Flood Mapping through Bayesian Inference from a Sentinel-1 SAR Datacube. Remote Sensing, 14(15), 3673. 10.3390/rs14153673