π€ Why Calibrate?ΒΆ
Calibration of a physical model is the process of tuning its parameters to ensure its outputs accurately match real-world measurements. Itβs about bridging the gap between theoretical calculations and physical reality.
π What Do You Need?ΒΆ
To calibrate a model, you need four key components:
A set of mathematical equations or simulation software.
The specific "knobs" in your model that you can adjust.
High-quality measurements from the real-world system.
A metric that quantifies the error (e.g., RMSE).
βοΈ How to Calibrate?ΒΆ
There are two main approaches to calibration: by hand (trial and error) or by using a smart computer search (optimization algorithms). Optimization is highly recommended for its efficiency and accuracy.
Click to see details on different optimization methods
Gradient-Based Methods: Use the error gradient for efficient searching (e.g., Levenberg-Marquardt).
Gradient-Free Methods: Do not require gradients, essential for βblack-boxβ simulations (e.g., Nelder-Mead).
Bayesian Calibration: Treats parameters as probability distributions to quantify uncertainty.
Cross-Calibration: Calibrates one model against a trusted reference model to ensure consistency.
β How Do You Know Itβs Calibrated?ΒΆ
A successful calibration can be verified by analyzing the errors and by validating the model against new data. A common way to visualize the result is by plotting the modelβs output against the experimental data.
Interactive Example: Adjusting Model ParametersΒΆ
To make this concept interactive, you can run the code cell below. It will create the same plot but with sliders that let you manually adjust the modelβs parameters (slope
and intercept
). Try to move the sliders to make the red line fit the blue dots. This gives you a hands-on feel for the βBy Handβ calibration process.
# First, let's install ipywidgets for the interactive sliders
#%pip install -q ipywidgets
#%pip install -q scikit-image
#%pip install seaborn
# %pip install plotly
%pip install xarray==2024.05.0
import earthaccess
Collecting xarray==2024.05.0
Downloading xarray-2024.5.0-py3-none-any.whl.metadata (11 kB)
Requirement already satisfied: numpy>=1.23 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from xarray==2024.05.0) (2.2.6)
Requirement already satisfied: packaging>=23.1 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from xarray==2024.05.0) (25.0)
Requirement already satisfied: pandas>=2.0 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from xarray==2024.05.0) (2.3.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from pandas>=2.0->xarray==2024.05.0) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from pandas>=2.0->xarray==2024.05.0) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from pandas>=2.0->xarray==2024.05.0) (2025.2)
Requirement already satisfied: six>=1.5 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from python-dateutil>=2.8.2->pandas>=2.0->xarray==2024.05.0) (1.17.0)
Downloading xarray-2024.5.0-py3-none-any.whl (1.2 MB)
ββββββββββββββββββββββββββββββββββββββββ 0.0/1.2 MB ? eta -:--:--
ββββββββββββββββββββββββββββββββββββββββ 1.2/1.2 MB 66.2 MB/s 0:00:00
Installing collected packages: xarray
Attempting uninstall: xarray
Found existing installation: xarray 2025.7.1
Uninstalling xarray-2025.7.1:
Successfully uninstalled xarray-2025.7.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
virtualizarr 1.2.0 requires xarray>=2024.10.0, but you have xarray 2024.5.0 which is incompatible.
rioxarray 0.19.0 requires xarray>=2024.7.0, but you have xarray 2024.5.0 which is incompatible.
Successfully installed xarray-2024.5.0
Note: you may need to restart the kernel to use updated packages.
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, fixed
# 1. Create the sample "Experimental Data"
np.random.seed(0)
x_data = np.linspace(0, 10, 20)
true_slope = 2.5
true_intercept = 1.5
y_data = true_slope * x_data + true_intercept + np.random.normal(0, 2, size=x_data.shape)
# 2. Define a function to plot the data and our adjustable model
def plot_model(x, y, slope, intercept):
"""Plots the experimental data against the model line defined by slope and intercept."""
y_model = slope * x + intercept
plt.figure(figsize=(8, 6))
plt.plot(x, y, 'o', label='Experimental Data', markersize=8, color='royalblue')
plt.plot(x, y_model, '-', label='Adjustable Model', linewidth=3, color='red')
# Calculate and display the RMSE as our objective function score
rmse = np.sqrt(np.mean((y - y_model)**2))
plt.title(f'Comparison of Model vs. Data (RMSE: {rmse:.2f})', fontsize=16)
plt.xlabel('Independent Variable', fontsize=12)
plt.ylabel('Dependent Variable', fontsize=12)
plt.ylim(min(y_data)-2, max(y_data)+2)
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
# 3. Create the interactive plot!
# The 'interact' function automatically creates sliders for the numerical arguments.
interact(plot_model, x=fixed(x_data), y=fixed(y_data), slope=(0.0, 5.0, 0.1), intercept=(-5.0, 5.0, 0.1));
Example: Landsat calibration using MODIS SSTΒΆ
Create matchups between Landsat and MODIS SST data near Cosgrove, West Antarctica to produce a calibration for Landsat SSTs
# Import libraries and modules
%config InlineBackend.figure_format = 'svg'
%matplotlib widget
%load_ext autoreload
%autoreload 2
import pandas as pd
import xarray as xr
import geopandas as gpd
from datetime import date, timedelta, datetime
import numpy as np
import matplotlib.pylab as plt
from matplotlib import colors
from matplotlib.pylab import rcParams
from matplotlib.patches import Polygon as Pgon
import cartopy.crs as ccrs
import cartopy
import os
from cycler import cycler
import pyproj
from pyproj import Proj, transform
from sklearn.neighbors import BallTree
import pytz
import pygmt
import gc
import copy
import random
import statsmodels.formula.api as sm
import scipy.stats as stats
from sklearn.linear_model import RANSACRegressor
from sklearn.linear_model import LinearRegression
from shapely.geometry import Point, Polygon
from pathlib import Path
import math
from scipy.odr import Model, RealData, ODR
from tqdm.notebook import trange, tqdm
import seaborn as sns
import earthaccess
# For LST file masking
import pystac_client
import intake
from rasterio.session import AWSSession
import boto3
import SSTutils as stu
import warnings
warnings.filterwarnings('ignore')
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[3], line 49
46 from rasterio.session import AWSSession
47 import boto3
---> 49 import SSTutils as stu
51 import warnings
52 warnings.filterwarnings('ignore')
ModuleNotFoundError: No module named 'SSTutils'
#For color cycling in plots that is color blind friendly...make new ones at "I want hue" tools.medialab.sciences-po.fr/iwanthue
color_cycler = cycler(color=["#6777cf","#adba49","#c65ca0","#5fa042","#683287","#72ce7b","#c44a48","#45c7a9","#933c1d","#d0803f","#ac9239","#317c39"])
colorline_cycler = (cycler(color=["#75a141","#6c61b9","#bc4d45","#c1913d","#b85298","#4aa8e8"]) +
cycler(linestyle=['-','--',':','-.','-','--']))
rcParams['axes.prop_cycle'] = cycler('color', color_cycler)
Build Landsat - MODIS SST matchupsΒΆ
# Set paths and important variables and Calibration region bounding box
basepath = Path('/home/jovyan/landsatproduct-cookbook')
spacing = [990,-990] # 990m sampling of MODIS data so that upsampling is easy and because 30m takes far too long
param = 'sea_surface_temperature'
size_threshold = 30
location = 0 # 0 and 1 are the Cosgrove and Dotson Polynya calibration areas, respectively
surf_temp = 'SST' # 'SST' and 'LST' are for the Landsat SST and LST algorithms respectively
# Set location bounds
if location==1:
pathdir = 'DotsonPolynya'
latboundsC = [ -73.9 , -73.5 ] # Dotson polynya
lonboundsC = [ -113 , -111.5 ]
dfloc = 'Dotson'
elif location==0:
pathdir = 'Cosgrove'
latboundsC = [ -73.5 , -73.42 ] # near Cosgrove
lonboundsC = [ -103.0 , -102.0 ]
dfloc = 'Cosgrove'
elif location==2:
pathdir = 'Burke'
latboundsC = [ -73.81 , -73.42 ] # south of Burke
lonboundsC = [ -104.2 , -103.8 ]
dfloc = 'Burke'
if location==3:
pathdir = 'DotsonIntercomp'
latboundsC = [ -74.2 , -74.11 ] # Dotson plume for intercomparison
lonboundsC = [ -113.5 , -113.17 ]
dfloc = 'DotsonIntercomp'
# Coefficients for calibration
# SST
sstcalib_m = 0.76
sstcalib_b = 0.55
# LST
lstcalib_m = 0.80
lstcalib_b = 1.00
modmin = -1.9
LSTmin = np.around(modmin/lstcalib_m - lstcalib_b,2)
SSTmin = np.around(modmin/sstcalib_m - sstcalib_b,2) # should be about -2.0
# Uncertainties used in ODR propagation of error
modis_uncertainty = 0.44 # long wave sst ocean color atbd
sst_uncertainty = 0.3 # USGS Landsat stray light notice
lst_uncertainty = 1.0 # gerace 2020
pix_uncertainty = np.sqrt((1000*1000)/(100*100)) # MODIS 1km x 1km and Landsat 100m x 100m
# For calibrated SST runs
if surf_temp=='SST':
if location==3:
sstpath = basepath / f'Data/SST/Validation/{pathdir}/'
else:
sstpath = basepath / f'Data/SST/MODcalib/{pathdir}/'
tif = 'tif'
thresh = SSTmin
calib_m = sstcalib_m
calib_b = sstcalib_b
# If running for LST comparisons
elif surf_temp=='LST':
if location==3:
sstpath = basepath / f'Data/SST/LST/Calibration/DotsonPolynya/'
else:
sstpath = basepath / f'Data/SST/LST/Calibration/{pathdir}/'
tif = 'TIF'
thresh = LSTmin
calib_m = lstcalib_m
calib_b = lstcalib_b
# Authenticate for accessing NASA data (MODIS)
auth = earthaccess.login(strategy="interactive")
# If we are not authenticated
if not auth.authenticated:
# ask for credentials and persist them in a .netrc file
auth.login(strategy="interactive", persist=True)
# Convert bounding box to south polar stereo for checking if landsat has any data in bounding box
# Speeds up process a lot
source_crs = 'epsg:4326'
target_crs = 'epsg:3031' # Coordinate system of the file
bbox,checkbox = stu.lsat_reproj(source_crs,target_crs,(lonboundsC[0],latboundsC[0],lonboundsC[1],latboundsC[1]))
# Create polygon for later cropping
polygon = Polygon([(bbox[0][0],bbox[0][1]),(bbox[3][0],bbox[3][1]),(bbox[2][0],bbox[2][1]),(bbox[1][0],bbox[1][1])])
# Create min/max boundaries for trimming image before crop_xarray to cut down on processing times
minx, miny, maxx, maxy = polygon.bounds
polarx = [minx, maxx]
polary = [miny, maxy]
# Get Landsat file paths in directory
lsatfiles = os.listdir(sstpath)
lsatfiles = [x for x in lsatfiles if x[-3:] == tif]
lsatfiles.sort()
print (len(lsatfiles))
os.chdir(sstpath)
SSTfails = [
'LC08_L1GT_006112_20221024_20221107_02_T2_152257_Cel.tif',
'LC08_L1GT_006112_20221024_20221107_02_T2_152257_Cel.tif',
]
LSTfails = [
'LC08_L1GT_007112_20211215_20211223_02_T2_SW_LST.TIF',
'LC08_L1GT_007112_20220201_20220211_02_T2_SW_LST.TIF',
'LC08_L1GT_007112_20221202_20221212_02_T2_SW_LST.TIF',
'LC08_L1GT_007112_20230204_20230209_02_T2_SW_LST.TIF',
'LC08_L1GT_008113_20221022_20221101_02_T2_SW_LST.TIF',
'LC08_L1GT_010112_20221020_20221101_02_T2_SW_LST.TIF',
'LC08_L1GT_010113_20221020_20221101_02_T2_SW_LST.TIF',
'LC08_L1GT_012112_20221018_20221031_02_T2_SW_LST.TIF',
'LC08_L1GT_008113_20221022_20221101_02_T2_SW_LST.TIF',
'LC08_L1GT_010112_20221020_20221101_02_T2_SW_LST.TIF',
'LC08_L1GT_010113_20220105_20220114_02_T2_SW_LST.TIF',
'LC08_L1GT_010113_20220121_20220128_02_T2_SW_LST.TIF',
'LC08_L1GT_010113_20221020_20221101_02_T2_SW_LST.TIF',
'LC08_L1GT_010113_20221105_20221115_02_T2_SW_LST.TIF',
'LC08_L1GT_010113_20230108_20230124_02_T2_SW_LST.TIF',
'LC08_L1GT_010113_20230313_20230321_02_T2_SW_LST.TIF',
'LC08_L1GT_012112_20221018_20221031_02_T2_SW_LST.TIF'
]
Search for desired Landsat scenesΒΆ
# Authenticate for boto S3 access, etc.
os.environ["AWS_REQUEST_PAYER"] = "requester"
aws_session = AWSSession(boto3.Session(), requester_pays=True)
# Setup and authenticate
from dask.distributed import Client
import logging
client = Client(processes=True, n_workers=4,
threads_per_worker=1,
silence_logs=logging.ERROR)
client.run(lambda: os.environ["AWS_REQUEST_PAYER"] == "requester" )
client
# Define the landsat STAC catalog location
url = 'https://landsatlook.usgs.gov/stac-server'
collection = 'landsat-c2l1' # Landsat Collection 2, Level 1 - includes L8 and L9
i=0
ls_scene = xr.open_dataset(lsatfiles[i],chunks=dict(x=512, y=512),engine='rasterio')['band_data'].sel(band=1)
ls_scene
%%time
# ~1 min 32 sec per image
# If number of MODIS images per satellite is much more than 25, its because there is a ULY,LRY issue
os.chdir('/home/jovyan/landsatproduct-cookbook/Data/SST/MODcalib/Cosgrove/')
lsat_mod = []
#for i in tqdm(range(len(lsatfiles)), desc="Processing"):
for i in tqdm(range(1), desc="Processing the first image"):
# Check for known repeatedly bad files that will kill the code
if surf_temp == 'SST':
if lsatfiles[i] in SSTfails:
continue
elif surf_temp == 'LST':
if lsatfiles[i] in LSTfails:
continue
# Concatenate all landsat files into xarray with a time dimension
ls_scene = xr.open_dataset(lsatfiles[i],chunks=dict(x=512, y=512),engine='rasterio')['band_data'].sel(band=1)
ls_scene = ls_scene.rio.write_crs("epsg:3031", inplace=True)
print("1")
if surf_temp == 'SST':
times = pd.to_datetime(lsatfiles[i][17:25]+lsatfiles[i][41:47], format='%Y%m%d%H%M%S')
ls_scene = ls_scene.assign_coords(time=times,ID=lsatfiles[i][:-8])
elif surf_temp == 'LST':
# Need to mask LST because not done previously
mask = stu.get_lst_mask(lsatfiles[i])
ls_scene = ls_scene * mask
times = pd.to_datetime(lsatfiles[i][17:25]+'120000', format='%Y%m%d%H%M%S')
ls_scene = ls_scene.assign_coords(time=times,ID=lsatfiles[i][:-4])
ls_scene = ls_scene - 273.15
# Subset scene and check that it has the correct dimensions because y order changes sometimes
ls_scene = stu.subset_img(ls_scene,polarx,polary) # subset so easier to work with
ls_scene = stu.crop_xarray_dataarray_with_polygon(ls_scene, polygon) # crop data to exact bounding box
# if location==3:
# # Calibrate
# ls_scene = ls_scene * calib_m + calib_b
# # Remove SSTs that are unrealistically cool
# ls_scene = ls_scene.where(ls_scene >= thresh, np.nan)
lsID = lsatfiles[i]
print (lsID)
# Take mean temp, will skip the modis stage if no Landsat data in the calibration region
try:
lsat = np.nanmean(ls_scene)
ls_num = ls_scene.notnull().values.sum()
except Exception as e:
print (lsID, e)
lsat = np.nan
print(lsat)
if ~np.isfinite(lsat):
continue
# Find coincident MODIS SST scene
mod_scene, mod_file,time_dif = stu.find_MODIS(lonboundsC,latboundsC,ls_scene)
print(mod_file)
try:
# Acquire and align MODIS data to Landsat
# To subset to only high quality MODIS temp measurements which doesn't seem to be useful
# print(mod_scene.quality_level.max().values)
# mod_temps = mod_scene.sea_surface_temperature.where(mod_scene.quality_level>=4)
MODsst_xr = stu.get_sst(ls_scene,mod_scene.sea_surface_temperature,spacing,param) #mod_scene.sea_surface_temperature
# Remove SSTs that are unrealistically cool
MODsst_xr = MODsst_xr.where(MODsst_xr >= -1.9, np.nan)
# Crop Landsat image to meet the slightly smaller MODIS image (smaller image results from upsample methods in get_wv2)
ls_scene = stu.subset_img(ls_scene,[MODsst_xr.x.min(),MODsst_xr.x.max()],[MODsst_xr.y.min(),MODsst_xr.y.max()])
# Only use MODIS data where cropped Landsat data is also available
MODsst_xr_sub = MODsst_xr.where(ls_scene.notnull(),np.nan)
# Take mean temp
modis = np.nanmean(MODsst_xr_sub)
MOD_num = MODsst_xr_sub.notnull().values.sum()
except Exception as e:
print (mod_file, e)
modis = np.nan
MOD_num = 0
# Take mean using Landsat data only where cropped MODIS data is also available (need to do both)
try:
ls_scene_sub = ls_scene.where(MODsst_xr_sub.notnull(),np.nan)
lsat = np.nanmean(ls_scene_sub)
ls_num = ls_scene_sub.notnull().values.sum()
except Exception as e:
print (lsID, e)
lsat = np.nan
# Append file names with SST means from the Cosgrove box
lsat_mod.append([times,mod_file,modis,MOD_num,lsID,lsat,ls_num,time_dif])
print (f'MODIS mean: {modis}, Landsat 8: {lsat}')
try:
del ls_scene, ls_sub, mod_scene, MODsst_xr, MODsst_xr_sub
except:
pass
gc.collect()
# # Put data into DataFrame and save
# headers = ['DateTime','MODIS_filename','MODIS_SST','MODIS_pix','L8_filename',f'L8_{surf_temp}','L8_pix','time_dif']
# lsat_mod_df = pd.DataFrame(lsat_mod,columns=headers)
# out_df = basepath / f'Data/MODISvLandsat_{surf_temp}_{dfloc}_20250500.csv'
# lsat_mod_df.to_csv(out_df, index=False)
Calculate calibration bias and trendΒΆ
Uses a RANSAC regression, but provides comparisons to an Ordinary Least Squares regression calculation from statsmodel of the same parameters.
# Read in paired MODIS/Landsat data created above
surf_temp = 'SST'
if surf_temp=='LST':
thresh = -3.5 # -3.4 is ok too
pix_thresh = 1300
elif surf_temp=='SST':
thresh = -3.1
pix_thresh = 1300
mod_sst_thresh = -1.9
# For Cosgrove region
if surf_temp=='LST':
out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_LST_Cosgrove.csv'
df1 = pd.read_csv(out_df)
lsat_mod_df_C = df1
else:
out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_SST_Cosgrove_lin_scale.csv'
df1 = pd.read_csv(out_df)
lsat_mod_df_C = df1
print(f'Original # matchups at Cosgrove: {lsat_mod_df_C[lsat_mod_df_C.MODIS_SST.notna()].shape[0]}')
lsat_mod_df_C = lsat_mod_df_C[lsat_mod_df_C['MODIS_SST']>=mod_sst_thresh]
lsat_mod_df_C = lsat_mod_df_C[lsat_mod_df_C['MODIS_pix']>=pix_thresh]
lsat_mod_df_C = lsat_mod_df_C[lsat_mod_df_C[f'L8_{surf_temp}']>=thresh]
# For Dotson polynya region
if surf_temp=='LST':
out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_LST_Dotson.csv'
df5 = pd.read_csv(out_df)
lsat_mod_df_D = df5
else:
out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_SST_Dotson_lin_scale.csv'
df5 = pd.read_csv(out_df)
lsat_mod_df_D = df5
print(f'Original # matchups at Dotson: {lsat_mod_df_D[lsat_mod_df_D.MODIS_SST.notna()].shape[0]}')
lsat_mod_df_D = lsat_mod_df_D[lsat_mod_df_D['MODIS_SST']>=mod_sst_thresh]
lsat_mod_df_D = lsat_mod_df_D[lsat_mod_df_D['MODIS_pix']>=pix_thresh]
lsat_mod_df_D = lsat_mod_df_D[lsat_mod_df_D[f'L8_{surf_temp}']>=thresh]
# For Burke region
if surf_temp=='LST':
out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_LST_Burke.csv'
df9 = pd.read_csv(out_df)
lsat_mod_df_B = df9
else:
out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_SST_Burke_lin_scale.csv'
df9 = pd.read_csv(out_df)
lsat_mod_df_B = df9
print(f'Original # matchups at Burke: {lsat_mod_df_B[lsat_mod_df_B.MODIS_SST.notna()].shape[0]}')
lsat_mod_df_B = lsat_mod_df_B[lsat_mod_df_B['MODIS_SST']>=mod_sst_thresh]
lsat_mod_df_B = lsat_mod_df_B[lsat_mod_df_B['MODIS_pix']>=pix_thresh]
lsat_mod_df_B = lsat_mod_df_B[lsat_mod_df_B[f'L8_{surf_temp}']>=thresh]
# Concatenate data from both regions
lsat_mod_df_n = pd.concat([lsat_mod_df_B,lsat_mod_df_C,lsat_mod_df_D])
lsat_mod_df_bc = pd.concat([lsat_mod_df_B,lsat_mod_df_C])
lsat_mod_df_cd = pd.concat([lsat_mod_df_C,lsat_mod_df_D])
lsat_mod_df_bd = pd.concat([lsat_mod_df_B,lsat_mod_df_D])
print(f'Num. good matchups at Cosgrove: {lsat_mod_df_C[lsat_mod_df_C.MODIS_SST.notna()].shape[0]}, at Dotson: {lsat_mod_df_D[lsat_mod_df_D.MODIS_SST.notna()].shape[0]}, at Burke: {lsat_mod_df_B[lsat_mod_df_B.MODIS_SST.notna()].shape[0]}')
# sum_n = lsat_mod_df_n[lsat_mod_df_n.DateTime.dt.month.isin([9,10,11,12,1,3])]
# shld_n = lsat_mod_df_n[lsat_mod_df_n.DateTime.dt.month.isin([2])]
# sum_c = lsat_mod_df_C[lsat_mod_df_C.DateTime.dt.month.isin([9,10,11,12,1,3])]
# sum_bc = lsat_mod_df_bc[lsat_mod_df_bc.DateTime.dt.month.isin([9,10,11,12,1,3])]
# shld_bc = lsat_mod_df_bc[lsat_mod_df_bc.DateTime.dt.month.isin([2])]
# sum_bd = lsat_mod_df_bd[lsat_mod_df_bd.DateTime.dt.month.isin([9,10,11,12,1,3])]
# shld_bd = lsat_mod_df_bd[lsat_mod_df_bd.DateTime.dt.month.isin([2])]
# sum_cd = lsat_mod_df_cd[lsat_mod_df_cd.DateTime.dt.month.isin([9,10,11,12,1,3])]
# shld_cd = lsat_mod_df_cd[lsat_mod_df_cd.DateTime.dt.month.isin([2])]
# # ***check all the february images to see if we need to add a threshold of 2.0C or cut images or if the issue is clouds???
# # shld
# look = lsat_mod_df_n.sort_values('L8_filename')
# look.head(20)
# Orthoganal Regression
if surf_temp=='LST':
data0 = lsat_mod_df_n
landsat_uncertainty = lst_uncertainty
else:
data0 = lsat_mod_df_n
landsat_uncertainty = sst_uncertainty
# Original data
x_original = np.array(data0[f'L8_{surf_temp}'])
y_original = np.array(data0['MODIS_SST'])
# Assume these are your uncertainty estimates per observation
sy = np.full_like(y_original, modis_uncertainty * pix_uncertainty) # Adjusted for resolution mismatch
sx = np.full_like(x_original, landsat_uncertainty)
# Define a linear function for the model
def linear_model(p, x):
return p[0] * x + p[1]
# Create a Model
linear = Model(linear_model)
# Create a RealData object using your DataFrame
data = RealData(x_original, y_original,sx=sx, sy=sy)
# Set up ODR with the model and data
odr = ODR(data, linear, beta0=[1., 0.])
# Run the regression
out = odr.run()
# Use the output
beta = out.beta
beta_err = out.sd_beta
# Print the summary
out.pprint()
# Predicting values using the ODR model
y_pred = linear_model(beta, x_original)
# Get R2
# Calculate Total Sum of Squares (SST)
y_mean = np.mean(y_original)
SST = np.sum((y_original - y_mean)**2)
# Calculate Residual Sum of Squares (SSR)
SSR = np.sum((y_original - y_pred)**2)
# Compute RMSE
rmse = np.sqrt(((y_original - y_pred) ** 2).mean())
# Calculate R^2
R2 = 1 - (SSR / SST)
print("R^2:", np.around(R2,2))
print(f"RMSE: {np.around(rmse,2)}")
if surf_temp=='LST':
# Plot regression
beta_mdn = [beta[0]-beta_err[0]*1.96,beta[1]-beta_err[1]*1.96]
beta_mup = [beta[0]+beta_err[0]*1.96,beta[1]-beta_err[1]*1.96]
beta_bdn = [beta[0]-beta_err[0]*1.96,beta[1]+beta_err[1]*1.96]
beta_bup = [beta[0]+beta_err[0]*1.96,beta[1]+beta_err[1]*1.96]
print(f'At 95% confidence interval: {np.around(beta[0],2)}+/-{np.around(beta_err[0]*1.96,2)}, {np.around(beta[1],2)}+/-{np.around(beta_err[1]*1.96,2)}, n={y_pred.shape[0]}')
xfill = np.array([-4.5,1.5])
fig, ax = plt.subplots(figsize=(8, 3.5))
ax.tick_params(labelsize=14)
# LST data and regression
plt.scatter(x_original, y_original, s=12,color='mediumslateblue')
plt.plot(x_original, y_pred, color='mediumslateblue', label='LST ODR')
plt.fill_between(xfill, linear_model(beta_bdn, xfill), linear_model(beta_mup, xfill),alpha=0.3, facecolor='0.3')
# Comparison regressions
xi = np.arange(-7.0,5.0,1.0)
plt.plot(xi,xi * sstcalib_m + sstcalib_b,color='k',linewidth=2,label='SST ODR')
plt.plot(xi,xi,color='lightcoral',linewidth=2, label='MODIS 1:1')
plt.legend(loc='lower right',fontsize=14)
plt.text(-2.7,-0.15,rf'$\mathbf{{y={np.around(beta[0],2)}x+{np.around(beta[1],2)}\quad r^2={np.around(R2,2)}}}$',color='mediumslateblue', fontweight='bold',fontsize=14)
plt.xlim([-3.5,-1.2])
plt.ylim([-3.05,0.8])
# else:
# plt.plot(x_original, y_pred, color='k', label='NLSST Orthogonal Distance Regression')
# plt.legend(loc='lower right',fontsize=12)
# plt.text(-2.6,-0.2,f'y={np.around(beta[0],2)}x+{np.around(beta[1],2)} $r^2$={np.around(R2,2)}',fontsize=14)
# plt.xlim([-3.2,-0.15])
# plt.ylim([-2.4,0.9])
plt.xlabel('Landsat ST [Β°C]',fontsize=16)
plt.ylabel('MODIS SST [Β°C]',fontsize=16)
plt.tight_layout()
# Orthoganal Regression for SST only
dataframes = [
('Combined', lsat_mod_df_n),
('Burke', lsat_mod_df_B),
('Cosgrove', lsat_mod_df_C),
('Dotson', lsat_mod_df_D),
]
if surf_temp=='LST':
landsat_uncertainty = lst_uncertainty
else:
landsat_uncertainty = sst_uncertainty
# Dictionary to store the results from each DataFrame
odr_results = {}
# Define a linear function for the model
def linear_model(p, x):
return p[0] * x + p[1]
# Create a Model object
linear = Model(linear_model)
# Loop over each DataFrame
for df_name, data0 in dataframes:
print(f"\n=== Processing {df_name} ===")
# Original data
x_original = np.array(data0[f'L8_{surf_temp}'])
y_original = np.array(data0['MODIS_SST'])
# Assume these are your uncertainty estimates per observation
sy = np.full_like(y_original, modis_uncertainty * pix_uncertainty) # Adjusted for resolution mismatch
sx = np.full_like(x_original, landsat_uncertainty)
# Create a RealData object using your DataFrame
data = RealData(x_original, y_original,sx=sx, sy=sy)
# Set up ODR with the model and data, providing an initial guess
odr = ODR(data, linear, beta0=[1., 0.])
# Run the regression
out = odr.run()
# Retrieve best-fit parameters and their std. dev.
beta = out.beta
beta_err = out.sd_beta
# Print the summary
out.pprint()
# Predicting values using the ODR model
y_pred = linear_model(beta, x_original)
# Get R2
# Calculate Total Sum of Squares (SST)
y_mean = np.mean(y_original)
SST = np.sum((y_original - y_mean)**2)
# Calculate Residual Sum of Squares (SSR)
SSR = np.sum((y_original - y_pred)**2)
# Calculate R^2
R2 = 1 - (SSR / SST)
# Compute RMSE
rmse = np.sqrt(((y_original - y_pred) ** 2).mean())
# Print R^2
print(f"{df_name} R^2:", np.around(R2, 2))
print(f"RMSE: {np.around(rmse,2)}")
# Store results in a dictionary for later use (plotting, etc.)
odr_results[df_name] = {
'beta': beta,
'beta_err': beta_err,
'R2': R2,
'x_original': x_original,
'y_original': y_original,
'y_pred': y_pred
}
# Import the necessary ipywidgets components
import ipywidgets as widgets
from ipywidgets import interact
# The @interact decorator will automatically create widgets for the function arguments
@interact(
show_burke=widgets.Checkbox(value=True, description="Burke Data"),
show_cosgrove=widgets.Checkbox(value=True, description="Cosgrove Data"),
show_dotson=widgets.Checkbox(value=True, description="Dotson Data"),
show_combined_fit=widgets.Checkbox(value=True, description="Combined Fit"),
show_confidence_interval=widgets.Checkbox(value=True, description="95% CI")
)
def interactive_sst_plot(show_burke, show_cosgrove, show_dotson, show_combined_fit, show_confidence_interval):
# This is your original code, with plotting commands wrapped in if-statements
if surf_temp=='SST':
# Your original data calculation code (unchanged)
beta_mdn = [odr_results['Combined']['beta'][0]-odr_results['Combined']['beta_err'][0]*1.96,odr_results['Combined']['beta'][1]-odr_results['Combined']['beta_err'][1]*1.96]
beta_mup = [odr_results['Combined']['beta'][0]+odr_results['Combined']['beta_err'][0]*1.96,odr_results['Combined']['beta'][1]-odr_results['Combined']['beta_err'][1]*1.96]
beta_bdn = [odr_results['Combined']['beta'][0]-odr_results['Combined']['beta_err'][0]*1.96,odr_results['Combined']['beta'][1]+odr_results['Combined']['beta_err'][1]*1.96]
beta_bup = [odr_results['Combined']['beta'][0]+odr_results['Combined']['beta_err'][0]*1.96,odr_results['Combined']['beta'][1]+odr_results['Combined']['beta_err'][1]*1.96]
a1 = np.around(odr_results['Combined']['beta'][0],2)
a2 = np.around(odr_results['Combined']['beta'][1],2)
ar = np.around(odr_results['Combined']['R2'],2)
b1 = np.around(odr_results['Burke']['beta'][0],2)
b2 = np.around(odr_results['Burke']['beta'][1],2)
br = np.around(odr_results['Burke']['R2'],2)
c1 = np.around(odr_results['Cosgrove']['beta'][0],2)
c2 = np.around(odr_results['Cosgrove']['beta'][1],2)
cr = np.around(odr_results['Cosgrove']['R2'],2)
d1 = np.around(odr_results['Dotson']['beta'][0],2)
d2 = np.around(odr_results['Dotson']['beta'][1],2)
dr = np.around(odr_results['Dotson']['R2'],2)
xfill = np.array([-4.3,0.9])
# --- Plotting Section ---
fig, ax = plt.subplots(figsize=(8, 4)) # Increased height slightly for better text placement
ax.tick_params(labelsize=14)
if show_burke:
plt.scatter(np.array(lsat_mod_df_B[f'L8_{surf_temp}']), np.array(lsat_mod_df_B['MODIS_SST']), s=12,color='#00bf7d', label='Burke')
plt.plot(odr_results['Burke']['x_original'], odr_results['Burke']['y_pred'], ls='-',linewidth=1,color='#00bf7d')
plt.text(-2.9,0.0,f'y={b1}x+{b2} $r^2$={br}',color='#00bf7d',fontsize=12)
if show_cosgrove:
plt.scatter(np.array(lsat_mod_df_C[f'L8_{surf_temp}']), np.array(lsat_mod_df_C['MODIS_SST']), s=12,color=sns.color_palette("colorblind")[3], label='Cosgrove')
plt.plot(odr_results['Cosgrove']['x_original'], odr_results['Cosgrove']['y_pred'], ls='-',linewidth=1,color=sns.color_palette("colorblind")[3])
plt.text(-2.9,-0.3,f'y={c1}x+{c2} $r^2$={cr}',color=sns.color_palette("colorblind")[3],fontsize=12)
if show_dotson:
plt.scatter(np.array(lsat_mod_df_D[f'L8_{surf_temp}']), np.array(lsat_mod_df_D['MODIS_SST']), s=12,color='#0073e6', label='Dotson')
plt.plot(odr_results['Dotson']['x_original'], odr_results['Dotson']['y_pred'], ls='-',linewidth=1,color='#0073e6')
plt.text(-2.9,-0.6,f'y={d1}x+{d2} $r^2$={dr}',color='#0073e6',fontsize=12)
if show_combined_fit:
plt.plot(odr_results['Combined']['x_original'], odr_results['Combined']['y_pred'], color='k')
plt.text(-2.9,0.3,f'y={a1}x+{a2} $r^2$={ar}',color='k',fontsize=14)
if show_confidence_interval:
plt.fill_between(xfill, linear_model(beta_bdn, xfill), linear_model(beta_mup, xfill),alpha=0.3, facecolor='0.3')
# General plot labels and limits (unchanged)
plt.xlim([-3.05,-0.15])
plt.ylim([-2.3,0.9])
plt.xlabel('Landsat SST [Β°C]',fontsize=16)
plt.ylabel('MODIS SST [Β°C]',fontsize=16)
plt.legend(loc='lower right',fontsize=10)
plt.tight_layout()
plt.show() # Make sure to show the plot
odr_results['Combined']['beta_err']
if surf_temp=='SST':
# Ordinary least squares regression between Landsat and MODIS SST matchups
resultC = sm.ols(formula="MODIS_SST ~ L8_SST", data=data0).fit()
print (resultC.summary())
Interactive Analysis Dashboard using MatplotlibΒΆ
# Imports for Interactive Dashboard
import ipywidgets as widgets
from ipywidgets import interact, Layout
# Matplotlib-based Interactive Dashboard Function
def linear_model(p, x):
return p[0] * x + p[1]
def interactive_calibration_dashboard_matplotlib(region, chart_type, pix_thresh, modis_sst_thresh, lsat_sst_thresh):
# 1. & 2. Data filtering and selection (Same as your original code)
df_C_filt = lsat_mod_df_C[(lsat_mod_df_C['MODIS_pix'] >= pix_thresh) & (lsat_mod_df_C['MODIS_SST'] >= modis_sst_thresh) & (lsat_mod_df_C[f'L8_{surf_temp}'] >= lsat_sst_thresh)]
df_C_filt['Region'] = 'Cosgrove'
df_D_filt = lsat_mod_df_D[(lsat_mod_df_D['MODIS_pix'] >= pix_thresh) & (lsat_mod_df_D['MODIS_SST'] >= modis_sst_thresh) & (lsat_mod_df_D[f'L8_{surf_temp}'] >= lsat_sst_thresh)]
df_D_filt['Region'] = 'Dotson'
df_B_filt = lsat_mod_df_B[(lsat_mod_df_B['MODIS_pix'] >= pix_thresh) & (lsat_mod_df_B['MODIS_SST'] >= modis_sst_thresh) & (lsat_mod_df_B[f'L8_{surf_temp}'] >= lsat_sst_thresh)]
df_B_filt['Region'] = 'Burke'
if region == 'All Regions':
df_filtered = pd.concat([df_B_filt, df_C_filt, df_D_filt])
elif region == 'Cosgrove':
df_filtered = df_C_filt
elif region == 'Dotson':
df_filtered = df_D_filt
elif region == 'Burke':
df_filtered = df_B_filt
# Create the figure and axis for the plot
fig, ax = plt.subplots(figsize=(8, 6))
if len(df_filtered) < 2:
ax.text(0.5, 0.5, "Not enough data with the current filter settings.", ha='center', va='center')
ax.set_axis_off()
plt.show()
return
# 3. ODR Calculation (Same as your original code)
x_data = df_filtered[f'L8_{surf_temp}'].values
y_data = df_filtered['MODIS_SST'].values
sx = np.full_like(x_data, sst_uncertainty)
sy = np.full_like(y_data, modis_uncertainty * pix_uncertainty)
model = Model(linear_model)
data = RealData(x_data, y_data, sx=sx, sy=sy)
odr = ODR(data, model, beta0=[1., 0.])
output = odr.run()
slope, intercept = output.beta
y_pred = linear_model(output.beta, x_data)
# 4. Statistics Calculation (Same as your original code, but formatted for Matplotlib title)
bias = np.mean(y_data - x_data)
trend = slope
ss_total = np.sum((y_data - np.mean(y_data))**2)
ss_resid = np.sum((y_data - y_pred)**2)
r2 = 1 - (ss_resid / ss_total) if ss_total > 0 else 0
rmse = np.sqrt(np.mean((y_data - y_pred)**2))
# Note: Matplotlib title doesn't support HTML bold tags, so they are removed.
stats_text = (f"Trend (Slope): {trend:.2f} | Bias (MODIS - Landsat): {bias:.2f}Β°C\n"
f"RΒ²: {r2:.2f} | RMSE: {rmse:.2f} | N: {len(df_filtered)}")
# 5. Create Matplotlib figure based on chart_type
if chart_type == 'Scatter':
# Seaborn's scatterplot is a great way to handle coloring by category easily
sns.scatterplot(data=df_filtered, x=f'L8_{surf_temp}', y='MODIS_SST', hue='Region', ax=ax, s=50)
elif chart_type == 'Heatmap':
ax.hist2d(df_filtered[f'L8_{surf_temp}'], df_filtered['MODIS_SST'], bins=20, cmap='viridis')
fig.colorbar(ax.collections[0], ax=ax, label='Point Density')
# Add regression line
sorted_indices = np.argsort(x_data)
ax.plot(x_data[sorted_indices], y_pred[sorted_indices], color='black', linewidth=2, label='ODR Fit')
# Set titles, labels, and limits
ax.set_title(stats_text)
ax.set_xlabel("Landsat SST [Β°C]")
ax.set_ylabel("MODIS SST [Β°C]")
ax.set_xlim(-3.05, -0.15)
ax.set_ylim(-2.3, 0.9)
ax.legend()
ax.grid(True, linestyle='--', alpha=0.6)
plt.show()
# Define Widgets
style = {'description_width': 'initial'}
w_region = widgets.Dropdown(options=['All Regions', 'Cosgrove', 'Dotson', 'Burke'], value='All Regions', description='Select Region:', style=style)
w_chart_type = widgets.ToggleButtons(options=['Scatter', 'Heatmap'], description='Chart Type:', button_style='info')
w_pix_thresh = widgets.IntSlider(value=1300, min=0, max=5000, step=100, description='Min MODIS Pixels:', style=style, layout=Layout(width='500px'))
w_modis_sst = widgets.FloatSlider(value=-1.9, min=-2.5, max=0, step=0.1, description='Min MODIS SST (Β°C):', style=style, layout=Layout(width='500px'))
w_lsat_sst = widgets.FloatSlider(value=-3.1, min=-4.0, max=0, step=0.1, description='Min Landsat SST (Β°C):', style=style, layout=Layout(width='500px'))
# Launch the new Matplotlib-based Dashboard
interact(interactive_calibration_dashboard_matplotlib,
region=w_region,
chart_type=w_chart_type,
pix_thresh=w_pix_thresh,
modis_sst_thresh=w_modis_sst,
lsat_sst_thresh=w_lsat_sst);