Calibration - Raw Pixels to Products

🤔 Why Calibrate?¶

Calibration of a physical model is the process of tuning its parameters to ensure its outputs accurately match real-world measurements. It’s about bridging the gap between theoretical calculations and physical reality.

Key Goal: The primary goal is to improve predictive accuracy, transforming a model from a theoretical construct into a reliable tool for analysis and design.

📝 What Do You Need?¶

To calibrate a model, you need four key components:

1. A Physical Model
A set of mathematical equations or simulation software.

2. Tunable Parameters
The specific "knobs" in your model that you can adjust.

3. Experimental Data
High-quality measurements from the real-world system.

4. An Objective Function
A metric that quantifies the error (e.g., RMSE).

⚙️ How to Calibrate?¶

There are two main approaches to calibration: by hand (trial and error) or by using a smart computer search (optimization algorithms). Optimization is highly recommended for its efficiency and accuracy.

✅ How Do You Know It’s Calibrated?¶

A successful calibration can be verified by analyzing the errors and by validating the model against new data. A common way to visualize the result is by plotting the model’s output against the experimental data.

Interactive Example: Adjusting Model Parameters¶

To make this concept interactive, you can run the code cell below. It will create the same plot but with sliders that let you manually adjust the model’s parameters (slope and intercept). Try to move the sliders to make the red line fit the blue dots. This gives you a hands-on feel for the “By Hand” calibration process.

# First, let's install ipywidgets for the interactive sliders
#%pip install -q ipywidgets
#%pip install -q scikit-image
#%pip install seaborn
# %pip install plotly
%pip install xarray==2024.05.0
import earthaccess

Collecting xarray==2024.05.0
  Downloading xarray-2024.5.0-py3-none-any.whl.metadata (11 kB)

Requirement already satisfied: numpy>=1.23 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from xarray==2024.05.0) (2.2.6)
Requirement already satisfied: packaging>=23.1 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from xarray==2024.05.0) (25.0)
Requirement already satisfied: pandas>=2.0 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from xarray==2024.05.0) (2.3.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from pandas>=2.0->xarray==2024.05.0) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from pandas>=2.0->xarray==2024.05.0) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from pandas>=2.0->xarray==2024.05.0) (2025.2)
Requirement already satisfied: six>=1.5 in /home/runner/micromamba/envs/landsat-product-cookbook-dev/lib/python3.13/site-packages (from python-dateutil>=2.8.2->pandas>=2.0->xarray==2024.05.0) (1.17.0)
Downloading xarray-2024.5.0-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.2 MB ? eta -:--:--

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 66.2 MB/s  0:00:00

Installing collected packages: xarray
  Attempting uninstall: xarray
    Found existing installation: xarray 2025.7.1
    Uninstalling xarray-2025.7.1:
      Successfully uninstalled xarray-2025.7.1

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
virtualizarr 1.2.0 requires xarray>=2024.10.0, but you have xarray 2024.5.0 which is incompatible.
rioxarray 0.19.0 requires xarray>=2024.7.0, but you have xarray 2024.5.0 which is incompatible.
Successfully installed xarray-2024.5.0

Note: you may need to restart the kernel to use updated packages.

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, fixed

# 1. Create the sample "Experimental Data"
np.random.seed(0)
x_data = np.linspace(0, 10, 20)
true_slope = 2.5
true_intercept = 1.5
y_data = true_slope * x_data + true_intercept + np.random.normal(0, 2, size=x_data.shape)

# 2. Define a function to plot the data and our adjustable model
def plot_model(x, y, slope, intercept):
    """Plots the experimental data against the model line defined by slope and intercept."""
    y_model = slope * x + intercept
    
    plt.figure(figsize=(8, 6))
    plt.plot(x, y, 'o', label='Experimental Data', markersize=8, color='royalblue')
    plt.plot(x, y_model, '-', label='Adjustable Model', linewidth=3, color='red')
    
    # Calculate and display the RMSE as our objective function score
    rmse = np.sqrt(np.mean((y - y_model)**2))
    plt.title(f'Comparison of Model vs. Data (RMSE: {rmse:.2f})', fontsize=16)
    
    plt.xlabel('Independent Variable', fontsize=12)
    plt.ylabel('Dependent Variable', fontsize=12)
    plt.ylim(min(y_data)-2, max(y_data)+2)
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.6)
    plt.show()

# 3. Create the interactive plot!
# The 'interact' function automatically creates sliders for the numerical arguments.
interact(plot_model, x=fixed(x_data), y=fixed(y_data), slope=(0.0, 5.0, 0.1), intercept=(-5.0, 5.0, 0.1));

Example: Landsat calibration using MODIS SST¶

Create matchups between Landsat and MODIS SST data near Cosgrove, West Antarctica to produce a calibration for Landsat SSTs

# Import libraries and modules
%config InlineBackend.figure_format = 'svg'
%matplotlib widget

%load_ext autoreload
%autoreload 2

import pandas as pd
import xarray as xr
import geopandas as gpd
from datetime import date, timedelta, datetime
import numpy as np
import matplotlib.pylab as plt
from matplotlib import colors
from matplotlib.pylab import rcParams
from matplotlib.patches import Polygon as Pgon
import cartopy.crs as ccrs
import cartopy

import os
from cycler import cycler
import pyproj
from pyproj import Proj, transform
from sklearn.neighbors import BallTree
import pytz
import pygmt
import gc
import copy
import random
import statsmodels.formula.api as sm
import scipy.stats as stats
from sklearn.linear_model import RANSACRegressor
from sklearn.linear_model import LinearRegression
from shapely.geometry import Point, Polygon
from pathlib import Path
import math
from scipy.odr import Model, RealData, ODR
from tqdm.notebook import trange, tqdm
import seaborn as sns

import earthaccess

# For LST file masking
import pystac_client
import intake
from rasterio.session import AWSSession
import boto3

import SSTutils as stu

import warnings
warnings.filterwarnings('ignore')

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[3], line 49
     46 from rasterio.session import AWSSession
     47 import boto3
---> 49 import SSTutils as stu
     51 import warnings
     52 warnings.filterwarnings('ignore')

ModuleNotFoundError: No module named 'SSTutils'

#For color cycling in plots that is color blind friendly...make new ones at "I want hue" tools.medialab.sciences-po.fr/iwanthue
color_cycler = cycler(color=["#6777cf","#adba49","#c65ca0","#5fa042","#683287","#72ce7b","#c44a48","#45c7a9","#933c1d","#d0803f","#ac9239","#317c39"])
colorline_cycler = (cycler(color=["#75a141","#6c61b9","#bc4d45","#c1913d","#b85298","#4aa8e8"]) +
                 cycler(linestyle=['-','--',':','-.','-','--']))
rcParams['axes.prop_cycle'] = cycler('color', color_cycler)

Build Landsat - MODIS SST matchups¶

# Set paths and important variables and Calibration region bounding box

basepath = Path('/home/jovyan/landsatproduct-cookbook')
spacing = [990,-990] # 990m sampling of MODIS data so that upsampling is easy and because 30m takes far too long
param = 'sea_surface_temperature'
size_threshold = 30

location = 0             # 0 and 1 are the Cosgrove and Dotson Polynya calibration areas, respectively
surf_temp = 'SST'        # 'SST' and 'LST' are for the Landsat SST and LST algorithms respectively

# Set location bounds
if location==1:
    pathdir = 'DotsonPolynya'
    latboundsC = [ -73.9 , -73.5 ] # Dotson polynya
    lonboundsC = [ -113 , -111.5 ]
    dfloc = 'Dotson'
elif location==0:
    pathdir = 'Cosgrove'
    latboundsC = [ -73.5 , -73.42 ] # near Cosgrove
    lonboundsC = [ -103.0 , -102.0 ]
    dfloc = 'Cosgrove'
elif location==2:
    pathdir = 'Burke'
    latboundsC = [ -73.81 , -73.42 ] # south of Burke
    lonboundsC = [ -104.2 , -103.8 ]
    dfloc = 'Burke'
if location==3:
    pathdir = 'DotsonIntercomp'
    latboundsC = [ -74.2 , -74.11 ] # Dotson plume for intercomparison
    lonboundsC = [ -113.5 , -113.17 ]
    dfloc = 'DotsonIntercomp'

# Coefficients for calibration
# SST
sstcalib_m = 0.76 
sstcalib_b = 0.55 

# LST
lstcalib_m = 0.80
lstcalib_b = 1.00

modmin = -1.9
LSTmin = np.around(modmin/lstcalib_m - lstcalib_b,2) 
SSTmin = np.around(modmin/sstcalib_m - sstcalib_b,2) # should be about -2.0

# Uncertainties used in ODR propagation of error
modis_uncertainty = 0.44  # long wave sst ocean color atbd
sst_uncertainty = 0.3 # USGS Landsat stray light notice
lst_uncertainty = 1.0 # gerace 2020
pix_uncertainty = np.sqrt((1000*1000)/(100*100)) # MODIS 1km x 1km and Landsat 100m x 100m


# For calibrated SST runs
if surf_temp=='SST':
    if location==3:
        sstpath = basepath / f'Data/SST/Validation/{pathdir}/'
    else:
        sstpath = basepath / f'Data/SST/MODcalib/{pathdir}/'
    tif = 'tif'
    thresh = SSTmin
    calib_m = sstcalib_m
    calib_b = sstcalib_b
    
# If running for LST comparisons
elif surf_temp=='LST':
    if location==3:
        sstpath = basepath / f'Data/SST/LST/Calibration/DotsonPolynya/'
    else:
        sstpath = basepath / f'Data/SST/LST/Calibration/{pathdir}/'
    tif = 'TIF'
    thresh = LSTmin
    calib_m = lstcalib_m
    calib_b = lstcalib_b

# Authenticate for accessing NASA data (MODIS)
auth = earthaccess.login(strategy="interactive")

# If we are not authenticated
if not auth.authenticated:
    # ask for credentials and persist them in a .netrc file
    auth.login(strategy="interactive", persist=True)

# Convert bounding box to south polar stereo for checking if landsat has any data in bounding box
# Speeds up process a lot
source_crs = 'epsg:4326' 
target_crs = 'epsg:3031' # Coordinate system of the file   

bbox,checkbox = stu.lsat_reproj(source_crs,target_crs,(lonboundsC[0],latboundsC[0],lonboundsC[1],latboundsC[1]))

# Create polygon for later cropping
polygon = Polygon([(bbox[0][0],bbox[0][1]),(bbox[3][0],bbox[3][1]),(bbox[2][0],bbox[2][1]),(bbox[1][0],bbox[1][1])])

# Create min/max boundaries for trimming image before crop_xarray to cut down on processing times
minx, miny, maxx, maxy = polygon.bounds
polarx = [minx, maxx]
polary = [miny, maxy]

# Get Landsat file paths in directory
lsatfiles = os.listdir(sstpath)
lsatfiles = [x for x in lsatfiles if x[-3:] == tif]
lsatfiles.sort()
print (len(lsatfiles))
os.chdir(sstpath)

SSTfails = [
 'LC08_L1GT_006112_20221024_20221107_02_T2_152257_Cel.tif',
 'LC08_L1GT_006112_20221024_20221107_02_T2_152257_Cel.tif',
]
LSTfails = [
 'LC08_L1GT_007112_20211215_20211223_02_T2_SW_LST.TIF',
 'LC08_L1GT_007112_20220201_20220211_02_T2_SW_LST.TIF',
 'LC08_L1GT_007112_20221202_20221212_02_T2_SW_LST.TIF',
 'LC08_L1GT_007112_20230204_20230209_02_T2_SW_LST.TIF',
 'LC08_L1GT_008113_20221022_20221101_02_T2_SW_LST.TIF',
 'LC08_L1GT_010112_20221020_20221101_02_T2_SW_LST.TIF',
 'LC08_L1GT_010113_20221020_20221101_02_T2_SW_LST.TIF',
 'LC08_L1GT_012112_20221018_20221031_02_T2_SW_LST.TIF',
 'LC08_L1GT_008113_20221022_20221101_02_T2_SW_LST.TIF',
 'LC08_L1GT_010112_20221020_20221101_02_T2_SW_LST.TIF',
 'LC08_L1GT_010113_20220105_20220114_02_T2_SW_LST.TIF',
 'LC08_L1GT_010113_20220121_20220128_02_T2_SW_LST.TIF',
 'LC08_L1GT_010113_20221020_20221101_02_T2_SW_LST.TIF',
 'LC08_L1GT_010113_20221105_20221115_02_T2_SW_LST.TIF',
 'LC08_L1GT_010113_20230108_20230124_02_T2_SW_LST.TIF',
 'LC08_L1GT_010113_20230313_20230321_02_T2_SW_LST.TIF',
 'LC08_L1GT_012112_20221018_20221031_02_T2_SW_LST.TIF'
]

Search for desired Landsat scenes¶

# Authenticate for boto S3 access, etc.
os.environ["AWS_REQUEST_PAYER"] = "requester"
aws_session = AWSSession(boto3.Session(), requester_pays=True)

# Setup and authenticate 
from dask.distributed import Client
import logging
client = Client(processes=True, n_workers=4, 
                threads_per_worker=1,
                silence_logs=logging.ERROR)
client.run(lambda: os.environ["AWS_REQUEST_PAYER"] == "requester" )
client

# Define the landsat STAC catalog location
url = 'https://landsatlook.usgs.gov/stac-server'
collection = 'landsat-c2l1' # Landsat Collection 2, Level 1 - includes L8 and L9

i=0
ls_scene = xr.open_dataset(lsatfiles[i],chunks=dict(x=512, y=512),engine='rasterio')['band_data'].sel(band=1)
ls_scene

%%time
# ~1 min 32 sec per image
# If number of MODIS images per satellite is much more than 25, its because there is a ULY,LRY issue
os.chdir('/home/jovyan/landsatproduct-cookbook/Data/SST/MODcalib/Cosgrove/')

lsat_mod = []
#for i in tqdm(range(len(lsatfiles)), desc="Processing"):
for i in tqdm(range(1), desc="Processing the first image"):
    # Check for known repeatedly bad files that will kill the code
    if surf_temp == 'SST':    
        if lsatfiles[i] in SSTfails:
            continue
    elif surf_temp == 'LST':
        if lsatfiles[i] in LSTfails:
            continue
        
    # Concatenate all landsat files into xarray with a time dimension
    ls_scene = xr.open_dataset(lsatfiles[i],chunks=dict(x=512, y=512),engine='rasterio')['band_data'].sel(band=1)
    ls_scene = ls_scene.rio.write_crs("epsg:3031", inplace=True)
    print("1")
    if surf_temp == 'SST':
        times = pd.to_datetime(lsatfiles[i][17:25]+lsatfiles[i][41:47], format='%Y%m%d%H%M%S')
        ls_scene = ls_scene.assign_coords(time=times,ID=lsatfiles[i][:-8])
    elif surf_temp == 'LST':
        # Need to mask LST because not done previously
        mask = stu.get_lst_mask(lsatfiles[i])
        ls_scene = ls_scene * mask
        
        times = pd.to_datetime(lsatfiles[i][17:25]+'120000', format='%Y%m%d%H%M%S')
        ls_scene = ls_scene.assign_coords(time=times,ID=lsatfiles[i][:-4])
        ls_scene = ls_scene - 273.15

    # Subset scene and check that it has the correct dimensions because y order changes sometimes
    ls_scene = stu.subset_img(ls_scene,polarx,polary) # subset so easier to work with
    ls_scene = stu.crop_xarray_dataarray_with_polygon(ls_scene, polygon) # crop data to exact bounding box

    # if location==3:
    # # Calibrate
    # ls_scene = ls_scene * calib_m + calib_b
    
    # # Remove SSTs that are unrealistically cool
    # ls_scene = ls_scene.where(ls_scene >= thresh, np.nan)

    lsID = lsatfiles[i]
    print (lsID)
    
    # Take mean temp, will skip the modis stage if no Landsat data in the calibration region
    try:
        lsat = np.nanmean(ls_scene)
        ls_num = ls_scene.notnull().values.sum()
    except Exception as e:
        print (lsID, e)
        lsat = np.nan

    print(lsat)
    if ~np.isfinite(lsat):
        continue

    # Find coincident MODIS SST scene
    mod_scene, mod_file,time_dif = stu.find_MODIS(lonboundsC,latboundsC,ls_scene)
    print(mod_file)

    try:
        # Acquire and align MODIS data to Landsat

        # To subset to only high quality MODIS temp measurements which doesn't seem to be useful
        # print(mod_scene.quality_level.max().values)
        # mod_temps = mod_scene.sea_surface_temperature.where(mod_scene.quality_level>=4)

        MODsst_xr = stu.get_sst(ls_scene,mod_scene.sea_surface_temperature,spacing,param) #mod_scene.sea_surface_temperature

        # Remove SSTs that are unrealistically cool
        MODsst_xr = MODsst_xr.where(MODsst_xr >= -1.9, np.nan)

        # Crop Landsat image to meet the slightly smaller MODIS image (smaller image results from upsample methods in get_wv2)
        ls_scene = stu.subset_img(ls_scene,[MODsst_xr.x.min(),MODsst_xr.x.max()],[MODsst_xr.y.min(),MODsst_xr.y.max()])

        # Only use MODIS data where cropped Landsat data is also available
        MODsst_xr_sub = MODsst_xr.where(ls_scene.notnull(),np.nan)

        # Take mean temp
        modis = np.nanmean(MODsst_xr_sub)
        MOD_num = MODsst_xr_sub.notnull().values.sum()
    except Exception as e:
        print (mod_file, e)
        modis = np.nan
        MOD_num = 0

    # Take mean using Landsat data only where cropped MODIS data is also available (need to do both)
    try:
        ls_scene_sub = ls_scene.where(MODsst_xr_sub.notnull(),np.nan)
        lsat = np.nanmean(ls_scene_sub)
        ls_num = ls_scene_sub.notnull().values.sum()
    except Exception as e:
        print (lsID, e)
        lsat = np.nan

    # Append file names with SST means from the Cosgrove box
    lsat_mod.append([times,mod_file,modis,MOD_num,lsID,lsat,ls_num,time_dif])
    print (f'MODIS mean: {modis}, Landsat 8: {lsat}')

    try:
        del ls_scene, ls_sub, mod_scene, MODsst_xr, MODsst_xr_sub
    except:
        pass

    gc.collect()

# # Put data into DataFrame and save    
# headers = ['DateTime','MODIS_filename','MODIS_SST','MODIS_pix','L8_filename',f'L8_{surf_temp}','L8_pix','time_dif']
# lsat_mod_df = pd.DataFrame(lsat_mod,columns=headers)
# out_df = basepath / f'Data/MODISvLandsat_{surf_temp}_{dfloc}_20250500.csv'
# lsat_mod_df.to_csv(out_df, index=False)

Calculate calibration bias and trend¶

Uses a RANSAC regression, but provides comparisons to an Ordinary Least Squares regression calculation from statsmodel of the same parameters.

# Read in paired MODIS/Landsat data created above
surf_temp = 'SST'

if surf_temp=='LST':
    thresh = -3.5 # -3.4 is ok too
    pix_thresh = 1300
elif surf_temp=='SST':
    thresh = -3.1
    pix_thresh = 1300
mod_sst_thresh = -1.9

# For Cosgrove region
if surf_temp=='LST':
    out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_LST_Cosgrove.csv'
    df1 = pd.read_csv(out_df)
    lsat_mod_df_C = df1
else:   
    out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_SST_Cosgrove_lin_scale.csv' 
    df1 = pd.read_csv(out_df)
    lsat_mod_df_C = df1

print(f'Original # matchups at Cosgrove: {lsat_mod_df_C[lsat_mod_df_C.MODIS_SST.notna()].shape[0]}')

lsat_mod_df_C = lsat_mod_df_C[lsat_mod_df_C['MODIS_SST']>=mod_sst_thresh]
lsat_mod_df_C = lsat_mod_df_C[lsat_mod_df_C['MODIS_pix']>=pix_thresh]
lsat_mod_df_C = lsat_mod_df_C[lsat_mod_df_C[f'L8_{surf_temp}']>=thresh]

# For Dotson polynya region
if surf_temp=='LST':
    out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_LST_Dotson.csv'
    df5 = pd.read_csv(out_df)
    lsat_mod_df_D = df5
else:
    out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_SST_Dotson_lin_scale.csv'  
    df5 = pd.read_csv(out_df)
    lsat_mod_df_D = df5

print(f'Original # matchups at Dotson: {lsat_mod_df_D[lsat_mod_df_D.MODIS_SST.notna()].shape[0]}')

lsat_mod_df_D = lsat_mod_df_D[lsat_mod_df_D['MODIS_SST']>=mod_sst_thresh]
lsat_mod_df_D = lsat_mod_df_D[lsat_mod_df_D['MODIS_pix']>=pix_thresh]
lsat_mod_df_D = lsat_mod_df_D[lsat_mod_df_D[f'L8_{surf_temp}']>=thresh]

# For Burke region
if surf_temp=='LST':
    out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_LST_Burke.csv'
    df9 = pd.read_csv(out_df)
    lsat_mod_df_B = df9
else:
    out_df = '/home/jovyan/landsatproduct-cookbook/Data/MODISvLandsat_SST_Burke_lin_scale.csv'
    df9 = pd.read_csv(out_df)
    lsat_mod_df_B = df9

print(f'Original # matchups at Burke: {lsat_mod_df_B[lsat_mod_df_B.MODIS_SST.notna()].shape[0]}')

lsat_mod_df_B = lsat_mod_df_B[lsat_mod_df_B['MODIS_SST']>=mod_sst_thresh]
lsat_mod_df_B = lsat_mod_df_B[lsat_mod_df_B['MODIS_pix']>=pix_thresh]
lsat_mod_df_B = lsat_mod_df_B[lsat_mod_df_B[f'L8_{surf_temp}']>=thresh]

# Concatenate data from both regions
lsat_mod_df_n = pd.concat([lsat_mod_df_B,lsat_mod_df_C,lsat_mod_df_D]) 
lsat_mod_df_bc = pd.concat([lsat_mod_df_B,lsat_mod_df_C]) 
lsat_mod_df_cd = pd.concat([lsat_mod_df_C,lsat_mod_df_D])
lsat_mod_df_bd = pd.concat([lsat_mod_df_B,lsat_mod_df_D])
print(f'Num. good matchups at Cosgrove: {lsat_mod_df_C[lsat_mod_df_C.MODIS_SST.notna()].shape[0]}, at Dotson: {lsat_mod_df_D[lsat_mod_df_D.MODIS_SST.notna()].shape[0]}, at Burke: {lsat_mod_df_B[lsat_mod_df_B.MODIS_SST.notna()].shape[0]}')

# sum_n = lsat_mod_df_n[lsat_mod_df_n.DateTime.dt.month.isin([9,10,11,12,1,3])]
# shld_n = lsat_mod_df_n[lsat_mod_df_n.DateTime.dt.month.isin([2])]
# sum_c = lsat_mod_df_C[lsat_mod_df_C.DateTime.dt.month.isin([9,10,11,12,1,3])]
# sum_bc = lsat_mod_df_bc[lsat_mod_df_bc.DateTime.dt.month.isin([9,10,11,12,1,3])]
# shld_bc = lsat_mod_df_bc[lsat_mod_df_bc.DateTime.dt.month.isin([2])]
# sum_bd = lsat_mod_df_bd[lsat_mod_df_bd.DateTime.dt.month.isin([9,10,11,12,1,3])]
# shld_bd = lsat_mod_df_bd[lsat_mod_df_bd.DateTime.dt.month.isin([2])]
# sum_cd = lsat_mod_df_cd[lsat_mod_df_cd.DateTime.dt.month.isin([9,10,11,12,1,3])]
# shld_cd = lsat_mod_df_cd[lsat_mod_df_cd.DateTime.dt.month.isin([2])]

# # ***check all the february images to see if we need to add a threshold of 2.0C or cut images or if the issue is clouds???
# # shld
# look = lsat_mod_df_n.sort_values('L8_filename')
# look.head(20)

# Orthoganal Regression 
if surf_temp=='LST':
    data0 = lsat_mod_df_n
    landsat_uncertainty = lst_uncertainty
else:
    data0 = lsat_mod_df_n
    landsat_uncertainty = sst_uncertainty

# Original data
x_original = np.array(data0[f'L8_{surf_temp}'])
y_original = np.array(data0['MODIS_SST'])

# Assume these are your uncertainty estimates per observation
sy = np.full_like(y_original, modis_uncertainty * pix_uncertainty)  # Adjusted for resolution mismatch
sx = np.full_like(x_original, landsat_uncertainty)

# Define a linear function for the model
def linear_model(p, x):
    return p[0] * x + p[1]

# Create a Model
linear = Model(linear_model)

# Create a RealData object using your DataFrame
data = RealData(x_original, y_original,sx=sx, sy=sy)

# Set up ODR with the model and data
odr = ODR(data, linear, beta0=[1., 0.])

# Run the regression
out = odr.run()

# Use the output
beta = out.beta
beta_err = out.sd_beta

# Print the summary
out.pprint()

# Predicting values using the ODR model
y_pred = linear_model(beta, x_original)

# Get R2
# Calculate Total Sum of Squares (SST)
y_mean = np.mean(y_original)
SST = np.sum((y_original - y_mean)**2)

# Calculate Residual Sum of Squares (SSR)
SSR = np.sum((y_original - y_pred)**2)

# Compute RMSE
rmse = np.sqrt(((y_original - y_pred) ** 2).mean())

# Calculate R^2
R2 = 1 - (SSR / SST)
print("R^2:", np.around(R2,2))
print(f"RMSE: {np.around(rmse,2)}")

if surf_temp=='LST':
    # Plot regression
    beta_mdn = [beta[0]-beta_err[0]*1.96,beta[1]-beta_err[1]*1.96]
    beta_mup = [beta[0]+beta_err[0]*1.96,beta[1]-beta_err[1]*1.96]
    beta_bdn = [beta[0]-beta_err[0]*1.96,beta[1]+beta_err[1]*1.96]
    beta_bup = [beta[0]+beta_err[0]*1.96,beta[1]+beta_err[1]*1.96]
    print(f'At 95% confidence interval: {np.around(beta[0],2)}+/-{np.around(beta_err[0]*1.96,2)}, {np.around(beta[1],2)}+/-{np.around(beta_err[1]*1.96,2)}, n={y_pred.shape[0]}')
    xfill = np.array([-4.5,1.5])
    
    fig, ax = plt.subplots(figsize=(8, 3.5))
    ax.tick_params(labelsize=14)
    
    # LST data and regression
    plt.scatter(x_original, y_original, s=12,color='mediumslateblue')
    plt.plot(x_original, y_pred, color='mediumslateblue', label='LST ODR')
    plt.fill_between(xfill, linear_model(beta_bdn, xfill), linear_model(beta_mup, xfill),alpha=0.3, facecolor='0.3')
    
    # Comparison regressions
    xi = np.arange(-7.0,5.0,1.0)
    plt.plot(xi,xi * sstcalib_m + sstcalib_b,color='k',linewidth=2,label='SST ODR')
    plt.plot(xi,xi,color='lightcoral',linewidth=2, label='MODIS 1:1')
        
    plt.legend(loc='lower right',fontsize=14)
    plt.text(-2.7,-0.15,rf'$\mathbf{{y={np.around(beta[0],2)}x+{np.around(beta[1],2)}\quad r^2={np.around(R2,2)}}}$',color='mediumslateblue', fontweight='bold',fontsize=14)
    plt.xlim([-3.5,-1.2])
    plt.ylim([-3.05,0.8])
    # else: 
    #     plt.plot(x_original, y_pred, color='k', label='NLSST Orthogonal Distance Regression')
    #     plt.legend(loc='lower right',fontsize=12)
    #     plt.text(-2.6,-0.2,f'y={np.around(beta[0],2)}x+{np.around(beta[1],2)}   $r^2$={np.around(R2,2)}',fontsize=14)
    #     plt.xlim([-3.2,-0.15])
    #     plt.ylim([-2.4,0.9]) 
    plt.xlabel('Landsat ST [°C]',fontsize=16)
    plt.ylabel('MODIS SST [°C]',fontsize=16)
    plt.tight_layout()

# Orthoganal Regression for SST only
dataframes = [
    ('Combined', lsat_mod_df_n),
    ('Burke', lsat_mod_df_B),
    ('Cosgrove', lsat_mod_df_C),
    ('Dotson', lsat_mod_df_D),
]

if surf_temp=='LST':
    landsat_uncertainty = lst_uncertainty
else:
    landsat_uncertainty = sst_uncertainty

# Dictionary to store the results from each DataFrame
odr_results = {}

# Define a linear function for the model
def linear_model(p, x):
    return p[0] * x + p[1]

# Create a Model object
linear = Model(linear_model)

# Loop over each DataFrame
for df_name, data0 in dataframes:
    print(f"\n=== Processing {df_name} ===")
    
    # Original data
    x_original = np.array(data0[f'L8_{surf_temp}'])
    y_original = np.array(data0['MODIS_SST'])

    # Assume these are your uncertainty estimates per observation
    sy = np.full_like(y_original, modis_uncertainty * pix_uncertainty)  # Adjusted for resolution mismatch
    sx = np.full_like(x_original, landsat_uncertainty)
    
    # Create a RealData object using your DataFrame
    data = RealData(x_original, y_original,sx=sx, sy=sy)
    
    # Set up ODR with the model and data, providing an initial guess
    odr = ODR(data, linear, beta0=[1., 0.])
    
    # Run the regression
    out = odr.run()
    
    # Retrieve best-fit parameters and their std. dev.
    beta = out.beta
    beta_err = out.sd_beta
    
    # Print the summary
    out.pprint()
    
    # Predicting values using the ODR model
    y_pred = linear_model(beta, x_original)
    
    # Get R2
    # Calculate Total Sum of Squares (SST)
    y_mean = np.mean(y_original)
    SST = np.sum((y_original - y_mean)**2)
    
    # Calculate Residual Sum of Squares (SSR)
    SSR = np.sum((y_original - y_pred)**2)
    
    # Calculate R^2
    R2 = 1 - (SSR / SST)

    # Compute RMSE
    rmse = np.sqrt(((y_original - y_pred) ** 2).mean())
    
    # Print R^2
    print(f"{df_name} R^2:", np.around(R2, 2))
    print(f"RMSE: {np.around(rmse,2)}")
    
    # Store results in a dictionary for later use (plotting, etc.)
    odr_results[df_name] = {
        'beta': beta,
        'beta_err': beta_err,
        'R2': R2,
        'x_original': x_original,
        'y_original': y_original,
        'y_pred': y_pred
    }

# Import the necessary ipywidgets components
import ipywidgets as widgets
from ipywidgets import interact

# The @interact decorator will automatically create widgets for the function arguments
@interact(
    show_burke=widgets.Checkbox(value=True, description="Burke Data"),
    show_cosgrove=widgets.Checkbox(value=True, description="Cosgrove Data"),
    show_dotson=widgets.Checkbox(value=True, description="Dotson Data"),
    show_combined_fit=widgets.Checkbox(value=True, description="Combined Fit"),
    show_confidence_interval=widgets.Checkbox(value=True, description="95% CI")
)
def interactive_sst_plot(show_burke, show_cosgrove, show_dotson, show_combined_fit, show_confidence_interval):
    
    # This is your original code, with plotting commands wrapped in if-statements
    if surf_temp=='SST':
        # Your original data calculation code (unchanged)
        beta_mdn = [odr_results['Combined']['beta'][0]-odr_results['Combined']['beta_err'][0]*1.96,odr_results['Combined']['beta'][1]-odr_results['Combined']['beta_err'][1]*1.96]
        beta_mup = [odr_results['Combined']['beta'][0]+odr_results['Combined']['beta_err'][0]*1.96,odr_results['Combined']['beta'][1]-odr_results['Combined']['beta_err'][1]*1.96]
        beta_bdn = [odr_results['Combined']['beta'][0]-odr_results['Combined']['beta_err'][0]*1.96,odr_results['Combined']['beta'][1]+odr_results['Combined']['beta_err'][1]*1.96]
        beta_bup = [odr_results['Combined']['beta'][0]+odr_results['Combined']['beta_err'][0]*1.96,odr_results['Combined']['beta'][1]+odr_results['Combined']['beta_err'][1]*1.96]
        a1 = np.around(odr_results['Combined']['beta'][0],2)
        a2 = np.around(odr_results['Combined']['beta'][1],2)
        ar = np.around(odr_results['Combined']['R2'],2)
        b1 = np.around(odr_results['Burke']['beta'][0],2)
        b2 = np.around(odr_results['Burke']['beta'][1],2)
        br = np.around(odr_results['Burke']['R2'],2)
        c1 = np.around(odr_results['Cosgrove']['beta'][0],2)
        c2 = np.around(odr_results['Cosgrove']['beta'][1],2)
        cr = np.around(odr_results['Cosgrove']['R2'],2)
        d1 = np.around(odr_results['Dotson']['beta'][0],2)
        d2 = np.around(odr_results['Dotson']['beta'][1],2)
        dr = np.around(odr_results['Dotson']['R2'],2)
        xfill = np.array([-4.3,0.9])

        # --- Plotting Section ---
        fig, ax = plt.subplots(figsize=(8, 4)) # Increased height slightly for better text placement
        ax.tick_params(labelsize=14)

        if show_burke:
            plt.scatter(np.array(lsat_mod_df_B[f'L8_{surf_temp}']), np.array(lsat_mod_df_B['MODIS_SST']), s=12,color='#00bf7d', label='Burke')
            plt.plot(odr_results['Burke']['x_original'], odr_results['Burke']['y_pred'], ls='-',linewidth=1,color='#00bf7d')
            plt.text(-2.9,0.0,f'y={b1}x+{b2}   $r^2$={br}',color='#00bf7d',fontsize=12)

        if show_cosgrove:
            plt.scatter(np.array(lsat_mod_df_C[f'L8_{surf_temp}']), np.array(lsat_mod_df_C['MODIS_SST']), s=12,color=sns.color_palette("colorblind")[3], label='Cosgrove')
            plt.plot(odr_results['Cosgrove']['x_original'], odr_results['Cosgrove']['y_pred'], ls='-',linewidth=1,color=sns.color_palette("colorblind")[3])
            plt.text(-2.9,-0.3,f'y={c1}x+{c2}   $r^2$={cr}',color=sns.color_palette("colorblind")[3],fontsize=12)
            
        if show_dotson:
            plt.scatter(np.array(lsat_mod_df_D[f'L8_{surf_temp}']), np.array(lsat_mod_df_D['MODIS_SST']), s=12,color='#0073e6', label='Dotson')
            plt.plot(odr_results['Dotson']['x_original'], odr_results['Dotson']['y_pred'], ls='-',linewidth=1,color='#0073e6')
            plt.text(-2.9,-0.6,f'y={d1}x+{d2}   $r^2$={dr}',color='#0073e6',fontsize=12)

        if show_combined_fit:
            plt.plot(odr_results['Combined']['x_original'], odr_results['Combined']['y_pred'], color='k')
            plt.text(-2.9,0.3,f'y={a1}x+{a2}   $r^2$={ar}',color='k',fontsize=14)
            if show_confidence_interval:
                plt.fill_between(xfill, linear_model(beta_bdn, xfill), linear_model(beta_mup, xfill),alpha=0.3, facecolor='0.3')

        # General plot labels and limits (unchanged)
        plt.xlim([-3.05,-0.15])
        plt.ylim([-2.3,0.9])
        plt.xlabel('Landsat SST [°C]',fontsize=16)
        plt.ylabel('MODIS SST [°C]',fontsize=16)
        plt.legend(loc='lower right',fontsize=10)
        plt.tight_layout()
        plt.show() # Make sure to show the plot

odr_results['Combined']['beta_err']

if surf_temp=='SST':
    # Ordinary least squares regression between Landsat and MODIS SST matchups
    resultC = sm.ols(formula="MODIS_SST ~ L8_SST", data=data0).fit()
    print (resultC.summary())

Interactive Analysis Dashboard using Matplotlib¶

# Imports for Interactive Dashboard
import ipywidgets as widgets
from ipywidgets import interact, Layout

# Matplotlib-based Interactive Dashboard Function

def linear_model(p, x):
    return p[0] * x + p[1]

def interactive_calibration_dashboard_matplotlib(region, chart_type, pix_thresh, modis_sst_thresh, lsat_sst_thresh):
    # 1. & 2. Data filtering and selection (Same as your original code)
    df_C_filt = lsat_mod_df_C[(lsat_mod_df_C['MODIS_pix'] >= pix_thresh) & (lsat_mod_df_C['MODIS_SST'] >= modis_sst_thresh) & (lsat_mod_df_C[f'L8_{surf_temp}'] >= lsat_sst_thresh)]
    df_C_filt['Region'] = 'Cosgrove'

    df_D_filt = lsat_mod_df_D[(lsat_mod_df_D['MODIS_pix'] >= pix_thresh) & (lsat_mod_df_D['MODIS_SST'] >= modis_sst_thresh) & (lsat_mod_df_D[f'L8_{surf_temp}'] >= lsat_sst_thresh)]
    df_D_filt['Region'] = 'Dotson'

    df_B_filt = lsat_mod_df_B[(lsat_mod_df_B['MODIS_pix'] >= pix_thresh) & (lsat_mod_df_B['MODIS_SST'] >= modis_sst_thresh) & (lsat_mod_df_B[f'L8_{surf_temp}'] >= lsat_sst_thresh)]
    df_B_filt['Region'] = 'Burke'

    if region == 'All Regions':
        df_filtered = pd.concat([df_B_filt, df_C_filt, df_D_filt])
    elif region == 'Cosgrove':
        df_filtered = df_C_filt
    elif region == 'Dotson':
        df_filtered = df_D_filt
    elif region == 'Burke':
        df_filtered = df_B_filt
    
    # Create the figure and axis for the plot
    fig, ax = plt.subplots(figsize=(8, 6))

    if len(df_filtered) < 2:
        ax.text(0.5, 0.5, "Not enough data with the current filter settings.", ha='center', va='center')
        ax.set_axis_off()
        plt.show()
        return

    # 3. ODR Calculation (Same as your original code)
    x_data = df_filtered[f'L8_{surf_temp}'].values
    y_data = df_filtered['MODIS_SST'].values
    sx = np.full_like(x_data, sst_uncertainty)
    sy = np.full_like(y_data, modis_uncertainty * pix_uncertainty)
    model = Model(linear_model)
    data = RealData(x_data, y_data, sx=sx, sy=sy)
    odr = ODR(data, model, beta0=[1., 0.])
    output = odr.run()
    slope, intercept = output.beta
    y_pred = linear_model(output.beta, x_data)

    # 4. Statistics Calculation (Same as your original code, but formatted for Matplotlib title)
    bias = np.mean(y_data - x_data)
    trend = slope
    ss_total = np.sum((y_data - np.mean(y_data))**2)
    ss_resid = np.sum((y_data - y_pred)**2)
    r2 = 1 - (ss_resid / ss_total) if ss_total > 0 else 0
    rmse = np.sqrt(np.mean((y_data - y_pred)**2))
    # Note: Matplotlib title doesn't support HTML bold tags, so they are removed.
    stats_text = (f"Trend (Slope): {trend:.2f} | Bias (MODIS - Landsat): {bias:.2f}°C\n"
                  f"R²: {r2:.2f} | RMSE: {rmse:.2f} | N: {len(df_filtered)}")

    # 5. Create Matplotlib figure based on chart_type
    if chart_type == 'Scatter':
        # Seaborn's scatterplot is a great way to handle coloring by category easily
        sns.scatterplot(data=df_filtered, x=f'L8_{surf_temp}', y='MODIS_SST', hue='Region', ax=ax, s=50)
    elif chart_type == 'Heatmap':
        ax.hist2d(df_filtered[f'L8_{surf_temp}'], df_filtered['MODIS_SST'], bins=20, cmap='viridis')
        fig.colorbar(ax.collections[0], ax=ax, label='Point Density')

    # Add regression line
    sorted_indices = np.argsort(x_data)
    ax.plot(x_data[sorted_indices], y_pred[sorted_indices], color='black', linewidth=2, label='ODR Fit')

    # Set titles, labels, and limits
    ax.set_title(stats_text)
    ax.set_xlabel("Landsat SST [°C]")
    ax.set_ylabel("MODIS SST [°C]")
    ax.set_xlim(-3.05, -0.15)
    ax.set_ylim(-2.3, 0.9)
    ax.legend()
    ax.grid(True, linestyle='--', alpha=0.6)
    
    plt.show()

# Define Widgets
style = {'description_width': 'initial'}
w_region = widgets.Dropdown(options=['All Regions', 'Cosgrove', 'Dotson', 'Burke'], value='All Regions', description='Select Region:', style=style)
w_chart_type = widgets.ToggleButtons(options=['Scatter', 'Heatmap'], description='Chart Type:', button_style='info')
w_pix_thresh = widgets.IntSlider(value=1300, min=0, max=5000, step=100, description='Min MODIS Pixels:', style=style, layout=Layout(width='500px'))
w_modis_sst = widgets.FloatSlider(value=-1.9, min=-2.5, max=0, step=0.1, description='Min MODIS SST (°C):', style=style, layout=Layout(width='500px'))
w_lsat_sst = widgets.FloatSlider(value=-3.1, min=-4.0, max=0, step=0.1, description='Min Landsat SST (°C):', style=style, layout=Layout(width='500px'))


# Launch the new Matplotlib-based Dashboard
interact(interactive_calibration_dashboard_matplotlib, 
         region=w_region, 
         chart_type=w_chart_type,
         pix_thresh=w_pix_thresh, 
         modis_sst_thresh=w_modis_sst, 
         lsat_sst_thresh=w_lsat_sst);

Preamble

Preprocessing data

Preamble

Validating your data product