Analyzing Salinity Patterns in South Florida

This notebook demonstrates how to analyze salinity patterns in South Florida using curated buoy data processed by the EnviStor smart data pipeline and made available through PelicanFS, a high-performance file system interface for the Open Science Data Federation (OSDF).

The data used here comes from three monitoring stations managed by FIU, each located in a distinct part of South Florida’s coastal waters. These datasets were curated and made analysis-ready by the EnviStor pipeline.

The central question is:

What are the salinity patterns in South Florida?

We’ll answer this by loading the data, cleaning and merging it, and visualizing salinity trends over time across multiple locations.

Imports¶

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from pelicanfs.core import OSDFFileSystem
from io import BytesIO

sns.set(style="whitegrid")

Load the Curated Salinity Datasets¶

We’ll load data from three buoy stations directly from PelicanFS using the OSDFFileSystem() interface. These Excel files have already been curated and prepared by the EnviStor smart pipeline, and are stored in the /envistor namespace in OSDF.

Each file corresponds to a different buoy location:

Biscayne Bay
Haulover Inlet
Little River

We use BytesIO to read the content as a stream before passing it to pandas.read_excel(). Each resulting DataFrame includes a "Station" column to identify its source location.

pelfs = OSDFFileSystem()
file_buoy1 = pelfs.cat('/envistor/CREST_Buoy_2_NW_Biscayne_Bay_-_S_of_Biscayne_Canal_082720-112221.xlsx')
file_buoy2 = pelfs.cat('/envistor/CREST_Buoy_3_Haulover_Inlet_100518_-_073020_updated.xlsx')
file_buoy3 = pelfs.cat('/envistor/CREST_Buoy_3-2_Little_River_042121-050624.xlsx')

excel_file1 = BytesIO(file_buoy1)
df_file_buoy1 = pd.read_excel(excel_file1)
df_file_buoy1['Station'] = 'Buoy - Biscayne Bay'

excel_file2 = BytesIO(file_buoy2)
df_file_buoy2 = pd.read_excel(excel_file2)
df_file_buoy2['Station'] = 'Buoy - Haulover Inlet'

excel_file3 = BytesIO(file_buoy3)
df_file_buoy3 = pd.read_excel(excel_file3)
df_file_buoy3['Station'] = 'Little River'

No working cache found

---------------------------------------------------------------------------
NoAvailableSource                         Traceback (most recent call last)
Cell In[2], line 2
      1 pelfs = OSDFFileSystem()
----> 2 file_buoy1 = pelfs.cat('/envistor/CREST_Buoy_2_NW_Biscayne_Bay_-_S_of_Biscayne_Canal_082720-112221.xlsx')
      3 file_buoy2 = pelfs.cat('/envistor/CREST_Buoy_3_Haulover_Inlet_100518_-_073020_updated.xlsx')
      4 file_buoy3 = pelfs.cat('/envistor/CREST_Buoy_3-2_Little_River_042121-050624.xlsx')

File ~/micromamba/envs/osdf-cookbook/lib/python3.12/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    115 @functools.wraps(func)
    116 def wrapper(*args, **kwargs):
    117     self = obj or args[0]
--> 118     return sync(self.loop, func, *args, **kwargs)

File ~/micromamba/envs/osdf-cookbook/lib/python3.12/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
    101     raise FSTimeoutError from return_result
    102 elif isinstance(return_result, BaseException):
--> 103     raise return_result
    104 else:
    105     return return_result

File ~/micromamba/envs/osdf-cookbook/lib/python3.12/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
     54     coro = asyncio.wait_for(coro, timeout=timeout)
     55 try:
---> 56     result[0] = await coro
     57 except Exception as ex:
     58     result[0] = ex

File ~/micromamba/envs/osdf-cookbook/lib/python3.12/site-packages/pelicanfs/core.py:980, in PelicanFileSystem._cache_multi_dec.<locals>.wrapper(self, *args, **kwargs)
    978     data_url, director_response = await self.get_origin_url(path)
    979 else:
--> 980     data_url, director_response = await self.get_working_cache(path)
    982 # Handle token generation if required (single path)
    983 operation = self._get_token_operation(func.__name__)

File ~/micromamba/envs/osdf-cookbook/lib/python3.12/site-packages/pelicanfs/core.py:546, in PelicanFileSystem.get_working_cache(self, fileloc)
    544 if not cache_list:
    545     logger.error("No working cache found")
--> 546     raise NoAvailableSource()
    548 with self._namespace_lock:
    549     self._namespace_cache[namespace] = _CacheManager(cache_list, director_response)

NoAvailableSource:

Clean and Combine the Data¶

Now that we’ve loaded the individual DataFrames, we’ll prepare them for analysis.

Here’s what we do:

Create a timestamp: Combine the "Date" and "Time" columns into a single datetime column.
Standardize salinity: Rename the "Sal_psu" column to "Salinity" and convert its values to numeric (in case of string or error-prone entries).
Merge datasets: Concatenate the three cleaned DataFrames into one (df_all) so we can analyze salinity trends across all buoy stations together. We also drop any rows with missing salinity values and set the datetime column as the index to enable time-based operations later.

for df in [df_file_buoy1, df_file_buoy2, df_file_buoy3]:
    df["datetime"] = pd.to_datetime(df["Date"].astype(str) + " " + df["Time"].astype(str))

    df.rename(columns={"Sal_psu": "Salinity"}, inplace=True)
    df["Salinity"] = pd.to_numeric(df["Salinity"], errors="coerce")

df_all = pd.concat([df_file_buoy1, df_file_buoy2, df_file_buoy3], ignore_index=True)
df_all.dropna(subset=["Salinity"], inplace=True)
df_all.set_index("datetime", inplace=True)

Resample and Aggregate¶

Salinity readings are taken multiple times per day. To reveal broader trends, we resample the data to daily averages.

This step:

Reduces noise
Makes it easier to compare across time
Prepares the data for visualization

We group by station and resample by day ('1D').

df_daily = (
    df_all
    .groupby("Station")
    .resample("1D")
    ["Salinity"]
    .mean()
    .reset_index()
)

Visualize the Salinity Patterns¶

Now we can plot daily salinity patterns to compare how they evolve over time across the three locations.

We’ll use Seaborn for a clean, readable line chart.

plt.figure(figsize=(14, 6))
sns.lineplot(data=df_daily, x="datetime", y="Salinity", hue="Station")
plt.title("Daily Average Salinity in South Florida (by Buoy Station)")
plt.xlabel("Date")
plt.ylabel("Salinity (psu)")
plt.legend(title="Station")
plt.tight_layout()
plt.show()

Interpret the Results¶

With the visualization in hand, we can start identifying key salinity patterns in South Florida:

Biscayne Bay (blue) shows a clear seasonal fluctuation, with higher salinity in dry months and noticeable drops likely linked to storm events or freshwater inflow.
Haulover Inlet (orange) tends to have consistently higher salinity levels, suggesting stronger tidal mixing and less influence from freshwater discharge.
Little River (green) displays the most variability — sharp dips and spikes in salinity hint at frequent freshwater input, possibly from canals, rain events, or upstream runoff.

These differences illustrate how geographic location and local hydrology impact salinity levels. By comparing trends across stations, we gain insight into how dynamic and localized coastal salinity can be.

Next Steps¶

This notebook provided a foundation for analyzing coastal salinity patterns using curated data from the EnviStor pipeline. If you’re interested in extending this work, here are a few ideas:

Incorporate other environmental variables: Analyze how temperature, turbidity, or dissolved oxygen vary alongside salinity to build a more holistic view of water quality.
Add spatial analysis: Use GIS tools or libraries (e.g., Cartopy or Folium) to visualize station locations and explore spatial gradients in salinity.
Compare across years: Investigate long-term salinity trends and identify anomalies across different seasons or years.
Include other datasets from EnviStor: Expand the workflow by pulling additional datasets from OSDF, such as ocean currents, rainfall, or metadata-enriched observations.
Develop alert thresholds: Identify salinity levels that may signal ecological stress or risk, potentially integrating this with decision-making tools.