Skip to article frontmatterSkip to article content

Buoys and Wave(let)s

Animation for a deep water wave

Buoys and Wave(let)s


Overview

Generating a wavelet power spectrum from the time-series data of wave heights collected from buoy station 44013 east of Boston

  1. Prerequisties
  2. Background
  3. Download and Organize Buoy Data
  4. Wavelet Input Values
  5. PyWavelets
  6. Power Spectrum

Prerequisites

ConceptsImportanceNotes
Intro to MatplotlibNecessaryUsed to plot data
Intro to NumpyNecessaryUsed to work with large arrays
Intro to DatetimesHelpfulUseful to understand strings vs. datetimes
  • Time to learn: 35 minutes

Background

Buoys float on the surface of water, and while they can serve many purposes, research facilities like NOAA’s National Data Buoy Center provide real-time and historical data about waves and climate on the ocean and within lakes from hundreds of buoys around the world.

buoy-world-map

Each buoy has been in service for a variety of years and collects different information. Buoy 44013 is off the coast of Boston in the Massachusetts Bay and reports information about the wind speed and direction, atmospheric pressure, air temperature, dew point, water temperature, and wave height throughout the year.

LocationBuoy
buoy-stationbuoy-example

44013 is owned and operated by the National Data Buoy Center and measures a depth of 64.6 meters and has a watch circle radius of 122 yards, with standard meterological data dating back to 1984.

Example Buoy Camera Photos from April 7, 2025:

buoy-camera

And a photo from today:

buoy-camera

For this notebook, we will be investigating the wave height of the buoy across two years in the early 2000’s

Imports

import numpy as np                          # working with arrays
import matplotlib.pyplot as plt             # plot data
import datetime                             # converting strings to datetime objects
import requests                             # retrieve data from text on a website

import pywt                                 # PyWavelets

Access Buoy Data

First, we will need to download the relevant data from NOAA National Data Buoy Center. For this notebook, we will be investigating wave height data from 2001 to 2002.

The buoy contains more information than just wave height, so we will save only data relevant to the date and time and the wave height. So, the columns we need to keep are date, time (YY, MM, DD, hh), and wave height (WVHT).

We will iterate through the years to save and concatenate the data into a single large array.

years = [2001, 2002]
date_times = np.array([])
wave_height = np.array([])
for i, year in enumerate(years):

    # collect data from NOAA Buoy link
    data_link = f"https://www.ndbc.noaa.gov/view_text_file.php?filename=41001h{year}.txt.gz&dir=data/historical/stdmet/"
    data_req = requests.get(data_link)
    data_txt = data_req.text
    data = np.genfromtxt(data_txt.splitlines(), comments=None, dtype='str')

    # Find the index for datetime and wave height information
    datetime_index = np.where(data[0] == "hh")[0][0]
    wave_height_index = np.where(data[0] == "WVHT")[0][0]
    
    # collect data from all rows for the columns of datetime and wave height
    # [1:] skips the first row with header information
    date_time = data[:,:datetime_index+1][1:]
    data_wave_height = data[:,wave_height_index][1:] 

    # Converts the string data collected from the source to a float
    data_wave_height = data_wave_height.astype(float)

    # Concatenate all data into a single array
    if i == 0:
        date_times = date_time
        wave_height = data_wave_height
    else:
        date_times = np.concatenate([date_times, date_time])
        wave_height = np.concatenate([wave_height, data_wave_height])

Clean Up Wave Height data

Buoys stores null as 99.00, so to remove this outlier, we will need to replace 99.00 with np.nan.

print(f"Max wave height before applying nan = {max(wave_height)} m")
wave_height = np.where(wave_height == 99.00, np.nan, wave_height)
print(f"Max wave height after applying nan  = {max(wave_height)} m")
Max wave height before applying nan = 99.0 m
Max wave height after applying nan  = 9.38 m

Convert Time to Datetimes

Buoys store datetime information as separate columns of data with YYYY MM DD and HH. For Python to recognize these strings as a date, we need to combine the strings into a single datetime value.

dates = []
for date in date_times:
    # combine each separate column into a single string: YYYYMMDDHH
    date_string = "".join(date)
    # convert string to a datetime object
    dates.append(datetime.datetime.strptime(date_string, "%Y%m%d%H"))
dates = np.array(dates)
print(f"First Date = {dates[0]}")
print(f"Last Date  = {dates[-1]}")
First Date = 2001-01-01 00:00:00
Last Date  = 2002-12-17 20:00:00

Plot and View Data

Let’s give the data a look! Wave height is recorded in meters throughout the year.

fig, ax = plt.subplots(figsize=(10, 10))
plt.title("Wave Height on Station 44013")
plt.xlabel("Date")
plt.xticks(rotation=90)
plt.ylabel("Wave Height (meters)")
plt.plot(dates, wave_height)
plt.show()
<Figure size 1000x1000 with 1 Axes>

Wavelet Input Values

Wavelet inputs include:

  • x: Input time-series data (for example, the time and wave height of data)
  • wavelet: mother wavelet name
  • dt: sampling period (time between each y-value) for time range
  • s0: smallest scale
  • dj: spacing between each discrete scales
  • jtot: largest scale
dt = 3600   # sampling period (time between each y-value) ->  seconds in an hour
s0 = 0.25   # smallest scale
dj = 0.25   # spacing between each discrete scales
jtot = 64   # largest scale

For this example, we will be using a complex Morlet with a bandwidth of 1.5 and a center frequency of 1.

bandwidth = 1.5
center_freq = 1
wavelet_mother = f"cmor{bandwidth}-{center_freq}"
print(wavelet_mother)
cmor1.5-1

Applying Wavelets

scales = np.arange(1, jtot + 1, dj)
wavelet_coeffs, freqs = pywt.cwt(
    data=wave_height, scales=scales, wavelet=wavelet_mother, sampling_period=dt
)

Power Spectrum

The power spectrum is the real component of the wavelet coefficients. We can find this value by squaring the absolute value of the wavelet_coeffs to return the magnitude of the real component to make a better graph.

power = np.power((abs(wavelet_coeffs)), 2)

fig, ax = plt.subplots(figsize=(10, 10))

# Plot Scalogram
plt.imshow(
    power, vmax=(power).max(), vmin=(power).min(), aspect="auto"
)

# Plot X-Axis with Date Labels and Ticks Monthly
num = 730 # hours in a month
x_ticks = range(0, len(dates), num)
x_ticklabels = []
for date in dates[::num]:
    x_ticklabels.append(str(date)[:-9])
plt.xticks(ticks=x_ticks, labels=x_ticklabels,rotation=90)

plt.title("Buoy Height at Station 44013")
plt.xlabel("Date (Hours)")
plt.ylabel("Scale")
plt.colorbar()
plt.show()
<Figure size 1000x1000 with 2 Axes>

What Went Wrong?

The current wavelet has a lot of blank space that is confusing the wavelet analysis. This is the result of the nan values, where data was no included, but is still being used in the calculation. So, to get a complete wavelet, we should filter out the nan values before we apply a wavelet.

We will fix this by applying a mask to all the nan values, and then interpolating between to fill out the remaining data.

wave_len = np.arange(len(wave_height))
mask = np.isfinite(wave_height)
xfiltered = np.interp(wave_len, wave_len[mask], wave_height[mask])

Time to get our wavelet_coeffs from the newly filtered data.

scales = np.arange(1, jtot + 1, dj)
wavelet_coeffs, freqs = pywt.cwt(
    data=xfiltered, scales=scales, wavelet=wavelet_mother, sampling_period=dt
)

Power Spectrum, Take Two

power = np.power((abs(wavelet_coeffs)), 2)

fig, ax = plt.subplots(figsize=(10, 10))

# Plot Scalogram
plt.imshow(
    power, vmax=(power).max(), vmin=(power).min(), aspect="auto"
)

# Plot X-Axis with Date Labels and Ticks Monthly
num = 730 # hours in a month
x_ticks = range(0, len(dates), num)
x_ticklabels = []
for date in dates[::num]:
    x_ticklabels.append(str(date)[:-9])
plt.xticks(ticks=x_ticks, labels=x_ticklabels,rotation=90)

plt.title("Buoy Height at Station 44013")
plt.xlabel("Date (Hours)")
plt.ylabel("Scale")
plt.colorbar()
plt.show()
<Figure size 1000x1000 with 2 Axes>

That’s more like it!

Now, the x-axis describes the when and the y-axis describes the what

  • X axis describes when wind is present and causing waves
  • Y axis describes the range of wave periods needed to make the signal that is present. There is a large range when waves are detected, which is expected since wind excites many frequencies of waves on the surface when it blows

Conclusions

The power spectrum above demonstrates a strong peak (in yellow) within the first month of the year. During this time of year, the wave activity is high and across a wide range of frequencies. This wavelet is demonstrating when wave activity is highest and the range of wave heights is widest.


Summary

Wavelet analysis can be applied across a wide range of data, as long as there is a time-series data. Buoys returns a large range of possible data, but for this example we explored how wave heights vary throughout the year. There is a large amount of wave activity in very short intervals at the beginning of the year, but the wave heights vary over a large range of possible frequencies.

What’s next

Resources and references