Visualizing Data with EPA’s Air Quality System (AQS) API

Wildfire smoke over New York City (June 7, 2023) Photo by Ahmer Kalam on Unsplash


Overview

Air quality data are an important aspect of both atmospheric and environmental sciences. Understanding the concentrations of particulate matter and chemical species (e.g., O3 and NOx) can be useful for air pollution analysis from both the physical science and health science perspectives.

The US EPA AQS has archived data that have gone through quality assurance.

In this notebook, we will cover:

  1. Accessing data from the AQS
  2. Exploring the format of the data
  3. Preparing the data for visualization
  4. Generating a time series plot

Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| Introduction to Pandas | Necessary | How to deal with dataframes and datasets |
| Matplotlib Basics | Helpful | Skills for different plotting styles and techniques |

  • Time to learn: 30 minutes
  • System requirements:
    • Email address for AQS API access

Imports

Info

Here we import a number of packages, though not all of them may end up being used.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from datetime import date
from datetime import datetime
import numpy as np
import os
import pyaqsapi as aqs

We will also set some limits to the size of data that Pandas displays, so as not to overload our screens.

# Set the maximum number of rows and columns to display
pd.set_option('display.max_rows', 10)  # Set to the number of rows you want to display
pd.set_option('display.max_columns', 10)  # Set to the number of columns you want to display
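
If only one cell needs a different limit, the global setting does not have to change. The sketch below uses `pd.option_context`, which applies an option only inside a `with` block. The small DataFrame is made up purely for illustration.

```python
import pandas as pd

# A made-up DataFrame, only here to demonstrate the display options
demo = pd.DataFrame({"sample_measurement": [0.031, 0.034, 0.029, 0.040]})

rows_before = pd.get_option("display.max_rows")

# Options set through option_context apply only inside the with-block
with pd.option_context("display.max_rows", 2):
    rows_inside = pd.get_option("display.max_rows")
    preview = repr(demo)  # rendered with the temporary limit

# Once the block exits, the previous value is restored
rows_after = pd.get_option("display.max_rows")
print(rows_inside, rows_after == rows_before)
```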

Accessing Data from the AQS

Important:

If you have previously registered an account with the AQS, now will be a good time to get that information out and skip past the `aqs.aqs_sign_up()` step below.

If not, you should have an email address in mind that you'd like to use.

Register a new email with aqs_sign_up()

In the cell below, uncomment the code and replace ‘EMAIL’ with an email address to use for API credentials.

# aqs.aqs_sign_up('EMAIL')

IMPORTANT

Replace your email address with 'EMAIL' after you've run `aqs_sign_up()`, or comment out the line!

A new API key will be generated every time that line of code is executed!

Data can be pulled from the AQS in a number of different ways...

  1. By Sample Site
  2. By County
  3. By State
  4. By Lat/Lon Box
  5. By Monitoring Agency
  6. By Primary Quality Assurance Organization
  7. By Core Based Statistical Area (as defined by the US Census Bureau)

Let’s look at how the package deals with states...

# aqs.aqs_states()

Whoops!

You need to input your credentials before any of the functions will work!

Use the aqs_credentials() function to input your username (email address) and access key.

This is all found in the email you received when verifying your email address.

If you’ve previously registered your address and do not have the key, you can generate a new key by using the aqs_sign_up() function to resubmit your email address.

Let’s also save our username and key as variables that we can easily call later.

  • Comment out the first line and uncomment the second line in the cell below.
  • Replace ‘AQS_USERNAME’ and ‘AQS_KEY’ with your credentials. We stored them as environment variables, to ensure they are kept secret while building this notebook.
aqs.aqs_credentials(username= os.getenv('AQS_USERNAME'), key= os.getenv('AQS_KEY'))
#aqs.aqs_credentials(username='AQS_USERNAME', key='AQS_KEY')

MAKE SURE TO CLEAR VARIABLES!

Make sure to clear these variables before submitting a pull request! You do not want to share your credentials!
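
One way to keep credentials out of the notebook entirely is to read them from environment variables and fail early if either is missing. The sketch below assumes the variable names `AQS_USERNAME` and `AQS_KEY` used above; `load_aqs_credentials` is a hypothetical helper, not part of pyaqsapi.

```python
import os


def load_aqs_credentials(env=None):
    """Return (username, key) from environment variables, or raise if either is missing.

    Hypothetical helper for illustration; it is not part of pyaqsapi.
    """
    env = os.environ if env is None else env
    username = env.get("AQS_USERNAME", "").strip()
    key = env.get("AQS_KEY", "").strip()
    missing = [name for name, value in (("AQS_USERNAME", username), ("AQS_KEY", key)) if not value]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return username, key


# Example with a stand-in mapping; in the notebook it would be called with no arguments
username, key = load_aqs_credentials({"AQS_USERNAME": "user@example.com", "AQS_KEY": "not-a-real-key"})
print(username)
```

The returned pair could then be passed to `aqs.aqs_credentials(username=username, key=key)`, so an empty value is caught before any request is sent.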

Let’s look at those states now...

aqs.aqs_states()

Since states will be input via a number, let’s store aqs_states() as a variable that we can call on later to remind ourselves of what states we need.

Let’s assume for now that we want to focus on New York, and also save its code as a variable.

states = aqs.aqs_states()
NY = 36
/home/runner/micromamba/envs/api-cookbook-dev/lib/python3.13/site-packages/pyaqsapi/helperfunctions.py:325: UserWarning: AQSDataMArt returned the following message: 
['variable is missing or the value is empty: email', 'email address: , requires the following format: local-part@domain.']
Perhaps you've entered an incorrect username and/or key to the aqs_credentials function?
Here is the 
username: 
and
key:  
that was provided
  warn(

Everything Is Currently Input Numerically

It's important that we also address the fact that, currently, everything is input as a numerical value for pulling these data from the AQS with this Python package.
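
State FIPS codes are conventionally written as two-digit strings (for example, `06` for California), and county codes as three digits, so a leading zero is lost when a code is stored as an integer. The sketch below shows one way to format them. `fips_string` is a hypothetical helper, and whether pyaqsapi accepts integers or requires strings is not confirmed here, so check the package documentation.

```python
def fips_string(code, width=2):
    """Format a numeric FIPS code as a zero-padded string (hypothetical helper)."""
    return str(int(code)).zfill(width)


NY = fips_string(36)        # '36'
CA = fips_string(6)         # '06'
ALBANY = fips_string(1, 3)  # '001' (county FIPS codes use three digits)
print(NY, CA, ALBANY)
```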

Parameter codes are available from the EPA (the link is listed under Resources and references), but to simplify things, here are the codes for a few common pollutants with defined Air Quality Index values.

| Pollutant | Parameter Code |
| --- | --- |
| Carbon Monoxide (CO) | 42101 |
| Nitrogen Dioxide (NO2) | 42602 |
| Ozone (O3) | 44201 |
| PM 10 (Total) | 81102 |

Other variables that might be of interest:

| Meteorological Variable | Parameter Code |
| --- | --- |
| Wind Speed - Resultant (knots) | 61103 |
| Wind Direction - Resultant (deg) | 61104 |
| Outdoor Temperature (F) | 62101 |
| Average Ambient Temperature (C) | 68105 |
| Relative Humidity (%) | 62201 |
| Barometric Pressure (mbar) | 64101 |

Let’s also store the parameter codes as variables to make things simpler.

CO = 42101
NO2 = 42602
O3 = 44201
PM10 = 81102
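
As an alternative to separate variables, the same codes can live in a dictionary so that a pollutant can be looked up by name. This is only a sketch: `PARAMETER_CODES` and `parameter_code` are hypothetical names, and the values are copied from the pollutant table above.

```python
# Codes copied from the pollutant table above
PARAMETER_CODES = {
    "CO": 42101,
    "NO2": 42602,
    "O3": 44201,
    "PM10": 81102,
}


def parameter_code(name):
    """Look up a parameter code by pollutant name, ignoring case (hypothetical helper)."""
    try:
        return PARAMETER_CODES[name.upper()]
    except KeyError:
        raise KeyError(f"Unknown pollutant {name!r}; known: {sorted(PARAMETER_CODES)}") from None


print(parameter_code("o3"))
```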

Exploring the format of the data

Let’s look at current O3 data for New York State.

now = datetime.today()
year = now.year
month = now.month
day = now.day
print(year, month, day)
2025 6 18

Warning

The AQS does not have real-time data. Also, note that `datetime.today()` returns the local time of the machine running the notebook, which was UTC for the output shown above!

We’ll subtract one day so we have the past day of data.

from datetime import timedelta  # subtracting a timedelta stays valid on the 1st of a month, where day-1 would be 0
ozone = aqs.bystate.sampledata(parameter=O3, bdate=now.date() - timedelta(days=1), edate=now.date(), stateFIPS=NY)
ozone
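
The one-day window requested above can be built with `timedelta`, which handles month and year boundaries automatically. A minimal sketch follows; `previous_day_window` is a hypothetical helper name.

```python
from datetime import date, timedelta


def previous_day_window(end_day):
    """Return (bdate, edate) covering the day before `end_day` through `end_day`."""
    return end_day - timedelta(days=1), end_day


# Works on the first of a month, where day - 1 would be 0 and date() would raise
bdate, edate = previous_day_window(date(2025, 6, 1))
print(bdate, edate)
```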

Oh, no!

Looks like there aren't any current O3 data available. We must go even further back in time.

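from datetime import timedelta  # subtracting a timedelta stays valid on the 1st of a month, where day-1 would be 0
ozone = aqs.bystate.sampledata(parameter=O3, bdate=date(year=year-1, month=month, day=day) - timedelta(days=1), edate=date(year=year-1, month=month, day=day), stateFIPS=NY)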
#ozone

Great! Now we have some data. Let’s look at the columns.

ozone.columns

There is a lot of information in this dataset.

  1. Geospatial Information

  2. Temporal Information

  3. Sample Information

  4. Data QA Information

We’ll focus on a few columns from groups 1, 2, and 3:

  • Latitude and Longitude can be used to plot these data over a map, which will be addressed in Notebook 3 of this Cookbook.
  • Local Date and Time, as well as State, County, and Site Number, can be used to isolate data for a time series.
  • Units of Measure will be necessary for annotations and labels.

PAUSE

We've seen how to pull the data, and we've seen that pulling current data is not possible due to the lag between sample time and quality-assurance checks.

You are encouraged to check for more current data but, for now, let's look at this month's data from last year.

start = 1
end = 30 #(Replace this value to match the appropriate "last day" of the month you are running this notebook)
ozone = aqs.bystate.sampledata(parameter= O3, bdate = date(year=year-1, month=month, day = start), edate = date(year=year-1, month=month, day = end), stateFIPS=NY)
#ozone
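
Rather than editing the last day of the month by hand, it can be computed with the standard library's `calendar.monthrange`. The sketch below wraps this in a hypothetical helper, `month_bounds`, and accounts for leap years.

```python
import calendar
from datetime import date


def month_bounds(year, month):
    """Return the first and last calendar day of the given month."""
    last_day = calendar.monthrange(year, month)[1]  # (weekday of day 1, number of days)
    return date(year, month, 1), date(year, month, last_day)


bdate, edate = month_bounds(2024, 6)
print(bdate, edate)
```

The two dates could then be passed as `bdate` and `edate` in place of the hard-coded `start` and `end` values.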

A quick check at plotting the data in its original format shows that some polishing is necessary.

plt.plot(ozone['date_local'], ozone['sample_measurement'], '.')
plt.show()

Spatiotemporal Issues Abound!

It looks like the primary hiccups in trying to plot these data are that the DataFrame has separate columns for date and time, and that there are multiple sample sites across the DataFrame.

We can combine the dates and times into a single datetime value.

We can also specify the sample sites we want to look at, or take averages across a county or the whole state.

Either way, we'll need to prepare the data.
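
Both preparation steps described above can be sketched on a few made-up rows: merging the date and time columns into one datetime, then averaging across sites so each timestamp has a single value. The column names follow those used in this notebook, while the site numbers and measurements are invented for illustration.

```python
import pandas as pd

# Made-up rows standing in for an AQS sample-data response (two sites, two hours)
samples = pd.DataFrame({
    "date_local": ["2024-06-01", "2024-06-01", "2024-06-01", "2024-06-01"],
    "time_local": ["01:00", "00:00", "00:00", "01:00"],
    "site_number": ["0012", "0012", "0133", "0133"],
    "sample_measurement": [0.030, 0.020, 0.040, 0.050],
})

# Merge the separate date and time columns into one datetime column
samples["datetime"] = pd.to_datetime(samples["date_local"] + " " + samples["time_local"])

# Average across sites so each timestamp has a single value, in chronological order
hourly_mean = samples.groupby("datetime")["sample_measurement"].mean().sort_index()
print(hourly_mean)
```

Averaging hides differences between sites. Filtering on `site_number` instead keeps one monitor's record intact.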


Preparing the data for visualization

Let’s utilize some Pandas features to generate a more manageable DataFrame for plotting.

First, let’s select only one specific county: Albany.

Warning

Not every county samples every type of pollutant!

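# .copy() returns an independent DataFrame, so adding columns later does not trigger a SettingWithCopyWarning
O3alb = ozone.loc[ozone['county'] == 'Albany', ['date_local', 'time_local', 'sample_measurement', 'units_of_measure', 'site_number', 'latitude', 'longitude']].copy()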
#O3alb

Warning

Something is a bit off here. We see that the datetimes are mostly, but not entirely, in order.

Or you may see that there are no data at all. This is the result of data being sporadic over space and time.

Let’s make sure our new DataFrame for Albany is properly chronological.

O3alb['datetime'] = pd.to_datetime(O3alb['date_local'] + ' ' + O3alb['time_local'])
O3alb = O3alb.sort_values(by='datetime')
#O3alb

Now we should be able to plot a basic ozone time series for Albany, NY that covers this month of last year.


Generating a time series plot

Let’s quickly test a lineplot of our data using seaborn.

sns.lineplot(x="datetime", y="sample_measurement", data=O3alb)
plt.show()

Success

We have a time series! Now let's polish it up a bit...

# Design the figure:

# Figure shape
fig, ax = plt.subplots(figsize = (10,5))

# Give it a title
plt.title((f'Ozone Concentration Albany, NY - {month}/{year-1}'), fontsize = 20)

# Plot the data
sns.lineplot(x="datetime", y="sample_measurement", data=O3alb, ax=ax)

# Title the axes
ax.set_xlabel('Date', labelpad = 20, fontsize = 16)
ax.set_ylabel('[O$_{3}$] (ppm)', labelpad = 20, fontsize = 16)

    # For the X-Axis

# Set major x-ticks for midnight (00h)
x_major = O3alb['datetime'][O3alb['datetime'].dt.hour == 0]
ax.set_xticks(x_major)
print(O3alb['datetime'].min())
print(O3alb['datetime'].max())
# Set minor ticks at every 6 hours
x_minor = pd.date_range(start= O3alb['datetime'].min(), end= O3alb['datetime'].max(), freq='6h')
ax.set_xticks(x_minor, minor=True)

# Clean up the date label so it doesn't show the year or minutes
formatted_labels = [x.strftime('%m-%d %H') + 'h' for x in x_major]
ax.set_xticklabels(formatted_labels, rotation=90, fontsize = 14)

    # For the Y-Axis

# Add minor ticks to the y-axis
y_minor = np.arange(0, 0.06, 0.002)
ax.set_yticks(y_minor, minor = True)

# Fix the fontsize for the y-tick labels
ax.tick_params(axis='y', labelsize=14)

plt.show()
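
The tick positions above are computed by hand from the data. As an alternative sketch, Matplotlib's date locators and formatters can place ticks at midnight and every six hours without building the lists manually. The values plotted here are made up, and the `Agg` backend is selected only so the example runs without a display.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up hourly values, only to give the axis something to draw
times = [datetime(2024, 6, 1) + timedelta(hours=i) for i in range(72)]
values = [0.02 + 0.01 * (t.hour / 23) for t in times]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(times, values)

# Major ticks at each midnight, minor ticks every 6 hours
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=[0, 6, 12, 18]))

# Show month, day, and hour on the major tick labels, rotated as in the plot above
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %Hh"))
ax.tick_params(axis="x", labelrotation=90)

fig.canvas.draw()  # render once so the ticks are computed
plt.close(fig)
```

Locators recompute the ticks if the date range changes, which the fixed tick lists do not.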

We can also add a second pollutant for comparison (e.g., NO2).

NO2_data = aqs.bystate.sampledata(parameter= NO2, bdate = date(year=year-1, month=month, day = start), edate = date(year=year-1, month=month, day = end), stateFIPS=NY)
NO2_data['county'].unique()

ALBANY?!

Albany doesn't seem to display data for NO2.

You're welcome to check on your own for another pollutant to compare, but we'll shift to another county that covers both O3 and NO2.

ozone['county'].unique()
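
Instead of comparing the two county lists by eye, a set intersection shows which counties report both pollutants. The lists below are made-up stand-ins for `ozone['county'].unique()` and `NO2_data['county'].unique()`.

```python
# Made-up county lists standing in for the two .unique() results
o3_counties = ["Albany", "Bronx", "Erie", "Queens"]
no2_counties = ["Bronx", "Erie", "Monroe"]

# Counties that appear in both lists, sorted for a stable display order
shared = sorted(set(o3_counties) & set(no2_counties))
print(shared)
```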

The Bronx seems like a good county to work with for the purposes of this Notebook.

bronxNO2 = NO2_data.loc[NO2_data['county'] == 'Bronx', ['date_local', 'time_local', 'sample_measurement', 'units_of_measure', 'site_number', 'latitude', 'longitude']].copy()
bronxO3 = ozone.loc[ozone['county'] == 'Bronx', ['date_local', 'time_local', 'sample_measurement', 'units_of_measure', 'site_number', 'latitude', 'longitude']].copy()
print('O3: ', bronxO3.iloc[0,:], '\n')
print('NO2: ', bronxNO2.iloc[0,:])

We’ve seen that both datasets are present for Bronx County.

Again, let’s make sure our new DataFrames are properly chronological.

bronxO3['datetime'] = pd.to_datetime(bronxO3['date_local'] + ' ' + bronxO3['time_local'])
bronxO3 = bronxO3.sort_values(by='datetime')

bronxNO2['datetime'] = pd.to_datetime(bronxNO2['date_local'] + ' ' + bronxNO2['time_local'])
bronxNO2 = bronxNO2.sort_values(by='datetime')

Now we’ll plot both datasets together.

# Design the figure:   
fig, ax = plt.subplots(figsize = (10,5))

shared_colors = sns.color_palette('muted')

# Create a secondary y-axis
ax2 = ax.twinx()

# Give it a title
plt.title((f'Air Pollution Concentration Bronx County, NY - {month}/{year-1}'), fontsize = 20)

# Plot the data for O3
sns.lineplot(x="datetime", y="sample_measurement", data=bronxO3, label = 'O$_{3}$', ax = ax, legend = False, color= shared_colors[0],)

# Plot NO2 data alongside O3 data
sns.lineplot(x="datetime", y="sample_measurement", data=bronxNO2, label='NO$_{2}$', ax = ax2, legend = False, color= shared_colors[1],)


    # For the X-Axis:

# Title the x-axis
ax.set_xlabel('Date', labelpad = 20, fontsize = 16)

# Set major x-ticks for midnight (00h)
x_major = bronxO3['datetime'][bronxO3['datetime'].dt.hour == 0]
ax.set_xticks(x_major)

# Set minor ticks at every 6 hours
x_minor = pd.date_range(start= bronxO3['datetime'].min(), end= bronxO3['datetime'].max(), freq='6h')
ax.set_xticks(x_minor, minor=True)

# Clean up the date label so it doesn't show the year or minutes
formatted_labels = [x.strftime('%m-%d %H') + 'h' for x in x_major]
ax.set_xticklabels(formatted_labels, rotation=90, fontsize = 14)

    # For the Y-Axes:

# Add titles to both y-axes
ax.set_ylabel('[O$_{3}$] (ppm)', labelpad = 25, fontsize = 16)
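# NO2 values on this axis span roughly 0-55 (see the minor ticks below), i.e. ppb rather than ppm; confirm with units_of_measure
ax2.set_ylabel('[NO$_{2}$] (ppb)', labelpad=25, fontsize=16, rotation = -90)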

# Add minor ticks to the y-axis for O3
y_minor = np.arange(0, 0.06, 0.002)
ax.set_yticks(y_minor, minor = True)

# Fix the fontsize for the y-tick labels
ax.tick_params(axis='y', labelsize=14)

# Add minor ticks to the secondary y-axis for NO2
y2_minor = np.arange(0, 55, 2)
ax2.set_yticks(y2_minor, minor=True)

# Match fontsize for secondary y-axis
ax2.tick_params(axis='y', labelsize=14)

    # Design the legend:

# Combine both plot labels into a single legend
handles, labels = ax.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()

plt.legend(handles + handles2, labels + labels2, loc='best', fontsize=18)


plt.show()

Success

We've been able to pull data from the AQS and plot a time series!
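
Because the two pollutants sit on separate y-axes, their units can differ, and the `units_of_measure` column records the unit for each sample. The sketch below converts values to a common scale. It assumes the unit labels are the strings `Parts per million` and `Parts per billion`, so check them against your own data. `to_ppm` is a hypothetical helper.

```python
# Conversion factors to parts per million; the unit strings are assumptions
TO_PPM = {
    "Parts per million": 1.0,
    "Parts per billion": 1.0e-3,
}


def to_ppm(value, units):
    """Convert a concentration to ppm based on its unit label (hypothetical helper)."""
    try:
        return value * TO_PPM[units]
    except KeyError:
        raise ValueError(f"No conversion defined for units {units!r}") from None


print(to_ppm(25.0, "Parts per billion"))
```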


Let’s take a quick look at how you can pull data based on an areal extent, and then we’ll close out this notebook.

Let’s start with the contiguous United States (CONUS).

lonW = '-130'
lonE = '-62'
latS = '20'
latN = '55'
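from datetime import timedelta  # subtracting a timedelta stays valid on the 1st of a month, where day-1 would be 0
CONUS = aqs.bybox.sampledata(parameter= O3, bdate = date(year=year-1, month=month, day = day) - timedelta(days=1), 
                           edate = date(year=year-1, month=month, day = day), 
                           minlat = latS, maxlat = latN, minlon = lonW, maxlon = lonE)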
#CONUS

We can also use the extent parameters for New York State.

NY_E = -71
NY_W = -81.0
NY_N = 46
NY_S = 40
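from datetime import timedelta  # subtracting a timedelta stays valid on the 1st of a month, where day-1 would be 0
NYS = aqs.bybox.sampledata(parameter= O3, bdate = date(year=year-1, month=month, day = day) - timedelta(days=1), 
                           edate = date(year=year-1, month=month, day = day), 
                           minlat = NY_S, maxlat = NY_N, minlon = NY_W, maxlon = NY_E)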
#NYS

Success

We've successfully fetched data from the AQS by areal extent for both the CONUS and for NYS!

NOTE

If you want to use only data from within the state, be sure to filter the new DataFrame to omit data points from neighboring states that fall inside your areal extent.
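
The filtering step in the note above can be sketched with a boolean mask. The rows are made up, and the `state` column name is an assumption based on the State field mentioned earlier, so check `NYS.columns` for the exact name in your own response.

```python
import pandas as pd

# Made-up rows standing in for a bounding-box response; the 'state' column name is an assumption
box = pd.DataFrame({
    "state": ["New York", "New York", "Vermont", "Pennsylvania"],
    "county": ["Albany", "Bronx", "Chittenden", "Erie"],
    "sample_measurement": [0.031, 0.028, 0.035, 0.030],
})

# Keep only rows from the state of interest; .copy() gives an independent DataFrame
ny_only = box.loc[box["state"] == "New York"].copy()
print(ny_only["county"].tolist())
```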


Summary

In this notebook, we’ve

  • managed to access air pollution data from the EPA’s AQS
  • looked at different ways to fetch the data
  • looked at different types of data available
  • prepared the data for plotting
  • generated time series plots of air pollution data

You are encouraged to explore other variables within the dataset, and to utilize pandas and numpy functions to look at ways to manipulate and analyze these data!

Resources and references

Documentation for pyaqsapi: https://usepa.github.io/pyaqsapi/pyaqsapi.html

More information about the pyaqsapi package (developed by Clinton McCrowey, EPA Region 3) can be found on GitHub: https://github.com/USEPA/pyaqsapi

The EPA’s AQS has general information and documentation here: https://www.epa.gov/aqs

Details about the specific parameter codes can be found here: https://aqs.epa.gov/aqsweb/documents/codetables/parameters.html

To access real-time data for air pollution, the AirNow API can be utilized.

Thanks to Daniel Garver (EPA Region 4) for help locating the AQS API, and for directing the authors of this Cookbook to the appropriate resources.


Information about the author: Adam Deitsch