Dask DataFrame
In this tutorial, you will learn:
Basic concepts and features of Dask DataFrames
Applications of Dask DataFrames
Interacting with Dask DataFrames
Built-in operations with Dask DataFrames
Dask DataFrames Best Practices
Prerequisites
Concepts |
Importance |
Notes |
---|---|---|
Necessary |
||
Necessary |
Time to learn: 40 minutes
Introduction
Image credit: Dask Contributors
pandas is a very popular tool for working with tabular datasets, but the dataset needs to fit into memory.
pandas operates best with smaller datasets; if your dataset is too large, you are likely to hit an out-of-memory error. A general rule of thumb for pandas is:
- “Have 5 to 10 times as much RAM as the size of your dataset”
Wes McKinney (2017) in 10 things I hate about pandas
Dask DataFrame can be used to work around these pandas limitations for larger-than-memory datasets.
What is Dask DataFrame?
A Dask DataFrame is a parallel DataFrame composed of smaller pandas DataFrames (also known as partitions).
Dask DataFrames look and feel like pandas DataFrames on the surface.
Dask DataFrames split the data into manageable partitions that can be processed in parallel across multiple cores or computers.
Similar to Dask Arrays, Dask DataFrames are lazy!
Unlike pandas, operations on Dask DataFrames are not computed until you explicitly request them (e.g. by calling .compute()).
When to use Dask DataFrame and when to avoid it?
Dask DataFrames are used in situations where pandas fails or has poor performance due to data size.
Dask DataFrame is a good choice for parallelizable computations.
Some examples are:
Element-wise operations such as
df.x + df.y
Row-wise filtering such as
df[df.x>0]
Common aggregations such as
df.x.max()
Dropping duplicates such as
df.x.drop_duplicates()
However, Dask is not great for operations that require shuffling or re-indexing.
Some examples are:
Set index:
df.set_index(df.x)
See the Dask DataFrame API documentation for a comprehensive list of available functions.
Tutorial Dataset
In this tutorial, we are going to use the NOAA Global Historical Climatology Network Daily (GHCN-D) dataset.
GHCN-D is a publicly available dataset that includes daily climate records from more than 100,000 surface observation stations around the world.
This is an example of a real dataset that is used by NCAR scientists for their research. GHCN-D raw dataset for all stations is available through NOAA Climate Data Online.
To learn more about the GHCN-D dataset, please visit:
Download the data
For this example, we are going to look through a subset of data from the GHCN-D dataset.
First, we look at the daily observations from Denver International Airport; next, we look at selected stations in the US.
To access the preprocessed dataset for this tutorial, please run the following script:
!./get_data.sh
Downloading https://docs.google.com/uc?export=download&id=14doSRn8hT14QYtjZz28GKv14JgdIsbFF
USC00023160.csv
USC00027281.csv
USC00027390.csv
USC00030936.csv
USC00031596.csv
USC00032444.csv
USC00035186.csv
USC00035754.csv
USC00035820.csv
USC00035908.csv
USC00042294.csv
USC00044259.csv
USC00048758.csv
USC00050848.csv
USC00051294.csv
USC00051528.csv
USC00051564.csv
USC00051741.csv
USC00052184.csv
USC00052281.csv
USC00052446.csv
USC00053005.csv
USC00053038.csv
USC00053146.csv
USC00053662.csv
USC00053951.csv
USC00054076.csv
USC00054770.csv
USC00054834.csv
USC00055322.csv
USC00055722.csv
USC00057167.csv
USC00057337.csv
Downloading https://docs.google.com/uc?export=download&id=15rCwQUxxpH6angDhpXzlvbe1nGetYHrf
USC00057936.csv
USC00058204.csv
USC00058429.csv
USC00059243.csv
USC00068138.csv
USC00080211.csv
USC00084731.csv
USC00088824.csv
USC00098703.csv
USC00100010.csv
USC00100470.csv
USC00105275.csv
USC00106152.csv
USC00107264.csv
USC00108137.csv
USC00110338.csv
USC00112140.csv
USC00112193.csv
USC00112348.csv
USC00112483.csv
USC00113335.csv
USC00114108.csv
USC00114442.csv
USC00114823.csv
USC00115079.csv
USC00115326.csv
USC00115712.csv
USC00115768.csv
USC00115833.csv
USC00115901.csv
USC00115943.csv
USC00116446.csv
USW00003017.csv
Downloading https://docs.google.com/uc?export=download&id=1Tbuom1KMCwHjy7-eexEQcOXSr51i6mae
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
This script should save the preprocessed GHCN-D data in the ../data directory.
Pandas DataFrame Basics
Let’s start with an example using pandas DataFrame.
First, let’s read in the comma-separated GHCN-D dataset for one station at Denver International Airport (DIA), CO (site ID: USW00003017).
To see the list of all available GHCN-D sites and their coordinates and IDs, please see this link.
import os
import pandas as pd
# DIA ghcnd id
site = 'USW00003017'
data_dir = '../data/'
df = pd.read_csv(os.path.join(data_dir, site+'.csv'), parse_dates=['DATE'], index_col=0)
# Display the top five rows of the dataframe
df.head()
ID | YEAR | MONTH | DAY | TMAX | TMAX_FLAGS | TMIN | TMIN_FLAGS | PRCP | PRCP_FLAGS | ... | RHMN_FLAGS | RHMX | RHMX_FLAGS | PSUN | PSUN_FLAGS | LATITUDE | LONGITUDE | ELEVATION | STATE | STATION | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DATE | |||||||||||||||||||||
1994-07-20 | USW00003017 | 1994 | 7 | 20 | 316.0 | XXS | 150.0 | XXS | 20.0 | DXS | ... | XXX | NaN | XXX | NaN | XXX | 39.8467 | -104.6561 | 1647.1 | CO | DENVER INTL AP |
1994-07-23 | USW00003017 | 1994 | 7 | 23 | 355.0 | XXS | 166.0 | XXS | 0.0 | DXS | ... | XXX | NaN | XXX | NaN | XXX | 39.8467 | -104.6561 | 1647.1 | CO | DENVER INTL AP |
1994-07-24 | USW00003017 | 1994 | 7 | 24 | 333.0 | XXS | 155.0 | XXS | 81.0 | DXS | ... | XXX | NaN | XXX | NaN | XXX | 39.8467 | -104.6561 | 1647.1 | CO | DENVER INTL AP |
1994-07-25 | USW00003017 | 1994 | 7 | 25 | 327.0 | XXS | 172.0 | XXS | 0.0 | DXS | ... | XXX | NaN | XXX | NaN | XXX | 39.8467 | -104.6561 | 1647.1 | CO | DENVER INTL AP |
1994-07-26 | USW00003017 | 1994 | 7 | 26 | 327.0 | XXS | 155.0 | XXS | 0.0 | DXS | ... | XXX | NaN | XXX | NaN | XXX | 39.8467 | -104.6561 | 1647.1 | CO | DENVER INTL AP |
5 rows × 99 columns
Question: What variables are available?
df.columns
Index(['ID', 'YEAR', 'MONTH', 'DAY', 'TMAX', 'TMAX_FLAGS', 'TMIN',
'TMIN_FLAGS', 'PRCP', 'PRCP_FLAGS', 'TAVG', 'TAVG_FLAGS', 'SNOW',
'SNOW_FLAGS', 'SNWD', 'SNWD_FLAGS', 'AWND', 'AWND_FLAGS', 'FMTM',
'FMTM_FLAGS', 'PGTM', 'PGTM_FLAGS', 'WDF2', 'WDF2_FLAGS', 'WDF5',
'WDF5_FLAGS', 'WSF2', 'WSF2_FLAGS', 'WSF5', 'WSF5_FLAGS', 'WT01',
'WT01_FLAGS', 'WT02', 'WT02_FLAGS', 'WT08', 'WT08_FLAGS', 'WT16',
'WT16_FLAGS', 'WT17', 'WT17_FLAGS', 'WT18', 'WT18_FLAGS', 'WT03',
'WT03_FLAGS', 'WT05', 'WT05_FLAGS', 'WT19', 'WT19_FLAGS', 'WT10',
'WT10_FLAGS', 'WT09', 'WT09_FLAGS', 'WT06', 'WT06_FLAGS', 'WT07',
'WT07_FLAGS', 'WT11', 'WT11_FLAGS', 'WT13', 'WT13_FLAGS', 'WT21',
'WT21_FLAGS', 'WT14', 'WT14_FLAGS', 'WT15', 'WT15_FLAGS', 'WT22',
'WT22_FLAGS', 'WT04', 'WT04_FLAGS', 'WV03', 'WV03_FLAGS', 'TSUN',
'TSUN_FLAGS', 'WV01', 'WV01_FLAGS', 'WESD', 'WESD_FLAGS', 'ADPT',
'ADPT_FLAGS', 'ASLP', 'ASLP_FLAGS', 'ASTP', 'ASTP_FLAGS', 'AWBT',
'AWBT_FLAGS', 'RHAV', 'RHAV_FLAGS', 'RHMN', 'RHMN_FLAGS', 'RHMX',
'RHMX_FLAGS', 'PSUN', 'PSUN_FLAGS', 'LATITUDE', 'LONGITUDE',
'ELEVATION', 'STATE', 'STATION'],
dtype='object')
The description and units of the dataset are available here.
Operations on pandas DataFrame
pandas DataFrames have several features that give us the flexibility to run different calculations and analyses on our dataset. Let’s check some out:
Simple Analysis
For example:
When was the coldest day at this station during December 2022?
# use python slicing notation inside .loc
# use idxmin() to find the index of the minimum value
df.loc['2022-12-01':'2022-12-31'].TMIN.idxmin()
Timestamp('2022-12-22 00:00:00')
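To make the `.loc` slicing and `idxmin()` steps easier to follow, here is a small self-contained sketch on made-up data. The column name `TMIN` mirrors the dataset, but the dates and values are hypothetical.

```python
import pandas as pd

# Hypothetical daily minimum temperatures, indexed by date
dates = pd.date_range("2022-11-29", periods=6, freq="D")
toy = pd.DataFrame({"TMIN": [-20.0, -55.0, -10.0, -80.0, -30.0, 5.0]}, index=dates)

# Label-based slicing on a DatetimeIndex includes both endpoints,
# so this keeps only the rows that fall in December 2022
december = toy.loc["2022-12-01":"2022-12-31"]

# idxmin() returns the index label (a Timestamp here) of the smallest value
coldest_day = december.TMIN.idxmin()
print(coldest_day)
```

Note that the November value of -55.0 is ignored because the slice is applied before `idxmin()`.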
# Here we easily plot the prior data using matplotlib from pandas
# -- .loc for value based indexing
df.loc['2022-12-01':'2022-12-31'].SNWD.plot(ylabel= 'Daily Average Snow Depth [mm]')
<Axes: xlabel='DATE', ylabel='Daily Average Snow Depth [mm]'>
How many snow days do we have each year at this station?
pandas groupby is used to group the data by one or more categories.
# 1- First select days with snow > 0
# 2- Create a "groupby object" based on the selected columns
# 3- use .size() to compute the size of each group
# 4- sort the values descending
# we count days where SNOW>0, and sort them and show top 5 years:
df[df['SNOW']>0].groupby('YEAR').size().sort_values(ascending=False).head()
YEAR
2015 36
2019 34
2014 32
2008 32
2007 31
dtype: int64
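The filter, group, count, and sort steps used above can be traced on a toy table. This is only an illustrative sketch; the values below are invented and the column names simply mirror the dataset.

```python
import pandas as pd

# Invented daily snowfall records for three years
toy = pd.DataFrame({
    "YEAR": [2020, 2020, 2020, 2021, 2021, 2022],
    "SNOW": [0.0, 12.0, 3.0, 0.0, 25.0, 0.0],
})

# 1. keep only days with snowfall
snowy = toy[toy["SNOW"] > 0]

# 2. group by year and 3. count the rows in each group
counts = snowy.groupby("YEAR").size()

# 4. sort so the snowiest year comes first
snow_days = counts.sort_values(ascending=False)
print(snow_days)
```

Years with no snowy days (2022 in this toy table) do not appear in the result, because their rows were removed by the filter before grouping.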
Or for a more complex analysis:
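For example, we have heard that this could be Denver’s first January in 13 years with no 60-degree days.
Below, we count the January days per year since 2010 with a high temperature above 60°F. TMAX is stored in tenths of a degree Celsius, so 60°F (about 15.55°C) corresponds to a threshold of 155.5: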
df[(df['MONTH']==1) & (df['YEAR']>=2010) & (df['TMAX']>155.5)].groupby(['YEAR']).size()
YEAR
2011 1
2012 6
2013 4
2014 3
2015 6
2016 1
2017 4
2018 5
2019 3
2020 2
2021 2
2022 3
dtype: int64
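The 155.5 threshold used above comes from converting 60°F into tenths of a degree Celsius, the unit in which TMAX is stored. A minimal sketch of that arithmetic follows; the helper name `fahrenheit_to_tenths_celsius` is hypothetical.

```python
def fahrenheit_to_tenths_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to tenths of a degree Celsius."""
    celsius = (temp_f - 32.0) * 5.0 / 9.0
    return celsius * 10.0


threshold = fahrenheit_to_tenths_celsius(60.0)
print(f"{threshold:.2f}")
```

The exact value is slightly above 155.5, so the query above uses a truncated threshold. Because the TMAX values shown in this tutorial are recorded in whole tenths of a degree, the small difference is unlikely to change which days are selected.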
This is great! But how big is this dataset for one station?
First, let’s check the file size:
!ls -lh ../data/USW00003017.csv
-rw-r--r-- 1 runner docker 3.6M Feb 5 2023 ../data/USW00003017.csv
Similar to the previous tutorial, we can use the following function to find the size of a variable in memory.
# Define function to display variable size in MB
import sys
def var_size(in_var):
    result = sys.getsizeof(in_var) / 1e6
    print(f"Size of variable: {result:.2f} MB")
var_size(df)
Size of variable: 33.21 MB
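As a complementary sketch, pandas can also report memory use per column through `DataFrame.memory_usage`. The toy DataFrame below is hypothetical and only stands in for the station data.

```python
import pandas as pd

# Toy stand-in for the station data (illustrative values)
toy = pd.DataFrame({"ID": ["USW00003017"] * 1000, "TMAX": [316.0] * 1000})

# deep=True also inspects the contents of object (e.g. string) columns
per_column = toy.memory_usage(deep=True)
total_mb = per_column.sum() / 1e6

# deep=False skips that inspection, so it can under-report string-heavy columns
shallow_mb = toy.memory_usage(deep=False).sum() / 1e6

print(per_column)
print(f"deep: {total_mb:.3f} MB, shallow: {shallow_mb:.3f} MB")
```

Looking at the per-column breakdown can help explain why a DataFrame with many text columns takes up far more memory than its CSV file does on disk.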
Remember the rule above?
- “Have 5 to 10 times as much RAM as the size of your dataset”
Wes McKinney (2017) in 10 things I hate about pandas
So far, we have read in and analyzed data for one station. There are more than 118,000 stations around the world and more than 4,500 stations in Colorado alone!
What if we want to look at a larger dataset?
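To get a feel for the scale, here is a back-of-the-envelope sketch that applies the 5x to 10x rule of thumb. It assumes, purely for illustration, that every station takes as much memory as the DIA station measured above (33.21 MB); real stations vary widely in record length, so treat the numbers as an order-of-magnitude guess.

```python
def ram_rule_of_thumb(dataset_mb: float) -> tuple[float, float]:
    """Return the (5x, 10x) RAM range suggested by the rule of thumb, in MB."""
    return 5 * dataset_mb, 10 * dataset_mb


one_station_mb = 33.21  # in-memory size measured above for the DIA station
low, high = ram_rule_of_thumb(one_station_mb)
print(f"One station: {low:.0f} to {high:.0f} MB of RAM")

# ASSUMPTION: every Colorado station is as large as DIA (not true in practice)
n_colorado_stations = 4500
colorado_mb = one_station_mb * n_colorado_stations
co_low, co_high = ram_rule_of_thumb(colorado_mb)
print(f"Colorado: {co_low / 1e3:.0f} to {co_high / 1e3:.0f} GB of RAM")
```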
Scaling up to a larger dataset
Let’s start by reading data from selected stations. The downloaded data for this example includes the climatology observations from 66 selected sites in Colorado.
pandas can load data spread across multiple files by concatenating them:
!du -csh ../data/*.csv |tail -n1
565M total
Using a loop with pandas.concat, we can read multiple files into a single DataFrame:
%%time
import glob
co_sites = glob.glob(os.path.join(data_dir, '*.csv'))
df = pd.concat(pd.read_csv(f, index_col=0, parse_dates=['DATE']) for f in co_sites)
CPU times: user 7.71 s, sys: 1.42 s, total: 9.13 s
Wall time: 9.13 s
How many stations have we read in?
print ("Concatenated data for", len(df.ID.unique()), "unique sites.")
Concatenated data for 66 unique sites.
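The read-then-concatenate pattern can be tried without any files on disk. The sketch below uses two in-memory CSV strings as stand-ins for per-station files; the site names and values are invented.

```python
import io

import pandas as pd

# Two small in-memory "files" standing in for per-station CSVs (made-up values)
csv_a = "DATE,ID,SNOW\n2020-01-01,SITE_A,0.0\n2020-01-02,SITE_A,12.0\n"
csv_b = "DATE,ID,SNOW\n2020-01-01,SITE_B,5.0\n"

# Read each source into its own DataFrame, one after another
frames = [
    pd.read_csv(io.StringIO(text), index_col=0, parse_dates=["DATE"])
    for text in (csv_a, csv_b)
]

# Stack the DataFrames row-wise into a single DataFrame
combined = pd.concat(frames)
n_sites = combined.ID.nunique()
print(f"Concatenated data for {n_sites} unique sites.")
```

With this approach every file is fully loaded into memory, one at a time, before the concatenation happens.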
Now that we concatenated the data for all sites in one DataFrame, we can do similar analysis on it:
Which site has recorded the most snow days in a year?
%%time
# ~90s on 4GB RAM
snowy_days = df[df['SNOW']>0].groupby(['ID','YEAR']).size()
print ('This site has the highest number of snow days in a year : ')
snowy_days.agg(['idxmax','max'])
This site has the highest number of snow days in a year :
CPU times: user 357 ms, sys: 0 ns, total: 357 ms
Wall time: 357 ms
idxmax (USC00052281, 1983)
max 102
dtype: object
Exercise: Which Colorado site has recorded the most snow days in 2023?
Dask allows us to conceptualize all of these files as a single dataframe!
# Let's do a little cleanup
del df, snowy_days
Computations on Dask DataFrame
Create a “LocalCluster” Client with Dask
from dask.distributed import Client, LocalCluster
cluster = LocalCluster()
client = Client(cluster)
client
Client
Client-9d13155f-2371-11ef-8a2a-00224802c87d
Connection method: Cluster object | Cluster type: distributed.LocalCluster |
Dashboard: http://127.0.0.1:8787/status |
Cluster Info
LocalCluster
68d13b52
Dashboard: http://127.0.0.1:8787/status | Workers: 4 |
Total threads: 4 | Total memory: 15.61 GiB |
Status: running | Using processes: True |
Scheduler Info
Scheduler
Scheduler-a8d7a790-ef3f-4db5-bb72-da14bf72a2f0
Comm: tcp://127.0.0.1:43133 | Workers: 4 |
Dashboard: http://127.0.0.1:8787/status | Total threads: 4 |
Started: Just now | Total memory: 15.61 GiB |
Workers
Worker: 0
Comm: tcp://127.0.0.1:40243 | Total threads: 1 |
Dashboard: http://127.0.0.1:44853/status | Memory: 3.90 GiB |
Nanny: tcp://127.0.0.1:37289 | |
Local directory: /tmp/dask-scratch-space/worker-tr6d21nl |
Worker: 1
Comm: tcp://127.0.0.1:38239 | Total threads: 1 |
Dashboard: http://127.0.0.1:43069/status | Memory: 3.90 GiB |
Nanny: tcp://127.0.0.1:33265 | |
Local directory: /tmp/dask-scratch-space/worker-swp45e_6 |
Worker: 2
Comm: tcp://127.0.0.1:40639 | Total threads: 1 |
Dashboard: http://127.0.0.1:35651/status | Memory: 3.90 GiB |
Nanny: tcp://127.0.0.1:45551 | |
Local directory: /tmp/dask-scratch-space/worker-bc6oet_r |
Worker: 3
Comm: tcp://127.0.0.1:46009 | Total threads: 1 |
Dashboard: http://127.0.0.1:34727/status | Memory: 3.90 GiB |
Nanny: tcp://127.0.0.1:40171 | |
Local directory: /tmp/dask-scratch-space/worker-3vssu1nr |
☝️ Click the Dashboard link above.
👈 Or click the “Search” 🔍 button in the dask-labextension dashboard.
Dask DataFrame read_csv to read multiple files
The dask.dataframe.read_csv function can be used in conjunction with glob to read multiple CSV files at the same time.
Remember that we can read one file with pandas.read_csv. To read multiple files with pandas, we have to concatenate them with pd.concat. With Dask, however, we can read many files at once using just dask.dataframe.read_csv.
Overall, Dask is designed to perform I/O in parallel and is more performant than pandas for operations with multiple files or large files.
%%time
import dask
import dask.dataframe as dd
ddf = dd.read_csv(co_sites, parse_dates=['DATE'])
ddf
CPU times: user 285 ms, sys: 24 ms, total: 309 ms
Wall time: 305 ms
DATE | ID | YEAR | MONTH | DAY | PRCP | PRCP_FLAGS | SNOW | SNOW_FLAGS | TMAX | TMAX_FLAGS | TMIN | TMIN_FLAGS | SNWD | SNWD_FLAGS | TOBS | TOBS_FLAGS | WT16 | WT16_FLAGS | WT08 | WT08_FLAGS | WT11 | WT11_FLAGS | DAPR | DAPR_FLAGS | MDPR | MDPR_FLAGS | DASF | DASF_FLAGS | MDSF | MDSF_FLAGS | WT04 | WT04_FLAGS | WT03 | WT03_FLAGS | WT07 | WT07_FLAGS | WT09 | WT09_FLAGS | WT14 | WT14_FLAGS | WT18 | WT18_FLAGS | WT01 | WT01_FLAGS | WT06 | WT06_FLAGS | WT05 | WT05_FLAGS | LATITUDE | LONGITUDE | ELEVATION | STATE | STATION | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
npartitions=66 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
datetime64[ns] | string | int64 | int64 | int64 | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | string | float64 | float64 | float64 | string | string | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
ddf.TMAX.mean()
<dask_expr.expr.Scalar: expr=ReadCSV(e521a01)['TMAX'].mean(), dtype=float64>
Notice that the representation of the DataFrame object contains no data, just headers and datatypes. Why?
Lazy Evaluation
Similar to Dask Arrays, Dask DataFrames are lazy. Here the data has not yet been read into the dataframe (a.k.a. lazy evaluation).
Dask just constructs the task graph of the computation and only “evaluates” it when necessary.
So how does Dask know the name and dtype of each column?
Dask has just read the start of the first file and inferred the column names and dtypes.
Unlike pandas.read_csv, which reads the entire file before inferring data types, dask.dataframe.read_csv only reads a sample from the beginning of the file (or of the first file if using a glob). The column names and dtypes are then enforced when reading the specific partitions (Dask can make mistakes on these inferences if there is missing or misleading data in the early rows).
Let’s take a look at the start of our dataframe:
ddf.head()
DATE | ID | YEAR | MONTH | DAY | PRCP | PRCP_FLAGS | SNOW | SNOW_FLAGS | TMAX | ... | WT01_FLAGS | WT06 | WT06_FLAGS | WT05 | WT05_FLAGS | LATITUDE | LONGITUDE | ELEVATION | STATE | STATION | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1893-04-01 | USC00059243 | 1893 | 4 | 1 | 0.0 | PX6 | 0.0 | XX6 | NaN | ... | XXX | NaN | XXX | NaN | XXX | 40.0819 | -102.2069 | 1083.0 | CO | WRAY |
1 | 1893-04-02 | USC00059243 | 1893 | 4 | 2 | 0.0 | PX6 | 0.0 | XX6 | NaN | ... | XXX | NaN | XXX | NaN | XXX | 40.0819 | -102.2069 | 1083.0 | CO | WRAY |
2 | 1893-04-03 | USC00059243 | 1893 | 4 | 3 | 0.0 | PX6 | 0.0 | XX6 | NaN | ... | XXX | NaN | XXX | NaN | XXX | 40.0819 | -102.2069 | 1083.0 | CO | WRAY |
3 | 1893-04-04 | USC00059243 | 1893 | 4 | 4 | 0.0 | PX6 | 0.0 | XX6 | 239.0 | ... | XXX | NaN | XXX | NaN | XXX | 40.0819 | -102.2069 | 1083.0 | CO | WRAY |
4 | 1893-04-05 | USC00059243 | 1893 | 4 | 5 | 0.0 | PX6 | 0.0 | XX6 | 228.0 | ... | XXX | NaN | XXX | NaN | XXX | 40.0819 | -102.2069 | 1083.0 | CO | WRAY |
5 rows × 54 columns
NOTE: Whenever we operate on our dataframe, Dask reads through all of the CSV data again rather than keeping it in RAM. Dask deletes intermediate results (like the full pandas DataFrame for each file) as soon as possible. This lets you handle larger-than-memory datasets, but repeated computations have to load all of the data each time.
Data manipulations similar to those on a pandas DataFrame can be done on a Dask DataFrame.
For example, let’s find the highest number of snow days in Colorado:
%%time
print ('This site has the highest number of snow days in a year : ')
snowy_days = ddf[ddf['SNOW']>0].groupby(['ID','YEAR']).size()
snowy_days.compute().agg(['idxmax','max'])
This site has the highest number of snow days in a year :
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask/backends.py:140, in CreationDispatch.register_inplace.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
139 try:
--> 140 return func(*args, **kwargs)
141 except Exception as e:
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask/dataframe/io/csv.py:771, in make_reader.<locals>.read(urlpath, blocksize, lineterminator, compression, sample, sample_rows, enforce, assume_missing, storage_options, include_path_column, **kwargs)
758 def read(
759 urlpath,
760 blocksize="default",
(...)
769 **kwargs,
770 ):
--> 771 return read_pandas(
772 reader,
773 urlpath,
774 blocksize=blocksize,
775 lineterminator=lineterminator,
776 compression=compression,
777 sample=sample,
778 sample_rows=sample_rows,
779 enforce=enforce,
780 assume_missing=assume_missing,
781 storage_options=storage_options,
782 include_path_column=include_path_column,
783 **kwargs,
784 )
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask/dataframe/io/csv.py:640, in read_pandas(reader, urlpath, blocksize, lineterminator, compression, sample, sample_rows, enforce, assume_missing, storage_options, include_path_column, **kwargs)
639 try:
--> 640 head = reader(BytesIO(b_sample), nrows=sample_rows, **head_kwargs)
641 except pd.errors.ParserError as e:
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
622 if chunksize or iterator:
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
1619 self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1898, in TextFileReader._make_engine(self, f, engine)
1897 try:
-> 1898 return mapping[engine](f, **self.options)
1899 except Exception:
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py:161, in CParserWrapper.__init__(self, src, **kwds)
160 # error: Cannot determine type of 'names'
--> 161 self._validate_parse_dates_presence(self.names) # type: ignore[has-type]
162 self._set_noconvert_columns()
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/pandas/io/parsers/base_parser.py:243, in ParserBase._validate_parse_dates_presence(self, columns)
242 if missing_cols:
--> 243 raise ValueError(
244 f"Missing column provided to 'parse_dates': '{missing_cols}'"
245 )
246 # Convert positions to actual column names
ValueError: Missing column provided to 'parse_dates': 'DATE'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
File <timed exec>:3
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_collection.py:475, in FrameBase.compute(self, fuse, **kwargs)
473 if not isinstance(out, Scalar):
474 out = out.repartition(npartitions=1)
--> 475 out = out.optimize(fuse=fuse)
476 return DaskMethodsMixin.compute(out, **kwargs)
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_collection.py:590, in FrameBase.optimize(self, fuse)
572 def optimize(self, fuse: bool = True):
573 """Optimizes the DataFrame.
574
575 Runs the optimizer with all steps over the DataFrame and wraps the result in a
(...)
588 The optimized Dask Dataframe
589 """
--> 590 return new_collection(self.expr.optimize(fuse=fuse))
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_expr.py:94, in Expr.optimize(self, **kwargs)
93 def optimize(self, **kwargs):
---> 94 return optimize(self, **kwargs)
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_expr.py:3032, in optimize(expr, fuse)
3011 """High level query optimization
3012
3013 This leverages three optimization passes:
(...)
3028 optimize_blockwise_fusion
3029 """
3030 stage: core.OptimizerStage = "fused" if fuse else "simplified-physical"
-> 3032 return optimize_until(expr, stage)
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_expr.py:2983, in optimize_until(expr, stage)
2980 return result
2982 # Simplify
-> 2983 expr = result.simplify()
2984 if stage == "simplified-logical":
2985 return expr
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_core.py:371, in Expr.simplify(self)
369 while True:
370 dependents = collect_dependents(expr)
--> 371 new = expr.simplify_once(dependents=dependents, simplified={})
372 if new._name == expr._name:
373 break
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_core.py:349, in Expr.simplify_once(self, dependents, simplified)
346 if isinstance(operand, Expr):
347 # Bandaid for now, waiting for Singleton
348 dependents[operand._name].append(weakref.ref(expr))
--> 349 new = operand.simplify_once(
350 dependents=dependents, simplified=simplified
351 )
352 simplified[operand._name] = new
353 if new._name != operand._name:
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_core.py:322, in Expr.simplify_once(self, dependents, simplified)
319 expr = self
321 while True:
--> 322 out = expr._simplify_down()
323 if out is None:
324 out = expr
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_groupby.py:595, in Size._simplify_down(self)
591 def _simplify_down(self):
592 if (
593 self._slice is not None
594 and not isinstance(self._slice, list)
--> 595 or self.frame.ndim == 1
596 ):
597 # Scalar slices influence the result and are allowed, i.e., the name of
598 # the series is different
599 return
601 # We can remove every column since pandas reduces to a Series anyway
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/functools.py:981, in cached_property.__get__(self, instance, owner)
979 val = cache.get(self.attrname, _NOT_FOUND)
980 if val is _NOT_FOUND:
--> 981 val = self.func(instance)
982 try:
983 cache[self.attrname] = val
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_expr.py:84, in Expr.ndim(self)
82 @functools.cached_property
83 def ndim(self):
---> 84 meta = self._meta
85 try:
86 return meta.ndim
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/functools.py:981, in cached_property.__get__(self, instance, owner)
979 val = cache.get(self.attrname, _NOT_FOUND)
980 if val is _NOT_FOUND:
--> 981 val = self.func(instance)
982 try:
983 cache[self.attrname] = val
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_expr.py:495, in Blockwise._meta(self)
493 @functools.cached_property
494 def _meta(self):
--> 495 args = [op._meta if isinstance(op, Expr) else op for op in self._args]
496 return self.operation(*args, **self._kwargs)
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_expr.py:495, in <listcomp>(.0)
493 @functools.cached_property
494 def _meta(self):
--> 495 args = [op._meta if isinstance(op, Expr) else op for op in self._args]
496 return self.operation(*args, **self._kwargs)
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/functools.py:981, in cached_property.__get__(self, instance, owner)
979 val = cache.get(self.attrname, _NOT_FOUND)
980 if val is _NOT_FOUND:
--> 981 val = self.func(instance)
982 try:
983 cache[self.attrname] = val
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/_expr.py:2049, in Projection._meta(self)
2047 @functools.cached_property
2048 def _meta(self):
-> 2049 if is_dataframe_like(self.frame._meta):
2050 return super()._meta
2051 # if we are not a DataFrame and have a scalar, we reduce to a scalar
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/functools.py:981, in cached_property.__get__(self, instance, owner)
979 val = cache.get(self.attrname, _NOT_FOUND)
980 if val is _NOT_FOUND:
--> 981 val = self.func(instance)
982 try:
983 cache[self.attrname] = val
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/io/csv.py:92, in ReadCSV._meta(self)
90 @functools.cached_property
91 def _meta(self):
---> 92 return self._ddf._meta
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/functools.py:981, in cached_property.__get__(self, instance, owner)
979 val = cache.get(self.attrname, _NOT_FOUND)
980 if val is _NOT_FOUND:
--> 981 val = self.func(instance)
982 try:
983 cache[self.attrname] = val
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask_expr/io/csv.py:82, in ReadCSV._ddf(self)
79 elif usecols:
80 columns = usecols
---> 82 return self.operation(
83 self.filename,
84 usecols=columns,
85 header=self.header,
86 storage_options=self.storage_options,
87 **kwargs,
88 )
File ~/miniconda3/envs/dask-cookbook/lib/python3.10/site-packages/dask/backends.py:142, in CreationDispatch.register_inplace.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
140 return func(*args, **kwargs)
141 except Exception as e:
--> 142 raise type(e)(
143 f"An error occurred while calling the {funcname(func)} "
144 f"method registered to the {self.backend} backend.\n"
145 f"Original Message: {e}"
146 ) from e
ValueError: An error occurred while calling the read_csv method registered to the pandas backend.
Original Message: Missing column provided to 'parse_dates': 'DATE'
Nice, but what did Dask do?
# Requires ipywidgets
snowy_days.dask
{('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
0): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('chunk-41aba8c2b33aadd00839867a2665ae5b',
0),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 1),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 2),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 3),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 4),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 5),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 6),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
7)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
1): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('chunk-41aba8c2b33aadd00839867a2665ae5b',
8),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 9),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 10),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 11),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 12),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 13),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 14),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
15)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
2): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('chunk-41aba8c2b33aadd00839867a2665ae5b',
16),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 17),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 18),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 19),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 20),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 21),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 22),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
23)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
3): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('chunk-41aba8c2b33aadd00839867a2665ae5b',
24),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 25),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 26),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 27),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 28),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 29),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 30),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
31)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
4): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('chunk-41aba8c2b33aadd00839867a2665ae5b',
32),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 33),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 34),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 35),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 36),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 37),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 38),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
39)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
5): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('chunk-41aba8c2b33aadd00839867a2665ae5b',
40),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 41),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 42),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 43),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 44),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 45),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 46),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
47)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
6): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('chunk-41aba8c2b33aadd00839867a2665ae5b',
48),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 49),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 50),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 51),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 52),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 53),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 54),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
55)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
7): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('chunk-41aba8c2b33aadd00839867a2665ae5b',
56),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 57),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 58),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 59),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 60),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 61),
('chunk-41aba8c2b33aadd00839867a2665ae5b', 62),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
63)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
8): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('chunk-41aba8c2b33aadd00839867a2665ae5b',
64),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
65)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
2,
0): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
0),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88', 1, 1),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88', 1, 2),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88', 1, 3),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88', 1, 4),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88', 1, 5),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88', 1, 6),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
7)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
2,
1): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
1,
8)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
0): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.aggregate of <class 'dask_expr._groupby.Size'>>, [[('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
2,
0),
('size-tree-15bcd42e02ae7df7aba2bc1816ffef88',
2,
1)]], {'aggfunc': <methodcaller: sum>,
'levels': [0, 1],
'sort': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
0): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
0),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
1): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
1),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
2): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
2),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
3): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
3),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
4): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
4),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
5): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
5),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
6): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
6),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
7): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
7),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
8): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
8),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
9): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
9),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
10): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
10),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
11): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
11),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
12): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
12),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
13): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
13),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
14): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
14),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
15): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
15),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
16): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
16),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
17): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
17),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
18): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
18),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
19): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
19),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
20): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
20),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
21): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
21),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
22): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
22),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
23): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
23),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
24): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
24),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
25): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
25),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
26): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
26),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
27): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
27),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
28): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
28),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
29): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
29),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
30): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
30),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
31): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
31),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
32): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
32),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
33): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
33),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
34): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
34),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
35): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
35),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
36): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
36),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
37): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
37),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
38): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
38),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
39): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
39),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
40): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
40),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
41): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
41),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
42): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
42),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
43): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
43),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
44): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
44),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
45): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
45),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
46): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
46),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
47): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
47),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
48): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
48),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
49): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
49),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
50): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
50),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
51): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
51),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
52): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
52),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
53): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
53),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
54): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
54),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
55): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
55),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
56): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
56),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
57): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
57),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
58): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
58),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
59): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
59),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
60): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
60),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
61): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
61),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
62): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
62),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
63): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
63),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
64): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
64),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('chunk-41aba8c2b33aadd00839867a2665ae5b',
65): (<function dask.utils.apply(func, args, kwargs=None)>, <bound method SingleAggregation.chunk of <class 'dask_expr._groupby.Size'>>, [('getitem-cd82dc4610c7c263885ab456c7c7d73d',
65),
'ID',
'YEAR'], {'chunk': <methodcaller: size>,
'columns': None,
'observed': False}),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
0): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
0), ('gt-6d1555d905dc9c50616eb3c2355489da', 0)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
1): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
1), ('gt-6d1555d905dc9c50616eb3c2355489da', 1)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
2): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
2), ('gt-6d1555d905dc9c50616eb3c2355489da', 2)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
3): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
3), ('gt-6d1555d905dc9c50616eb3c2355489da', 3)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
4): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
4), ('gt-6d1555d905dc9c50616eb3c2355489da', 4)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
5): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
5), ('gt-6d1555d905dc9c50616eb3c2355489da', 5)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
6): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
6), ('gt-6d1555d905dc9c50616eb3c2355489da', 6)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
7): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
7), ('gt-6d1555d905dc9c50616eb3c2355489da', 7)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
8): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
8), ('gt-6d1555d905dc9c50616eb3c2355489da', 8)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
9): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
9), ('gt-6d1555d905dc9c50616eb3c2355489da', 9)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
10): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
10), ('gt-6d1555d905dc9c50616eb3c2355489da', 10)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
11): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
11), ('gt-6d1555d905dc9c50616eb3c2355489da', 11)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
12): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
12), ('gt-6d1555d905dc9c50616eb3c2355489da', 12)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
13): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
13), ('gt-6d1555d905dc9c50616eb3c2355489da', 13)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
14): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
14), ('gt-6d1555d905dc9c50616eb3c2355489da', 14)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
15): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
15), ('gt-6d1555d905dc9c50616eb3c2355489da', 15)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
16): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
16), ('gt-6d1555d905dc9c50616eb3c2355489da', 16)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
17): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
17), ('gt-6d1555d905dc9c50616eb3c2355489da', 17)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
18): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
18), ('gt-6d1555d905dc9c50616eb3c2355489da', 18)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
19): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
19), ('gt-6d1555d905dc9c50616eb3c2355489da', 19)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
20): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
20), ('gt-6d1555d905dc9c50616eb3c2355489da', 20)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
21): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
21), ('gt-6d1555d905dc9c50616eb3c2355489da', 21)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
22): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
22), ('gt-6d1555d905dc9c50616eb3c2355489da', 22)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
23): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
23), ('gt-6d1555d905dc9c50616eb3c2355489da', 23)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
24): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
24), ('gt-6d1555d905dc9c50616eb3c2355489da', 24)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
25): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
25), ('gt-6d1555d905dc9c50616eb3c2355489da', 25)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
26): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
26), ('gt-6d1555d905dc9c50616eb3c2355489da', 26)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
27): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
27), ('gt-6d1555d905dc9c50616eb3c2355489da', 27)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
28): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
28), ('gt-6d1555d905dc9c50616eb3c2355489da', 28)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
29): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
29), ('gt-6d1555d905dc9c50616eb3c2355489da', 29)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
30): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
30), ('gt-6d1555d905dc9c50616eb3c2355489da', 30)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
31): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
31), ('gt-6d1555d905dc9c50616eb3c2355489da', 31)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
32): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
32), ('gt-6d1555d905dc9c50616eb3c2355489da', 32)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
33): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
33), ('gt-6d1555d905dc9c50616eb3c2355489da', 33)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
34): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
34), ('gt-6d1555d905dc9c50616eb3c2355489da', 34)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
35): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
35), ('gt-6d1555d905dc9c50616eb3c2355489da', 35)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
36): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
36), ('gt-6d1555d905dc9c50616eb3c2355489da', 36)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
37): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
37), ('gt-6d1555d905dc9c50616eb3c2355489da', 37)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
38): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
38), ('gt-6d1555d905dc9c50616eb3c2355489da', 38)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
39): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
39), ('gt-6d1555d905dc9c50616eb3c2355489da', 39)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
40): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
40), ('gt-6d1555d905dc9c50616eb3c2355489da', 40)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
41): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
41), ('gt-6d1555d905dc9c50616eb3c2355489da', 41)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
42): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
42), ('gt-6d1555d905dc9c50616eb3c2355489da', 42)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
43): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
43), ('gt-6d1555d905dc9c50616eb3c2355489da', 43)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
44): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
44), ('gt-6d1555d905dc9c50616eb3c2355489da', 44)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
45): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
45), ('gt-6d1555d905dc9c50616eb3c2355489da', 45)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
46): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
46), ('gt-6d1555d905dc9c50616eb3c2355489da', 46)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
47): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
47), ('gt-6d1555d905dc9c50616eb3c2355489da', 47)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
48): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
48), ('gt-6d1555d905dc9c50616eb3c2355489da', 48)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
49): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
49), ('gt-6d1555d905dc9c50616eb3c2355489da', 49)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
50): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
50), ('gt-6d1555d905dc9c50616eb3c2355489da', 50)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
51): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
51), ('gt-6d1555d905dc9c50616eb3c2355489da', 51)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
52): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
52), ('gt-6d1555d905dc9c50616eb3c2355489da', 52)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
53): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
53), ('gt-6d1555d905dc9c50616eb3c2355489da', 53)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
54): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
54), ('gt-6d1555d905dc9c50616eb3c2355489da', 54)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
55): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
55), ('gt-6d1555d905dc9c50616eb3c2355489da', 55)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
56): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
56), ('gt-6d1555d905dc9c50616eb3c2355489da', 56)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
57): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
57), ('gt-6d1555d905dc9c50616eb3c2355489da', 57)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
58): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
58), ('gt-6d1555d905dc9c50616eb3c2355489da', 58)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
59): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
59), ('gt-6d1555d905dc9c50616eb3c2355489da', 59)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
60): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
60), ('gt-6d1555d905dc9c50616eb3c2355489da', 60)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
61): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
61), ('gt-6d1555d905dc9c50616eb3c2355489da', 61)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
62): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
62), ('gt-6d1555d905dc9c50616eb3c2355489da', 62)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
63): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
63), ('gt-6d1555d905dc9c50616eb3c2355489da', 63)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
64): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
64), ('gt-6d1555d905dc9c50616eb3c2355489da', 64)),
('getitem-cd82dc4610c7c263885ab456c7c7d73d',
65): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
65), ('gt-6d1555d905dc9c50616eb3c2355489da', 65)),
('gt-6d1555d905dc9c50616eb3c2355489da', 0): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 0),
0),
('gt-6d1555d905dc9c50616eb3c2355489da', 1): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 1),
0),
('gt-6d1555d905dc9c50616eb3c2355489da', 2): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 2),
0),
('gt-6d1555d905dc9c50616eb3c2355489da', 3): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 3),
0),
('gt-6d1555d905dc9c50616eb3c2355489da', 4): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 4),
0),
('gt-6d1555d905dc9c50616eb3c2355489da', 5): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 5),
0),
('gt-6d1555d905dc9c50616eb3c2355489da', 6): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 6),
0),
('gt-6d1555d905dc9c50616eb3c2355489da', 7): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 7),
0),
('gt-6d1555d905dc9c50616eb3c2355489da', 8): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 8),
0),
('gt-6d1555d905dc9c50616eb3c2355489da', 9): (<function _operator.gt(a, b, /)>,
('getitem-9744343c7e480dfa57fd485180878318', 9),
0),
('gt-6d1555d905dc9c50616eb3c2355489da',
10): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
10), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
11): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
11), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
12): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
12), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
13): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
13), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
14): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
14), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
15): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
15), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
16): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
16), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
17): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
17), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
18): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
18), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
19): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
19), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
20): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
20), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
21): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
21), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
22): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
22), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
23): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
23), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
24): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
24), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
25): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
25), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
26): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
26), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
27): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
27), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
28): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
28), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
29): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
29), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
30): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
30), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
31): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
31), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
32): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
32), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
33): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
33), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
34): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
34), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
35): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
35), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
36): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
36), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
37): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
37), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
38): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
38), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
39): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
39), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
40): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
40), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
41): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
41), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
42): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
42), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
43): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
43), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
44): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
44), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
45): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
45), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
46): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
46), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
47): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
47), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
48): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
48), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
49): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
49), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
50): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
50), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
51): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
51), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
52): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
52), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
53): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
53), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
54): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
54), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
55): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
55), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
56): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
56), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
57): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
57), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
58): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
58), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
59): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
59), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
60): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
60), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
61): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
61), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
62): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
62), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
63): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
63), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
64): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
64), 0),
('gt-6d1555d905dc9c50616eb3c2355489da',
65): (<function _operator.gt(a, b, /)>, ('getitem-9744343c7e480dfa57fd485180878318',
65), 0),
('getitem-9744343c7e480dfa57fd485180878318',
0): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
0), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
1): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
1), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
2): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
2), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
3): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
3), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
4): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
4), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
5): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
5), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
6): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
6), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
7): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
7), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
8): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
8), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
9): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
9), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
10): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
10), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
11): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
11), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
12): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
12), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
13): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
13), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
14): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
14), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
15): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
15), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
16): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
16), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
17): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
17), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
18): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
18), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
19): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
19), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
20): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
20), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
21): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
21), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
22): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
22), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
23): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
23), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
24): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
24), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
25): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
25), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
26): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
26), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
27): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
27), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
28): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
28), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
29): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
29), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
30): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
30), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
31): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
31), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
32): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
32), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
33): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
33), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
34): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
34), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
35): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
35), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
36): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
36), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
37): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
37), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
38): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
38), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
39): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
39), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
40): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
40), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
41): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
41), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
42): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
42), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
43): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
43), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
44): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
44), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
45): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
45), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
46): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
46), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
47): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
47), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
48): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
48), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
49): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
49), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
50): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
50), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
51): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
51), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
52): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
52), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
53): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
53), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
54): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
54), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
55): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
55), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
56): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
56), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
57): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
57), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
58): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
58), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
59): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
59), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
60): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
60), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
61): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
61), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
62): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
62), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
63): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
63), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
64): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
64), 'SNOW'),
('getitem-9744343c7e480dfa57fd485180878318',
65): (<function _operator.getitem(a, b, /)>, ('read_csv-d60643b43d96156355546ebe4e521a01',
65), 'SNOW'),
('read_csv-d60643b43d96156355546ebe4e521a01',
0): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00059243.csv'>,
0,
8467532,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
1): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00044259.csv'>,
0,
9255424,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
2): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00100470.csv'>,
0,
8756776,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
3): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00031596.csv'>,
0,
9394879,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
4): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00115768.csv'>,
0,
9420274,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
5): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00115326.csv'>,
0,
8724022,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
6): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00023160.csv'>,
0,
9112121,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
7): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00035820.csv'>,
0,
9196490,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
8): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00055322.csv'>,
0,
7615847,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
9): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00114823.csv'>,
0,
9377528,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
10): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00051528.csv'>,
0,
8783750,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
11): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00030936.csv'>,
0,
8801229,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
12): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00114442.csv'>,
0,
9505114,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
13): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00115901.csv'>,
0,
9204455,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
14): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00106152.csv'>,
0,
12026314,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
15): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00035186.csv'>,
0,
10194856,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
16): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00052184.csv'>,
0,
8043565,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
17): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00053038.csv'>,
0,
8487651,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
18): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00055722.csv'>,
0,
8698685,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
19): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00115943.csv'>,
0,
10976013,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
20): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00051294.csv'>,
0,
9172415,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
21): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00053005.csv'>,
0,
11842479,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
22): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00058429.csv'>,
0,
8270584,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
23): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00053662.csv'>,
0,
8104635,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
24): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00057167.csv'>,
0,
9312478,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
25): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00053951.csv'>,
0,
7267708,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
26): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00052281.csv'>,
0,
7343612,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
27): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00112140.csv'>,
0,
8965440,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
28): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00054770.csv'>,
0,
8740826,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
29): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00107264.csv'>,
0,
8677724,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
30): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00048758.csv'>,
0,
10181879,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
31): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00032444.csv'>,
0,
9447489,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
32): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00108137.csv'>,
0,
10187817,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
33): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00114108.csv'>,
0,
9082884,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
34): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00098703.csv'>,
0,
9209668,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
35): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00051564.csv'>,
0,
8971622,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
36): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00057936.csv'>,
0,
8317871,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
37): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00100010.csv'>,
0,
10093177,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
38): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00116446.csv'>,
0,
9283402,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
39): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00068138.csv'>,
0,
8754052,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
40): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00027281.csv'>,
0,
8981508,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
41): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00110338.csv'>,
0,
9624032,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
42): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00115833.csv'>,
0,
8839839,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
43): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00054076.csv'>,
0,
8551904,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
44): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00105275.csv'>,
0,
9029248,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
45): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00054834.csv'>,
0,
9219418,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
46): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00051741.csv'>,
0,
6759653,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
47): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00057337.csv'>,
0,
7542301,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
48): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00112348.csv'>,
0,
9373120,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
49): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00035908.csv'>,
0,
9097064,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
50): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00052446.csv'>,
0,
7776918,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
51): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00058204.csv'>,
0,
8003946,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
52): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00115712.csv'>,
0,
8977888,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
53): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00115079.csv'>,
0,
8856586,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
54): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00042294.csv'>,
0,
11791593,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
55): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00113335.csv'>,
0,
8978417,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
56): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00112193.csv'>,
0,
9504628,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
57): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00084731.csv'>,
0,
10205327,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
58): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00112483.csv'>,
0,
9056012,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
59): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00080211.csv'>,
0,
8784707,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
60): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00027390.csv'>,
0,
8653392,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
61): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00088824.csv'>,
0,
8779188,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
62): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00050848.csv'>,
0,
9182374,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
63): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USW00003017.csv'>,
0,
3724464,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
64): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00053146.csv'>,
0,
8623236,
b'\n'),
None,
True,
True]),
('read_csv-d60643b43d96156355546ebe4e521a01',
65): (subgraph_callable-b4b1c0f08f602b32b8e77521f105e350, [(<function dask.bytes.core.read_block_from_file(lazy_file, off, bs, delimiter)>,
<OpenFile '/home/runner/work/dask-cookbook/dask-cookbook/notebooks/../data/USC00035754.csv'>,
0,
8980905,
b'\n'),
None,
True,
True])}
You can also view the underlying task graph using .visualize():
#graph is too large
snowy_days.visualize()
Use .compute wisely!
.persist or caching
Sometimes you might want Dask to keep intermediate results in memory, as long as they fit.
The .persist() method can be used to “cache” data and tell Dask which results to keep around. You should only use .persist() on data or computations whose results fit in memory.
For example, suppose we only want to analyze a subset of the data (snowy days at the Boulder site):
boulder_snow = ddf[(ddf['SNOW']>0)&(ddf['ID']=='USC00050848')]
%%time
tmax = boulder_snow.TMAX.mean().compute()
tmin = boulder_snow.TMIN.mean().compute()
print (tmin, tmax)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
(...)
ValueError: An error occurred while calling the read_csv method registered to the pandas backend.
Original Message: Missing column provided to 'parse_dates': 'DATE'
boulder_snow = ddf[(ddf['SNOW']>0)&(ddf['ID']=='USC00050848')].persist()
%%time
tmax = boulder_snow.TMAX.mean().compute()
tmin = boulder_snow.TMIN.mean().compute()
print (tmin, tmax)
-74.82074711099168 37.419103836866114
CPU times: user 674 ms, sys: 58.9 ms, total: 733 ms
Wall time: 4.78 s
As you can see, the analysis on the persisted data is much faster because the loading and filtering steps are not repeated.
Dask DataFrames Best Practices
Use pandas (when you can)
For data that fits into RAM, pandas can often be easier and more efficient to use than Dask DataFrame. However, Dask DataFrame is a powerful tool for larger-than-memory datasets.
When the data is larger than memory, Dask DataFrame can be used to reduce it to a manageable size that pandas can handle. Switch to pandas at that point.
Avoid Full-Data Shuffling
Some operations are more expensive to compute in a parallel setting than in memory on a single machine (for example, set_index or merge). In particular, shuffling operations that rearrange data can become very communication intensive.
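To make the cost of a shuffle concrete, here is a minimal pure-Python sketch. It is not Dask code: plain lists stand in for partitions, and the layout, the keys, and the `target` mapping are hypothetical. It counts how many rows would have to leave their current partition if every key had to end up in a single partition, which is roughly what an operation like set_index requires.

```python
# Toy illustration only: plain Python lists stand in for Dask partitions.
partitions = [
    [("b", 1), ("a", 2), ("c", 3)],
    [("a", 4), ("c", 5), ("b", 6)],
    [("c", 7), ("b", 8), ("a", 9)],
]

# Hypothetical rule: after the shuffle, each key lives in exactly one partition.
target = {"a": 0, "b": 1, "c": 2}


def count_moved_rows(parts, target):
    """Count rows whose destination partition differs from their current one."""
    moved = 0
    for current, rows in enumerate(parts):
        for key, _value in rows:
            if target[key] != current:
                moved += 1
    return moved


total = sum(len(rows) for rows in partitions)
moved = count_moved_rows(partitions, target)
print(f"{moved} of {total} rows must move between partitions")
```

In this toy layout most rows have to move, whereas an element-wise operation or a row-wise filter would leave every row where it is. On a cluster, each moved row means data sent between workers.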
pandas performance tips
pandas performance tips, such as using vectorized operations, also apply to Dask DataFrames. See the Modern Pandas notebook for more tips on better performance with pandas.
Check Partition Size
Similar to chunks, partitions should be small enough to fit in memory, but large enough to avoid excessive communication overhead.
blocksize
The number of partitions can be set using the blocksize argument. If none is given, the blocksize (and hence the number of partitions) is calculated from the available memory and the number of cores on the machine, up to a maximum of 64 MB. As the blocksize increases, the number of partitions (calculated by Dask) decreases. This is especially important when reading one large CSV file.
As a good rule of thumb, you should aim for partitions that have around 100 MB of data each.
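The relationship between blocksize and the number of partitions can be approximated with simple arithmetic. The sketch below is a rough estimate, not Dask's exact algorithm: it assumes each file is cut into pieces of at most blocksize bytes, and the helper `estimate_partitions` is a hypothetical name. The byte counts are taken from three of the read_block_from_file tasks in the task graph printed earlier.

```python
import math


def estimate_partitions(file_sizes, blocksize):
    """Rough estimate: each file is cut into pieces of at most `blocksize` bytes."""
    return sum(max(1, math.ceil(size / blocksize)) for size in file_sizes)


# Byte counts of three CSV files, as shown in the task graph output.
file_sizes = [8_487_651, 8_698_685, 10_976_013]

# Illustrative blocksizes in bytes (roughly 64 MB, 4 MB and 1 MB).
for blocksize in (64_000_000, 4_000_000, 1_000_000):
    print(blocksize, estimate_partitions(file_sizes, blocksize))
```

With a blocksize of about 64 MB, each of these files fits into a single block, which matches the task graph above, where every file is read from offset 0 in one piece. Dask's actual splitting can differ slightly between versions, so treat the numbers as an approximation.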
Smart use of .compute()
Avoid calling .compute() for as long as possible. Dask works best when users defer computation until results are needed. The .compute() command tells Dask to trigger the computations on the Dask DataFrame.
As shown in the example above, intermediate results can also be shared by calling .compute() only once.
Close your local Dask Cluster
It is always a good practice to close the Dask cluster you created.
client.shutdown()
Summary
In this notebook, we have learned about:
Dask DataFrame concepts and components.
When to use and when to avoid Dask DataFrames.
How to use Dask DataFrames.
Some best practices around Dask DataFrames.
Resources and references
Reference
Ask for help
dask tag on Stack Overflow, for usage questions
GitHub discussions: dask for general, non-bug, discussion, and usage questions
GitHub issues: dask for bug reports and feature requests