GeoTIFF

Generating virutal datasets from GeoTiff files

Overview

In this tutorial we will cover:

How to generate virtual datasets from GeoTIFFs.
Combining virtual datasets.

Prerequisites

Concepts	Importance	Notes
Basics of virtual Zarr stores	Required	Core
Multi-file virtual datasets with VirtualiZarr	Required	Core
Parallel virtual dataset creation with VirtualiZarr, Kerchunk, and Dask	Required	Core
Introduction to Xarray	Required	IO/Visualization

Time to learn: 30 minutes

Create Input File List

Here we are using fsspec's glob functionality along with the * wildcard operator and some string slicing to grab a list of GeoTIFF files from a s3 fsspec filesystem.

# Initiate fsspec filesystems for reading
fs_read = fsspec.filesystem("s3", anon=True, skip_instance_cache=True)

files_paths = fs_read.glob(
    "s3://fmi-opendata-radar-geotiff/2023/01/01/FIN-ACRR-3067-1KM/*24H-3067-1KM.tif"
)
# Here we prepend the prefix 's3://', which points to AWS.
files_paths = sorted(["s3://" + f for f in files_paths])

Start a Dask Client

To parallelize the creation of our reference files, we will use Dask. For a detailed guide on how to use Dask and Kerchunk, see the Foundations notebook: Kerchunk and Dask.

client = Client(n_workers=8, silence_logs=logging.ERROR)
client

Client

Client-d68bb06f-b0ee-11ef-8cdf-7c1e5222ecf8

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

LocalCluster

529afdab

Dashboard: http://127.0.0.1:8787/status	Workers: 8
Total threads: 8	Total memory: 15.61 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-38004c7e-36af-4c3a-90fa-8cdb7356d0ca

Comm: tcp://127.0.0.1:43443	Workers: 8
Dashboard: http://127.0.0.1:8787/status	Total threads: 8
Started: Just now	Total memory: 15.61 GiB

Workers

Worker: 0

Comm: tcp://127.0.0.1:38243	Total threads: 1
Dashboard: http://127.0.0.1:40241/status	Memory: 1.95 GiB
Nanny: tcp://127.0.0.1:45345
Local directory: /tmp/dask-scratch-space/worker-0tnublw0

Worker: 1

Comm: tcp://127.0.0.1:36741	Total threads: 1
Dashboard: http://127.0.0.1:37629/status	Memory: 1.95 GiB
Nanny: tcp://127.0.0.1:34977
Local directory: /tmp/dask-scratch-space/worker-ahz0gtbr

Worker: 2

Comm: tcp://127.0.0.1:45587	Total threads: 1
Dashboard: http://127.0.0.1:34865/status	Memory: 1.95 GiB
Nanny: tcp://127.0.0.1:35621
Local directory: /tmp/dask-scratch-space/worker-imqszvku

Worker: 3

Comm: tcp://127.0.0.1:43483	Total threads: 1
Dashboard: http://127.0.0.1:46287/status	Memory: 1.95 GiB
Nanny: tcp://127.0.0.1:34117
Local directory: /tmp/dask-scratch-space/worker-jqoo9bst

Worker: 4

Comm: tcp://127.0.0.1:38623	Total threads: 1
Dashboard: http://127.0.0.1:34921/status	Memory: 1.95 GiB
Nanny: tcp://127.0.0.1:34439
Local directory: /tmp/dask-scratch-space/worker-9nn4gnve

Worker: 5

Comm: tcp://127.0.0.1:33911	Total threads: 1
Dashboard: http://127.0.0.1:43329/status	Memory: 1.95 GiB
Nanny: tcp://127.0.0.1:40101
Local directory: /tmp/dask-scratch-space/worker-rim13nh2

Worker: 6

Comm: tcp://127.0.0.1:43991	Total threads: 1
Dashboard: http://127.0.0.1:41865/status	Memory: 1.95 GiB
Nanny: tcp://127.0.0.1:37671
Local directory: /tmp/dask-scratch-space/worker-ocy8wg1v

Worker: 7

Comm: tcp://127.0.0.1:33143	Total threads: 1
Dashboard: http://127.0.0.1:33439/status	Memory: 1.95 GiB
Nanny: tcp://127.0.0.1:37755
Local directory: /tmp/dask-scratch-space/worker-_q1jxtoe

def generate_virtual_dataset(file):
    storage_options = dict(
        anon=True, default_fill_cache=False, default_cache_type="none"
    )
    vds = open_virtual_dataset(
        file,
        indexes={},
        filetype="tiff",
        reader_options={
            "remote_options": {"anon": True},
            "storage_options": storage_options,
        },
    )
    # Pre-process virtual datasets to extract time step information from the filename
    subst = file.split("/")[-1].split(".json")[0].split("_")[0]
    time_val = datetime.strptime(subst, "%Y%m%d%H%M")
    vds = vds.expand_dims(dim={"time": [time_val]})
    # Only include the raw data, not the overviews
    vds = vds[["0"]]
    return vds

# Generate Dask Delayed objects
tasks = [dask.delayed(generate_virtual_dataset)(file) for file in files_paths]

# Start parallel processing
import warnings

warnings.filterwarnings("ignore")
virtual_datasets = dask.compute(*tasks)

/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(

/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(

/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(

/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(

/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(
/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(

/home/runner/miniconda3/envs/kerchunk-cookbook/lib/python3.10/site-packages/virtualizarr/readers/tiff.py:41: UserWarning: storage_options have been dropped from reader_options as they are not supported by kerchunk.tiff.tiff_to_zarr
  warnings.warn(

Shut down the Dask cluster

client.shutdown()