Data Availability Scanner

The scan_data_availability function helps you discover when telemetry data exists in your database. It's especially useful in Jupyter notebooks where it provides an interactive, collapsible view of data windows organized by month and day.

Quick Start

import slicks
from datetime import datetime

# Configure connection first
slicks.connect_influxdb3(
    url="http://your-server:9000",
    token="your-token",
    db="WFR25"
)

# Scan for data availability
result = slicks.scan_data_availability(
    start=datetime(2025, 1, 1),
    end=datetime(2026, 1, 1),
    timezone="America/Toronto"
)

# Display interactive view (just type the variable name in Jupyter)
result

Interactive Views

Collapsible Tree View

When you display result in Jupyter, you get a nested, collapsible view:

Months are shown expanded by default
Days can be clicked to expand/collapse
Time windows show start/end times and row counts

Inline Calendar View

Calendar Heatmap

Visualize data density across the year with a GitHub-style calendar:

result.calendar_view()

This creates a 12-month grid where darker green means more data was recorded that day.

Calendar Heatmap

Function Reference

`slicks.scan_data_availability`

slicks.scan_data_availability(
    start: datetime,
    end: datetime,
    timezone: str = "UTC",
    table: str = None,
    bin_size: str = "hour",
    include_counts: bool = True,
    show_progress: bool = True
) -> ScanResult

Parameters:

Parameter	Type	Default	Description
`start`	`datetime`	required	Start of scan range (UTC or timezone-aware)
`end`	`datetime`	required	End of scan range
`timezone`	`str`	`"UTC"`	Timezone for display (e.g., `"America/Toronto"`)
`table`	`str`	`None`	Table to scan (defaults to `"iox.{INFLUX_DB}"`)
`bin_size`	`str`	`"hour"`	Granularity: `"hour"` or `"day"`
`include_counts`	`bool`	`True`	Include row counts (slightly slower if `True`)
`show_progress`	`bool`	`True`	Show progress bar during scan

Returns: ScanResult object with interactive display

ScanResult Methods

`.to_dict()`

Export as nested dictionary:

data = result.to_dict()
# {'2025-01-15': [{'start_utc': '...', 'end_utc': '...', 'row_count': 1500}, ...], ...}

`.to_dataframe()`

Flatten to pandas DataFrame:

df = result.to_dataframe()

date	start_utc	end_utc	start_local	end_local	row_count	duration_hours
2025-01-15	2025-01-15T14:00:00+00:00	2025-01-15T16:00:00+00:00	...	...	1500	2.0

`.calendar_view(year=None)`

Generate a heatmap calendar:

fig = result.calendar_view()  # Auto-detects year from data
fig = result.calendar_view(year=2025)  # Specific year

Properties

result.days - List of dates with data (e.g., ['2025-01-15', '2025-01-16', ...])
result.total_rows - Total row count across all windows
len(result) - Number of days with data

Performance Tips

For Large Date Ranges

Use day-level granularity for faster scans:

result = slicks.scan_data_availability(
    start=datetime(2025, 1, 1),
    end=datetime(2026, 1, 1),
    bin_size="day"  # Much faster than "hour"
)

Skip Row Counts

If you only need to know when data exists (not how much):

result = slicks.scan_data_availability(
    start=datetime(2025, 1, 1),
    end=datetime(2025, 2, 1),
    include_counts=False  # Faster
)

Scan Smaller Ranges

For dense months, scan one month at a time:

result = slicks.scan_data_availability(
    start=datetime(2025, 6, 1),
    end=datetime(2025, 7, 1),
    timezone="America/Toronto"
)

Terminal Usage

The scanner also works in regular Python scripts with a text-based display:

result = slicks.scan_data_availability(...)
print(result)

Output:

Data Availability (America/Toronto)
========================================

📆 January 2025 (3 days, 4,500 rows)
   📅 Day 15 (2 windows, 1,500 rows)
      └─ 09:00 → 11:00 (1,200 rows)
      └─ 14:30 → 16:00 (300 rows)
   📅 Day 16 (1 window, 3,000 rows)
      └─ 10:00 → 14:00 (3,000 rows)

========================================
Total: 3 days, 4,500 rows