When ingesting Allium blockchain data into your own warehouse (e.g., as a dbt source), a key challenge is knowing what data was updated and when. Blockchain data is not strictly append-only — late-arriving records, reorgs, and backfills mean that records with older block_timestamp values can be delivered after newer ones.
Allium provides delivery metadata tables that give you full transparency into every data update, enabling you to build a reliable and cost-efficient sync pipeline.
Key Concepts
| Term | Definition |
|---|---|
| block_timestamp | The on-chain timestamp of a record — when the event happened on the blockchain. |
| Snapshot | A versioned point-in-time view of a table. Each snapshot may contain new records, backfilled records, or both. Snapshots are created roughly every hour. |
| Interval | An hourly slice of block_timestamp (e.g., all records where block_timestamp falls within 2026-01-12 05:00 (inclusive) to 2026-01-12 06:00 (exclusive)). |
block_timestamp does not always increase monotonically with delivery time. A record delivered today could have a block_timestamp from last week (due to a backfill or patch). The metadata tables make this visible so your pipeline can handle it correctly.
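To make the pitfall concrete, here is a minimal Python sketch (all timestamps invented for illustration) of how a pipeline that watermarks on block_timestamp alone silently drops a backfilled record:

```python
from datetime import datetime

# Illustrative delivery log: (delivered_at, block_timestamp) pairs.
# These values are made up to show the ordering problem, not real data.
deliveries = [
    (datetime(2026, 1, 12, 5), datetime(2026, 1, 12, 4, 58)),  # fresh data
    (datetime(2026, 1, 12, 6), datetime(2026, 1, 12, 5, 59)),  # fresh data
    (datetime(2026, 1, 12, 7), datetime(2026, 1, 5, 13, 21)),  # backfill from a week ago
]

# A naive pipeline keeps only records newer than its block_timestamp watermark.
watermark = datetime(2026, 1, 12, 5)
new_records = [bt for _, bt in deliveries if bt >= watermark]
missed = [bt for _, bt in deliveries if bt < watermark]
# `missed` contains the already-synced 04:58 record and, crucially, the
# 2026-01-05 backfill that a watermark-only pipeline never re-ingests.
```

The metadata tables below exist precisely so that record from January 5 does not go unnoticed.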
The delivery_metadata.snapshots.aggregated table shows the latest state of each delivered snapshot, including which data intervals were loaded in each one. This is the primary table for steady-state syncing.
| Column | Type | Description |
|---|---|---|
| chain | VARCHAR | Blockchain name (e.g., ethereum, base) |
| table_name | VARCHAR | Table name (e.g., raw.logs, raw.transactions) |
| watermark_column | VARCHAR | The timestamp column used for watermarking (typically block_timestamp) |
| watermark_level | TIMESTAMP_NTZ | The high-water mark of the table at this snapshot |
| delivery_interval | VARCHAR | The delivery cadence for this table |
| snapshot_id | VARCHAR | Unique snapshot version identifier |
| is_full_refresh | BOOLEAN | Whether the dataset was fully refreshed during this snapshot |
| snapshot_loaded_intervals__count | NUMBER | Number of records loaded in this snapshot |
| snapshot_loaded_intervals__minutes | FLOAT | Total minutes of block_timestamp coverage loaded |
| snapshot_loaded_intervals__min_timestamp | TIMESTAMP_NTZ | Earliest block_timestamp of records loaded in this snapshot |
| snapshot_loaded_intervals__max_timestamp | TIMESTAMP_NTZ | Latest block_timestamp of records loaded in this snapshot |
| snapshot_loaded_intervals__merged_intervals | VARCHAR | Consolidated time ranges of loaded data |
| snapshot_loaded_intervals__loads | VARCHAR | JSON array of individual data loads within the snapshot, with exact time ranges |
| snapshot_created_at | TIMESTAMP_NTZ | When this snapshot was created and became available to you |
| created_at | TIMESTAMP_NTZ | When this metadata record was created |
Each row represents a single snapshot. The min_timestamp and max_timestamp tell you the overall time range of data that was loaded. Note that this can be a sparse interval — the loads and merged_intervals fields provide the exact sub-ranges if you need finer granularity.
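The exact payload of the loads field is not shown here, but assuming it is a JSON array of objects with min_timestamp/max_timestamp keys (an illustrative schema — check your actual deliveries), collapsing its sub-ranges into consolidated spans might look like this in Python:

```python
import json
from datetime import datetime

# Hypothetical snapshot_loaded_intervals__loads payload. The key names
# ("min_timestamp"/"max_timestamp") are assumptions for illustration.
loads_json = """[
  {"min_timestamp": "2026-01-12T05:00:00", "max_timestamp": "2026-01-12T07:00:00"},
  {"min_timestamp": "2026-01-12T06:30:00", "max_timestamp": "2026-01-12T08:00:00"},
  {"min_timestamp": "2026-01-05T13:00:00", "max_timestamp": "2026-01-05T14:00:00"}
]"""

def merge_intervals(loads):
    """Collapse overlapping [min, max] ranges, mirroring merged_intervals."""
    spans = sorted(
        (datetime.fromisoformat(l["min_timestamp"]),
         datetime.fromisoformat(l["max_timestamp"]))
        for l in loads
    )
    merged = [spans[0]]
    for lo, hi in spans[1:]:
        if lo <= merged[-1][1]:  # overlaps the previous span: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:                    # disjoint: start a new span
            merged.append((lo, hi))
    return merged

merged = merge_intervals(json.loads(loads_json))
# The three loads collapse into two disjoint spans: a backfilled hour on
# Jan 5 and a continuous 05:00-08:00 range on Jan 12.
```

This is also why min_timestamp/max_timestamp alone can overstate coverage: the overall range here spans a week, while the actual data loaded covers only those two spans.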
The delivery_metadata.intervals.changes table shows, for each hourly partition of data, when it was last updated. This is the primary table for backfill detection.
| Column | Type | Description |
|---|---|---|
| chain | VARCHAR | Blockchain name |
| table_name | VARCHAR | Table name |
| hour | TIMESTAMP_NTZ | The hourly partition (date_trunc('hour', block_timestamp)). One row = one hour of data where block_timestamp >= hour AND block_timestamp < hour + 1 hour. |
| last_updated_at | TIMESTAMP_NTZ | The most recent time this hour's data was modified |
| updated_timestamps | VARCHAR | List of timestamps when this hour's data was updated |
| last_full_refresh_timestamp | TIMESTAMP_NTZ | The last time this table was fully refreshed |
| max_hours_late | FLOAT | Maximum hours between hour and when the data was actually delivered — useful for measuring backfill lag |
| _created_at | TIMESTAMP_NTZ | When this metadata record was created |
| _updated_at | TIMESTAMP_NTZ | When this metadata record was last updated |
For example, if base.raw.logs has hour = 2025-11-06 02:00 with last_updated_at = 2026-02-20, the hourly partition for Nov 6 at 2 AM was last modified (e.g., patched or backfilled) on Feb 20.
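The max_hours_late figure can be reproduced from hour and the delivery times recorded in updated_timestamps. A small Python sketch with invented timestamps mirroring the example above:

```python
from datetime import datetime

# Invented timestamps: an hourly partition first delivered promptly,
# then patched months later by a backfill.
hour = datetime(2025, 11, 6, 2)
updated_timestamps = [
    datetime(2025, 11, 6, 3, 10),  # initial delivery, about an hour late
    datetime(2026, 2, 20, 9, 0),   # backfill patch, months late
]

# max_hours_late: the largest gap between the partition's hour and any
# time its data was delivered or modified.
max_hours_late = max(
    (t - hour).total_seconds() / 3600 for t in updated_timestamps
)
# Here the Feb 20 patch dominates, yielding a lag of thousands of hours.
```

A large max_hours_late on recent partitions is a signal to widen your steady-state lookback window or tighten your backfill cadence.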
Recommended Integration Pattern
For full data alignment with Allium, you need two jobs:
- A steady-state job (high cadence) — handles new data at the tip of chain
- A backfill job (low cadence) — catches late-arriving historical patches
Job 1: Steady-State Sync
Runs every hour (or at whatever cadence your pipeline operates). Picks up newly delivered data snapshots.
Query delivery_metadata.snapshots.aggregated to find snapshots created since your last run:
```sql
SELECT
    snapshot_id,
    snapshot_created_at,
    snapshot_loaded_intervals__min_timestamp,
    snapshot_loaded_intervals__max_timestamp,
    snapshot_loaded_intervals__loads,
    is_full_refresh
FROM delivery_metadata.snapshots.aggregated
WHERE chain = 'base'
  AND table_name = 'raw.logs'
  AND snapshot_created_at >= <last_pipeline_run_time> - INTERVAL '1 hour'
```
Then use the returned min_timestamp / max_timestamp to determine the block_timestamp range to pull from the source table.
To keep scan costs predictable, cap the minimum time to a lookback window (e.g., only pull data from the last 24–36 hours). Any data changes older than that window will be caught by the backfill job.
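One way to sketch that cap in Python (the 36-hour window and the function name are hypothetical, not Allium defaults):

```python
from datetime import datetime, timedelta

# Hypothetical lookback cap for the steady-state job; 36 hours is an
# example value — tune it to your cost/latency tradeoff.
LOOKBACK = timedelta(hours=36)

def steady_state_range(snapshot_min, snapshot_max, now):
    """Clamp the snapshot's block_timestamp range to the lookback window.

    Anything older than the floor is deliberately skipped here and
    picked up by the backfill job instead.
    """
    return max(snapshot_min, now - LOOKBACK), snapshot_max

now = datetime(2026, 1, 12, 6)
lo, hi = steady_state_range(
    snapshot_min=datetime(2026, 1, 5, 13),    # snapshot includes a week-old backfill
    snapshot_max=datetime(2026, 1, 12, 5, 59),
    now=now,
)
# lo is clamped to now - 36h; the Jan 5 backfill is left to the backfill job.
```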
Job 2: Backfill Sync
Runs daily or weekly. Catches any data that was patched or backfilled for historical time periods — records too old for the steady-state lookback window to catch.
Query delivery_metadata.intervals.changes to find hourly partitions that were recently updated:
```sql
SELECT
    table_name,
    hour,
    last_updated_at,
    last_full_refresh_timestamp
FROM delivery_metadata.intervals.changes
WHERE chain = 'base'
  AND table_name = 'raw.logs'
  AND last_updated_at >= CURRENT_TIMESTAMP - INTERVAL '1 day'
ORDER BY hour
```
The returned hour values are the partitions you need to re-sync. Compare last_updated_at against your own internal tracking timestamp to determine which hours actually need patching.
You can also check last_full_refresh_timestamp to detect if a full table refresh occurred — compare it against your internal timestamp to know if a full re-ingestion is needed.
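Putting the two comparisons together, a hedged Python sketch of the backfill decision (the my_synced_at bookkeeping and all timestamps are hypothetical):

```python
from datetime import datetime

# Hypothetical bookkeeping: when *we* last ingested each hourly partition,
# and when we last did a full re-ingestion of the table.
my_synced_at = {
    datetime(2025, 11, 6, 2): datetime(2026, 1, 1),
    datetime(2025, 11, 6, 3): datetime(2026, 3, 1),
}
my_full_refresh_at = datetime(2026, 1, 1)

# Rows as returned by the intervals.changes query: (hour, last_updated_at),
# plus the table-level last_full_refresh_timestamp.
changes = [
    (datetime(2025, 11, 6, 2), datetime(2026, 2, 20)),  # patched after our sync
    (datetime(2025, 11, 6, 3), datetime(2026, 2, 20)),  # we re-synced later anyway
]
last_full_refresh_timestamp = datetime(2025, 12, 15)

# Hours to re-sync: Allium updated them after we last ingested them.
stale_hours = [h for h, updated in changes
               if updated > my_synced_at.get(h, datetime.min)]

# Full re-ingestion is needed only if Allium's refresh postdates ours.
needs_full_refresh = last_full_refresh_timestamp > my_full_refresh_at
```

Only the 2 AM partition qualifies here: the 3 AM hour was already re-synced after Allium's patch, so comparing against your own tracking timestamps avoids redundant re-processing.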
How the Two Jobs Work Together
Steady-state optimizes for speed and cost — it handles the common case of new data arriving at the leading edge. Backfill ensures full correctness by catching the uncommon case of late-arriving historical patches (data from weeks or months ago being corrected).
Together, these two jobs ensure your warehouse stays fully aligned with Allium.
dbt Integration Example
If you use dbt with Allium as a source, here’s how you might structure your incremental model:
```sql
-- models/staging/stg_base_raw_logs.sql
{{ config(
    materialized='incremental',
    unique_key=['block_number', 'transaction_index', 'log_index'],
    incremental_strategy='merge'
) }}

SELECT *
FROM {{ source('allium', 'base_raw_logs') }}
{% if is_incremental() %}
WHERE block_timestamp >= (
    SELECT MIN(snapshot_loaded_intervals__min_timestamp)
    FROM {{ source('allium_delivery_metadata', 'snapshots_aggregated') }}
    WHERE chain = 'base'
      AND table_name = 'raw.logs'
      AND snapshot_created_at >= (
          SELECT MAX(_loaded_at) FROM {{ this }}
      ) - INTERVAL '1 hour'
)
{% endif %}
```
For the backfill job, create a separate model or macro that queries intervals.changes and re-processes the affected partitions.
Further Reading