When ingesting Allium blockchain data into your own warehouse (e.g., as a dbt source), a key challenge is knowing what data was updated and when. Blockchain data is not strictly append-only — late-arriving records, reorgs, and backfills mean that records with older block_timestamp values can be delivered after newer ones. Allium provides delivery metadata tables that give you full transparency into every data update, enabling you to build a reliable and cost-efficient sync pipeline.

Key Concepts

| Term | Definition |
| --- | --- |
| `block_timestamp` | The on-chain timestamp of a record: when the event happened on the blockchain. |
| Snapshot | A versioned point-in-time view of a table. Each snapshot may contain new records, backfilled records, or both. Snapshots are created roughly every hour. |
| Interval | An hourly slice of `block_timestamp` (e.g., all records where `block_timestamp` falls within `2026-01-12 05:00` (inclusive) to `2026-01-12 06:00` (exclusive)). |
block_timestamp does not always increase monotonically with delivery time. A record delivered today could have a block_timestamp from last week (due to a backfill or patch). The metadata tables make this visible so your pipeline can handle it correctly.

Metadata Tables

delivery_metadata.snapshots.aggregated

Shows the latest state of each snapshot delivered, including what data intervals were loaded in each snapshot. This is the primary table for steady-state syncing.
| Column | Type | Description |
| --- | --- | --- |
| `chain` | VARCHAR | Blockchain name (e.g., `ethereum`, `base`) |
| `table_name` | VARCHAR | Table name (e.g., `raw.logs`, `raw.transactions`) |
| `watermark_column` | VARCHAR | The timestamp column used for watermarking (typically `block_timestamp`) |
| `watermark_level` | TIMESTAMP_NTZ | The high-water mark of the table at this snapshot |
| `delivery_interval` | VARCHAR | The delivery cadence for this table |
| `snapshot_id` | VARCHAR | Unique snapshot version identifier |
| `is_full_refresh` | BOOLEAN | Whether the dataset was fully refreshed during this snapshot |
| `snapshot_loaded_intervals__count` | NUMBER | Number of records loaded in this snapshot |
| `snapshot_loaded_intervals__minutes` | FLOAT | Total minutes of `block_timestamp` coverage loaded |
| `snapshot_loaded_intervals__min_timestamp` | TIMESTAMP_NTZ | Earliest `block_timestamp` of records loaded in this snapshot |
| `snapshot_loaded_intervals__max_timestamp` | TIMESTAMP_NTZ | Latest `block_timestamp` of records loaded in this snapshot |
| `snapshot_loaded_intervals__merged_intervals` | VARCHAR | Consolidated time ranges of loaded data |
| `snapshot_loaded_intervals__loads` | VARCHAR | JSON array of individual data loads within the snapshot, with exact time ranges |
| `snapshot_created_at` | TIMESTAMP_NTZ | When this snapshot was created and became available to you |
| `created_at` | TIMESTAMP_NTZ | When this metadata record was created |
Each row represents a single snapshot. The min_timestamp and max_timestamp tell you the overall time range of data that was loaded. Note that this can be a sparse interval — the loads and merged_intervals fields provide the exact sub-ranges if you need finer granularity.
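Because a snapshot's coverage can be sparse, a sync job can scan less data by expanding `snapshot_loaded_intervals__loads` into exact sub-ranges rather than pulling everything between `min_timestamp` and `max_timestamp`. A minimal sketch, assuming each load is a JSON object with `min_timestamp` and `max_timestamp` fields (the exact field names in your delivery may differ):

```python
import json
from datetime import datetime

def merge_loads(loads_json: str) -> list[tuple[datetime, datetime]]:
    """Consolidate the per-load time ranges from
    snapshot_loaded_intervals__loads into merged (start, end) intervals.

    Assumes each load looks like
    {"min_timestamp": "...", "max_timestamp": "..."} in ISO format;
    adjust the field names to match your actual delivery.
    """
    loads = json.loads(loads_json)
    intervals = sorted(
        (datetime.fromisoformat(l["min_timestamp"]),
         datetime.fromisoformat(l["max_timestamp"]))
        for l in loads
    )
    merged: list[tuple[datetime, datetime]] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Each merged interval can then drive one bounded `block_timestamp` range scan against the source table.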

delivery_metadata.intervals.changes

Shows, for each hourly partition of data, when it was last updated. This is the primary table for backfill detection.
| Column | Type | Description |
| --- | --- | --- |
| `chain` | VARCHAR | Blockchain name |
| `table_name` | VARCHAR | Table name |
| `hour` | TIMESTAMP_NTZ | The hourly partition (`date_trunc('hour', block_timestamp)`). One row = one hour of data where `block_timestamp >= hour AND block_timestamp < hour + 1 hour` |
| `last_updated_at` | TIMESTAMP_NTZ | The most recent time this hour's data was modified |
| `updated_timestamps` | VARCHAR | List of timestamps when this hour's data was updated |
| `last_full_refresh_timestamp` | TIMESTAMP_NTZ | The last time this table was fully refreshed |
| `max_hours_late` | FLOAT | Maximum hours between `hour` and when the data was actually delivered; useful for measuring backfill lag |
| `_created_at` | TIMESTAMP_NTZ | When this metadata record was created |
| `_updated_at` | TIMESTAMP_NTZ | When this metadata record was last updated |
For example, if base.raw.logs has hour = 2025-11-06 02:00 with last_updated_at = 2026-02-20, that means the hourly partition for Nov 6 2AM was last updated (e.g., patched/backfilled) on Feb 20. For full data alignment with Allium, you need two jobs:
  1. A steady-state job (high cadence) — handles new data at the tip of chain
  2. A backfill job (low cadence) — catches late-arriving historical patches

Job 1: Steady-State Sync

Runs every hour (or at whatever cadence your pipeline operates). Picks up newly delivered data snapshots. Query delivery_metadata.snapshots.aggregated to find snapshots created since your last run:
```sql
SELECT
    snapshot_id,
    snapshot_created_at,
    snapshot_loaded_intervals__min_timestamp,
    snapshot_loaded_intervals__max_timestamp,
    snapshot_loaded_intervals__loads,
    is_full_refresh
FROM delivery_metadata.snapshots.aggregated
WHERE chain = 'base'
  AND table_name = 'raw.logs'
  AND snapshot_created_at >= <last_pipeline_run_time> - INTERVAL '1 hour'
```
Then use the returned min_timestamp / max_timestamp to determine the block_timestamp range to pull from the source table.
To keep scan costs predictable, cap the minimum time to a lookback window (e.g., only pull data from the last 24–36 hours). Any data changes older than that window will be caught by the backfill job.
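The lookback cap can be sketched as a small helper. `pull_range` and its parameters are illustrative names for your own pipeline code, not part of any Allium API:

```python
from datetime import datetime, timedelta

def pull_range(snapshot_min: datetime, snapshot_max: datetime,
               now: datetime, lookback_hours: int = 36):
    """Clamp the block_timestamp range reported by the snapshot metadata
    to a fixed lookback window, so the steady-state job never scans deep
    history. Returns None when the whole snapshot is older than the
    window; those records are left for the backfill job."""
    floor = now - timedelta(hours=lookback_hours)
    start = max(snapshot_min, floor)
    return None if start > snapshot_max else (start, snapshot_max)
```

The returned range becomes the `block_timestamp` predicate on your pull from the source table.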

Job 2: Backfill Sync

Runs daily or weekly. Catches any data that was patched or backfilled for historical time periods — records too old for the steady-state lookback window to catch. Query delivery_metadata.intervals.changes to find hourly partitions that were recently updated:
```sql
SELECT
    table_name,
    hour,
    last_updated_at,
    last_full_refresh_timestamp
FROM delivery_metadata.intervals.changes
WHERE chain = 'base'
  AND table_name = 'raw.logs'
  AND last_updated_at >= CURRENT_TIMESTAMP - INTERVAL '1 day'
ORDER BY hour
```
The returned hour values are the partitions you need to re-sync. Compare last_updated_at against your own internal tracking timestamp to determine which hours actually need patching. You can also check last_full_refresh_timestamp to detect if a full table refresh occurred — compare it against your internal timestamp to know if a full re-ingestion is needed.
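The comparison against your internal tracking state can be sketched as follows. The `synced_at` bookkeeping dict is an assumption about how you record per-hour sync times; it is not something Allium provides:

```python
from datetime import datetime

def hours_to_resync(changed_rows, synced_at):
    """changed_rows: (hour, last_updated_at) pairs from intervals.changes.
    synced_at: {hour: when we last re-ingested that hour} (our own state).
    Returns the hourly partitions whose Allium-side update is newer than
    our own sync; hours we have never synced are always included."""
    stale = [
        hour
        for hour, last_updated_at in changed_rows
        if synced_at.get(hour) is None or last_updated_at > synced_at[hour]
    ]
    return sorted(stale)
```

Each returned hour maps to a bounded delete-and-reload (or merge) of that partition in your warehouse.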

How the Two Jobs Work Together

Steady-state optimizes for speed and cost — it handles the common case of new data arriving at the leading edge. Backfill ensures full correctness by catching the uncommon case of late-arriving historical patches (data from weeks or months ago being corrected). Together, these two jobs ensure your warehouse stays fully aligned with Allium.

dbt Integration Example

If you use dbt with Allium as a source, here’s how you might structure your incremental model:
```sql
-- models/staging/stg_base_raw_logs.sql
{{ config(
    materialized='incremental',
    unique_key=['block_number', 'transaction_index', 'log_index'],
    incremental_strategy='merge'
) }}

SELECT *
FROM {{ source('allium', 'base_raw_logs') }}

{% if is_incremental() %}
WHERE block_timestamp >= (
    SELECT MIN(snapshot_loaded_intervals__min_timestamp)
    FROM {{ source('allium_delivery_metadata', 'snapshots_aggregated') }}
    WHERE chain = 'base'
      AND table_name = 'raw.logs'
      AND snapshot_created_at >= (
          SELECT MAX(_loaded_at) FROM {{ this }}
      ) - INTERVAL '1 hour'
)
{% endif %}
```
For the backfill job, create a separate model or macro that queries intervals.changes and re-processes the affected partitions.

Further Reading