This page describes some frequently used tips and tricks to improve query performance when querying historical data.

General tips and tricks

Most time-based tables are clustered by block_timestamp::date, so it’s useful to have block_timestamp in join conditions Use CTEs to filter source data before the joins in the SQL.

Filter as much as you can in CTEs, and on time columns

The snowflake query optimization engine is generally effective at partition pruning and pushing filters down to relevant tables, and query performance generally suffice for small tables. However, there are times when it fails, or is not ‘smart’ enough especially on large tables like those in solana. Aim to assist the engine as much as you can:
  1. In snowflake, partition pruning does not work with values from subqueries. (snowflake docs)
    -- bad example, avoid doing this, does not work
    select * from table_a where block_timestamp >= (select min(timestamp) from table_b)
    
    -- good example, do this!
    select * from table_a where block_timestamp >= '2025-01-01'
    
  2. Try to restrict queries to predefined time ranges. When querying for a large time range, concurrent queries of smaller time intervals might work better.
    • e.g. when querying for a year’s data, 12 batches of 1 month each is probably faster.
  3. Filter in all preceding CTEs instead of leaving it to the joins to pushdown filters. Especially effective when filtering on timestamp column.
    -- bad example, avoid doing this
    with
    cte_a as (select * from table_a where block_timestamp between '2025-01-01' and '2025-01-02'),
    cte_b as (select * from table_b)
    select * from 
    -- query engine *might not* know to push down block_timestamp to table_b, although it should, in most cases
    cte_a join cte_b using (block_timestamp)
    
    -- good example, do this!
    with
    cte_a as (select * from table_a where block_timestamp between '2025-01-01' and '2025-01-02'),
    cte_b as (select * from table_b where block_timestamp between '2025-01-01' and '2025-01-02'
      and block_timestamp in (select block_timestamp from cte_a)
    )
    select * from 
    cte_a join cte_b using (block_timestamp)
    

Ecosystem specific tips

Solana
  1. When querying raw entities such as transactions, instructions, inner_instructions, if you do not need voting data, you can use one of the corresponding optimized views:
    • transactions
      • success_nonvoting_transactions
      • nonvoting_transactions
    • instructions
      • success_nonvoting_instructions
      • nonvoting_instructions
    • inner_instructions
      • success_nonvoting_inner_instructions
      • nonvoting_inner_instructions
    • inner_outer_instructions
      • success_nonvoting_inner_outer_instructions
      • nonvoting_inner_outer_instructions
  2. Filter out failed/success and voting/nonvoting records:
    • for transactions, use success and is_voting
    • for (inner)instructions, use parent_tx_success and is_voting
  3. For transaction meta columns, the rpc returns pre/post token/native balances. we have reformatted these cols to be more user-friendly in the columns sol_amounts, mint_to_decimals, token_accounts for your convenience. (only available for success_nonvoting_transactions). For more info, refer to transaction-level columns.
  4. For transaction fees, you can find them at solana.raw.fees, they are also available in transaction-level columns for convenience.

Dbt tips and tricks

  1. When running dbt with materialization='incremental', due to snowflake limitations around partition pruning with subquery results, we need to precompute the time value for the incremental filter
    • create a macro that queries {{this}} to get the desired timestamp. sample
    • use that desired timestamp in your dbt model, and everywhere a time filter should be.