# Optimizing Queries
The general rule is to reduce the number of bytes your query scans, not just to make the query more performant but, more importantly, to make it cheaper. Here are some recommendations on how to do that.
**Avoid `SELECT *`**: BigQuery uses column-based storage, so the number of columns you scan is proportional to the number of bytes you scan. Select only the columns you actually need, as in the sketch below.
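A minimal sketch; the project, dataset, and table names (`my_project.my_dataset.events`) are placeholders, and the column names come from the schema described in the next tip:

```sql
-- Scans every column in the table: expensive.
-- SELECT * FROM `my_project.my_dataset.events`;

-- Scans only the columns it needs, so far fewer bytes are billed.
SELECT
  event_type,
  block_timestamp
FROM `my_project.my_dataset.events`;
```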
**Leverage partitions and clusters**: We partition most of our tables by `ingestion_timestamp` (the time we ingest the data into BigQuery) and cluster them by `event_type` and `attribute_key`. Use these columns in your `WHERE` clause to scan as little data as possible. Note that `ingestion_timestamp` should only be used when testing and not in production environments: we sometimes re-ingest data whose `ingestion_timestamp` is very distant from the actual `block_timestamp`.
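A sketch of partition and cluster pruning, again assuming the hypothetical `my_project.my_dataset.events` table; the literal values in the filters are illustrative:

```sql
-- The filter on the partition column (ingestion_timestamp) prunes
-- partitions, and the filters on the cluster columns (event_type,
-- attribute_key) prune clustered blocks. Remember: filter on
-- ingestion_timestamp when testing only, not in production.
SELECT
  block_timestamp,
  attribute_key
FROM `my_project.my_dataset.events`
WHERE ingestion_timestamp >= TIMESTAMP '2024-01-01'  -- partition pruning (testing only)
  AND ingestion_timestamp <  TIMESTAMP '2024-01-02'
  AND event_type = 'transfer'                        -- cluster pruning
  AND attribute_key = 'amount';                      -- cluster pruning
```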
**Late aggregation**: Ensure that you aggregate as late as possible. In particular, your subqueries should avoid `GROUP BY` clauses unless the grouping drastically reduces the data before it is joined. The sketch below aggregates once, after the join.
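A sketch of late aggregation with the same hypothetical tables (`events` plus a smaller `contracts` table): the query joins first, then applies a single `GROUP BY` at the end instead of grouping inside a subquery:

```sql
SELECT
  e.event_type,
  COUNT(*) AS n_events
FROM `my_project.my_dataset.events` AS e
JOIN `my_project.my_dataset.contracts` AS c
  ON c.address = e.contract_address
GROUP BY e.event_type;  -- one aggregation, applied after the join
```

The exception noted above applies when a subquery's `GROUP BY` collapses millions of rows to a handful before the join; in that case pre-aggregating is the cheaper plan.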
**The largest table first in the `JOIN` clause**: When joining multiple tables, join them in decreasing order of size, as in the example below.
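A sketch of the join order, assuming `events` is the largest of three hypothetical tables:

```sql
SELECT
  e.event_type,
  t.symbol
FROM `my_project.my_dataset.events` AS e       -- largest table first
JOIN `my_project.my_dataset.contracts` AS c    -- medium table second
  ON c.address = e.contract_address
JOIN `my_project.my_dataset.tokens` AS t       -- smallest table last
  ON t.contract_address = c.address;
```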
**Reduce data when joining**: Use `WHERE` clauses to eliminate all the data you don't need before joining; this makes your joins perform much better. Try to have filters on all joined tables if possible (see the sketch below).
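A sketch that filters both sides before joining, using the same hypothetical tables; the CTE names and filter values are illustrative:

```sql
WITH recent_events AS (
  SELECT event_type, contract_address
  FROM `my_project.my_dataset.events`
  WHERE event_type = 'transfer'          -- filter this side before the join
),
active_contracts AS (
  SELECT address
  FROM `my_project.my_dataset.contracts`
  WHERE is_active                        -- filter the other side too
)
SELECT
  e.event_type,
  c.address
FROM recent_events AS e
JOIN active_contracts AS c
  ON c.address = e.contract_address;
```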
**Order in `WHERE` matters**: BigQuery assumes that you've chosen the best order for your filters. The first part of your `WHERE` clause should always contain the filter that eliminates the most data, as in the example below.
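A sketch of filter ordering; which filter is most selective depends on your data, so the ordering here is illustrative:

```sql
SELECT attribute_key
FROM `my_project.my_dataset.events`
WHERE event_type = 'Burn'                          -- most selective filter first
  AND block_timestamp >= TIMESTAMP '2024-01-01'    -- broader filter second
  AND attribute_key IS NOT NULL;                   -- least selective filter last
```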