Querying GA4 Events data in BigQuery
To understand how events are stored in BigQuery for Google Analytics 4, we will explore how events and their parameters are stored in BigQuery. Google Analytics event parameters are recorded in nested and repeatable field called event_params. We will explore what is unnesting and why it is mandatory to filter and extract data from BigQuery. I am using Google’s sample dataset for Google Analytics 4 for this article you can access the dataset here for querying alongside the article.
Event Schema
A complete reference to the schema can be found here. I am using key fields of interest which can help us in understanding how events are represented in BigQuery for GA4 and how we can use SQL to access it.
Here is a visual view of the above schema to highlight where the nested columns are.
If we look at a single-row preview of how this data is actually stored in BigQuery, we get the following preview.
Let us go into bit more detail and reproduce the table in an excel like format to understand it .
wdt_ID | event_date | event_timestamp | event_name | event_params.key | event_params.value.string_value | event_params.value.int_value | event_params.value.float_value | event_params.value.double_value |
---|---|---|---|---|---|---|---|---|
1 | 20210131 | 2147483647 | page_view | gclid | ||||
2 | gclsrc | |||||||
3 | debug_mode | 1 | ||||||
4 | ga_session_number | 1 | ||||||
5 | all_data | |||||||
6 | page_location | https://shop.googlemerchandisestore.com/ | ||||||
7 | entrances | 1 | ||||||
8 | session_engaged | 0 | ||||||
9 | ga_session_id | 661.084.800 | ||||||
10 | clean_event | gtm.js |
The first 3 columns event_date, event_timestamp and event_name are not nested. They constitute 1 row. However, from the 4th column onwards, we see “nesting” starts. All the parameters of the event page_view are stored as key, value pairs. These two are represented differently in nested records.
- Key : event_params.key the column highlighted in the green is the key for each parameter.
- value : Value is represented using next 4 columns. In case of string value for paramter it is stored in .string_value , if the event had an integer value it will be stored in .int_value column and so on.
Limitation of Traditional SQL
Traditional SQL queries row data.There is no provision which can allow us to query nested data. In the above table, say you want to get page_location from the table, you cannot get it because 1 row has several nested rows.
SELECT *
FROM
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210130`
WHERE event_params.key='page_location'
It will throw an error saying,
“Cannot access field key on a value with type ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, …”
The error above means that our query is attempting to access a field named ‘key‘ from a value with an ARRAY type, which is not allowed.
If you refer back to our schema diagram above, you will see the following :
- event_params : has a data type RECORD
- event_params.key : has a data type STRING
- event_params.value has a data type RECORD
- Individual value columns under event_params.value : have either STRING, INTEGER or FLOAT data type
You could only query the above data, if it were in the following format where each row was accessible uniquely.
wdt_ID | event_date | event_timestamp | event_name | event_params.key | event_params.value.string_value | event_params.value.int_value | event_params.value.float_value | event_params.value.double_value |
---|---|---|---|---|---|---|---|---|
1 | 20210131 | 2147483647 | page_view | gclid | ||||
2 | 20210131 | 2147483647 | page_view | gclsrc | ||||
3 | 20210131 | 2147483647 | page_view | debug_mode | 1 | |||
4 | 20210131 | 2147483647 | page_view | ga_session_number | 1 | |||
5 | 20210131 | 2147483647 | page_view | all_data | ||||
6 | 20210131 | 2147483647 | page_view | page_location | https://shop.googlemerchandisestore.com/ | |||
7 | 20210131 | 2147483647 | page_view | entrances | 1 | |||
8 | 20210131 | 2147483647 | page_view | session_engaged | 0 | |||
9 | 20210131 | 2147483647 | page_view | ga_session_id | 661.084.800 | |||
10 | 20210131 | 2147483647 | page_view | clean_event | gtm.js |
In this table, we have manually filled first 3 columns with repeated values so that we can get the data in a standard row format. This is exactly what unnesting does, it allows us to query data as if it was from a unique row. We can thenaccess any row of data and filter on it.
Unnesting the events
Let us begin with a query to recreate the table above where we have event_date, event_timestamp, and event_name repeated for each nested record. We can then decipher the query step-by-step to understand how unnesting works in BigQuery.
WITH unnested_params AS (
SELECT
event_date,
event_timestamp,
event_name,
params.key AS param_key,
params.value.string_value AS param_string_value,
params.value.int_value AS param_int_value,
params.value.double_value AS param_double_value,
params.value.float_value AS float_int_value
FROM
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210130`
CROSS JOIN
UNNEST(event_params) AS params
)
SELECT
*
FROM
unnested_params
This query above will generate the same result as our table above, where we manually unnested by pasting the repeating values in first 3 columns
Let us dissect the query and understand what is happening behind the scenes.
1. With Clause
Using with clause is optional. I am using it here to make the query easy to digest. “with” clause creates a temporary result set for our query within it. We can then use the temporary result set as our table for next select statement. In ou case, we will get the temporary result set for our query within the blue highlighted area un unnested_params variable. We can then query unnested_params using a select statement.
2. Key,value fields
Under SELECT statement, we are selecting normal columns event_date, event_timestamp,event_name. In addition to these, we are also selecting unnested keys using params.key and values from params.value RECORD.
3. CROSS JOIN
Cross join takes every row on the left side of it and joins it with every row on the right side of it. The SQL on left side of CROSS JOIN selected the parameters as discussed above and for each row on left side appended a row from the Right side. The right side rows are obtained from unnesting the event_params RECORD.
4. UNNEST
The UNNEST keyword as the name suggests Undo the nesting of the field passed to it in brackets. E.g. the UNNEST(event_params) command takes the event_params record out of the context of nested record and makes it available as a separate column on its own. When we CROSS JOIN the event names and key values with unnested record , we get the result we were after.
Getting Events with specific parameters
We can now extend the above query to do meaningful data extraction. Let us say, we want to find all page views where page location contained they keyword “Apparel”.
We can rewrite the above query with the conditions
- params.keys : Get only page_location parameter
- params.value.string_value : Get only pages where we have “Apparel” in page_location parameter key.
- event_name: Get only page_view event.
WITH unnested_params AS (
SELECT
event_date,
event_timestamp,
event_name,
params.key AS param_key,
params.value.string_value AS param_string_value,
params.value.int_value AS param_int_value,
params.value.double_value AS param_double_value,
params.value.float_value AS float_int_value
FROM
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210130`
CROSS JOIN
UNNEST(event_params) AS params
where params.key='page_location'
AND
params.value.string_value LIKE '%Apparel%'
AND
event_name='page_view'
)
SELECT
*
FROM
unnested_params
This query will now give us results filtered which were possible only after unnesting the event_params RECORD.
Summary
In this article, we explored the RECORD type fields in BigQuery in the context of Google Analytics 4 data set. BigQuery RECORD type fields are used for nested records. Google Analytics 4 events parameters are stored as key, value pairs in event_params field of RECORD type. To filter on the basis of either keys or values we have to un-nest the field. This can be done with the help of “UNNEST” keyword and using a CROSS JOIN with the fields that we want to combine the data with.
Responses