Learn business growth with Google Analytics 4 Forums Google Analytics 4 Google Analytics 4 and BigQuery: How to Get Accurate Event Counts by Event Name

  • Google Analytics 4 and BigQuery: How to Get Accurate Event Counts by Event Name

    Posted by Dakota on 11 June 2022 at 1:21 pm

    I’m on a mission to extract total event count data from GA4 using BigQuery for different web platforms on a subdomain level. All was good until I started to include device.web_info.hostname in the pull – that’s when the count turned weird.

    Here’s what I tried out:

    `sh
    select
    event_date
    , device.web_info.hostname
    , platform
    , count(distinct case when event_name = ‘job_search__search’ then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = ‘ga_session_id’), event_timestamp) end) as event_count_via_concat
    , countif(event_name = ‘job_search__search’) as event_count_via_countif
    , count(*) as event_count_naive
    from xxx.xxx.events_itable
    where
    lower(platform) = ‘web’
    and device.web_info.hostname in (‘au.xx.com’, ‘nz.xx.com’, ‘my.xx.com’)
    and event_name in (‘job_search__search’)
    group by 1,2,3
    `

    I’m seeing some strange differences where my data doesn’t match up with GA4’s total events report. Any idea what’s up?

    I’ve included screenshots from BigQuery and GA4 (links given), in case those help. Appreciate the help, good folks!

    Isabella replied 1 year ago 3 Members · 2 Replies
  • 2 Replies
  • Levi

    Member
    12 June 2023 at 10:14 pm

    It appears like some inconsistencies between your BigQuery results and GA4 report could be due to certain events getting filtered or deduplicated in GA4 or possibly a data processing delay in GA4. Without the screenshots or any specific error messages, it’s a bit harder to say why exactly it’s happening. Trying to re-check the query logic, make sure the data range is the same, and device.web_info.hostname is defined correctly could help.

  • Isabella

    Member
    6 July 2023 at 10:24 am

    There could be a few reasons for discrepancies between your BigQuery data and GA4’s total events report data. One such reason could be due to data latency. BigQuery typically takes 24-48 hours for normal data availability, and GA4 data could be bit more real-time. If your timeframe is within this latency period, you could be looking at different sets of data. Similarly, data processing in BigQuery might face some delay causing event_count_via_concat, event_count_via_countif, and event_count_naive to be vastly different.

    Another reason could be the fact that GA4 also has some data processing restrictions like processing thresholds, which could be leading to exclusion of some hits from your dataset. There might be certain events which are not being recorded and hence, the mismatch.

    Your usage of count(distinct) could also cause issues if you have very large datasets, as BigQuery approximates the results for large numbers of distinct values. Also, since you’re pre-filtering the data with the WHERE clause, the GA4 might be including other events than ‘job_search__search’ in its total count.

    Also, if you’re filtering specific hostnames, please ensure those are not filtered in GA4 as well. If the filter of hostname is done at the view level in GA4, there might be further slight discrepancies.

    And lastly, it might be worth checking the device.web_info.hostname itself. If there are problematic and inconsistent entries, this could skew the data.

Log in to reply.