Learn business growth with Google Analytics 4 Forums Google Analytics 4 Discrepancy between GA4 traffic source data and BigQuery records

  • Discrepancy between GA4 traffic source data and BigQuery records

    Posted by Matthew on 26 September 2022 at 7:33 am

    Hey there, I need some help. I’ve been trying to match up traffic source data and event attribution data from BigQuery with GA4, using the session_source and session_medium. The thing is, when I pull the event parameters like source and medium from BigQuery, I’m noticing a rather sizeable difference between the two sets of data. Any ideas on how to crack this nut?

    I’ve been fiddling with the SQL below, but no dice so far:

    
    with prep as (
    select
        user_pseudo_id,
        (select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id,
        max((select value.string_value from unnest(event_params) where key = 'source')) as source,
        max((select value.string_value from unnest(event_params) where key = 'medium')) as medium,
        max((select value.string_value from unnest(event_params) where key = 'name')) as campaign,
        max((select value.string_value from unnest(event_params) where key = 'term')) as term,
        max((select value.string_value from unnest(event_params) where key = 'content')) as content,
        platform,
    FROM XXX 
    group by
        user_pseudo_id,
        session_id,
        platform
    )
    
    select
        -- session medium (dimension | the value of a medium associated with a session)
        platform,
        coalesce(source,'()') as source_session,
        coalesce(medium,'()') as medium_session,
        coalesce(campaign,'()') as campaign_session,
        coalesce(content,'()') as content,
        coalesce(term,'()') as term,
        count(distinct concat(user_pseudo_id,session_id)) as sessions
    from
        prep
    group by
        platform,
        source_session,
        medium_session,
        campaign_session,
        content,
        term
    order by
        sessions desc
    
    Lucas replied 12 months ago 3 Members · 2 Replies
  • 2 Replies
  • George

    Member
    20 November 2022 at 11:20 pm

    It can be a bit of a pickle when BigQuery can’t correctly match the source and medium to an event, can’t it? It appears that it defaults the source and medium as google/organic even if the gclid parameter is present in the link. Another issue I noticed is how it falls short in recognizing the source when it’s direct, resulting in cases where we don’t have these parameters for events at all.

    Just to point out, however, these values are valid for the source and medium that acquired the user. And when I cross-compared the data in Universal Analytics (UA) and Google Analytics 4 (GA4), the session attribution seems to be correct. The crux of the issue might lie in the exporting process to BigQuery. Rest assured, I’ve reached out to support and am currently waiting for a response.

  • Lucas

    Member
    29 March 2023 at 8:54 pm

    It seems that you may be experiencing discrepancies between your BigQuery and GA4 data due to the way sessions are recorded and counted, among possible other reasons. Your SQL code omits events that lack a valid pseudo_id and session_id, which might not be the case within GA4 that could include them. It might be worth examining how your code handles null values.

    There’s also a methodological difference in how you’re counting sessions compared to GA4. Your exact count might not match up with GA4’s approximation method, which employs HyperLogLog for session estimations. To lessen this discrepancy, you could consider using BigQuery HyperLogLog functions which are compatible with GA4’s analytics data as described in the Google Developers link you shared.

    However, even with these corrections, attribution might not be perfectly matched due to other factors, including discrepancies in tracking source/mediums, along with factors that could affect whether a user’s actions constitute a “session”. You might need to adjust your method of counting sessions and attributions, investigate your tracking configurations, or the possibility of inconsistencies in the data set.

Log in to reply.