Learn business growth with Google Analytics 4 Forums Google Analytics 4 Aligning BigQuery Metrics with GA4 Traffic Acquisition Analytics

  • Aligning BigQuery Metrics with GA4 Traffic Acquisition Analytics

    Posted by Ella on 3 January 2023 at 8:51 pm

    Hey folks, so here’s the scoop. I’m still new to this whole BigQuery thing and I’m trying to make this Traffic Acquisition GA4 report. Thing is, my results are nowhere near the GA4 view. I’ve gotta be doing something wrong, but here’s where I’m at:

    • I get it, in GA4 and BQ, the source/medium/campaign fields are based on events, not sessions. But why doesn’t every event have a source/medium/campaign under event_parameter_key? Makes sense to me that you’d have these in the ‘session_start’ event, but no dice.

    • Here’s what I’ve tried to recreate the Traffic Acquisition report:

    First, I aimed to check the first medium for sessions:

    with cte as ( select
        PARSE_DATE("%Y%m%d", event_date) AS Date,
        user_pseudo_id,
        concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
        FIRST_VALUE((select value.string_value from unnest(event_params) where key = 'medium')) OVER (PARTITION BY concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) ORDER BY event_timestamp) as first_medium
    
    FROM project)
    
    select Date, first_medium, count(distinct user_pseudo_id) as Users, count (distinct session_id) as Sessions
    from cte
    group by 1,2;
    

    Now this query gives me 44k users as ‘null’ medium and 1.8k organic users. Compare this with the 17k users with the ” medium and 8k organic users from GA4, you get the picture.

    Then, I switched it up and checked the last medium:

    FIRST_VALUE((select value.string_value from unnest(event_params) where key = 'medium')) OVER (PARTITION BY concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) ORDER BY event_timestamp desc) as last_medium
    

    I did see an increase to 9k organic users, but it’s still way off the GA4 data.

    Lastly, I tried this code – https://www.ga4bigquery.com/traffic-source-dimensions-metrics-ga4/ – which uses source / medium (based on session). As you might guess, the results made no sense compared to GA4 data either.

    If any wiz out there can lend a hand, I’d be super grateful!

    EDIT:
    Just stumbled on this gem of an article which thoroughly explains how to query traffic sources data. It’s the best one I’ve bumped into till now:

    Query the GA4 Session Last Non-direct Traffic Source in BigQuery

    Landon replied 1 year ago 3 Members · 2 Replies
  • 2 Replies
  • Gabriel

    Member
    16 February 2023 at 1:12 am

    Absolutely, I’ve been in your shoes too and understand the confusion. I took a deep dive and analyzed a single day’s worth of data from BigQuery in Google Sheets. What I noticed was that the results I got with the GA4BigQuery code matched what I got in BigQuery, but there was a stark difference with the GA4 data. This was particularly noticeable with lower traffic pages.

    Also, the counts I got for ’email’ as a source in event parameters, ea_tracking_id, and traffic_source fields were consistently lower than what GA4 reported. I tested it on my development site as well where I know the exact number of sessions that have ’email’ as a source. Although GA4 reported it correctly, BigQuery was once again off the mark.

    My conclusion? The issue doesn’t lie in the SQL queries or the tagging, but in the GA4 data source in BigQuery. It’s like Google is randomly attributing some of the traffic to ‘not set’. I’ve taken this up with Google and hoping for a solution. Apologies that I couldn’t provide an immediate fix.

  • Landon

    Member
    24 March 2023 at 1:10 am

    This user was having trouble getting accurate results in Google’s BigQuery for their GA4 report. The source/medium/campaign data for each event didn’t match the GA4 view. They tried several ways of querying the data, each time running into discrepancies. While they could find the number of ‘null’ and ‘organic’ users, these numbers were significantly different from what GA4 was showing. Eventually, they realized that BigQuery was incorrectly attributing their google / cpc sessions to google / organic. Upon further investigation, they discovered a key called ‘gclsrc’ in the event parameters that indicated whether a session came through Google Ads. By using that key, they were able to accurately identify and convert the google / cpc sessions that were wrongly recorded as organic.

Log in to reply.