
  • Differences in referrer source aggregation between GA4 sessions and exported BQ data

    Posted by Addison on 26 May 2023 at 7:33 pm

    Hey there,

    I’ve got this problem that’s driving me nuts! You see, I run this e-commerce site, and I’ve been using GA4. I’m also exporting data from GA4 to BigQuery using their handy export function (link: https://support.google.com/analytics/answer/9358801).

    Now here’s where it gets tricky. I used GA4’s Explore tool to whip up a report counting how many sessions each referrer sent us. Next, I went to BigQuery, crunched the exported GA4 data, and did the same count by session referrer. But when I compared the GA4 report with the BigQuery results, the session count for some referrers was way lower in BigQuery.

    Basically, I used BigQuery to tally the sessions for each referrer, grouping by the string value of the source. I wrote a SQL query in BigQuery for this and can share it if that would help.
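
    The grouping logic can be sketched in Python roughly as below. This is an illustration, not the actual SQL query: it assumes the export rows have already been flattened so each event has user_pseudo_id, ga_session_id, event_timestamp and a source value pulled out of event_params, and that a session's source is its first non-null source in time order. The sample rows are made up.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical, already-flattened export rows (one dict per event). In the real
# export, ga_session_id and source live inside the repeated event_params field.
events = [
    {"user_pseudo_id": "u1", "ga_session_id": 100, "event_timestamp": 1, "source": "app"},
    {"user_pseudo_id": "u1", "ga_session_id": 100, "event_timestamp": 2, "source": None},
    {"user_pseudo_id": "u2", "ga_session_id": 100, "event_timestamp": 1, "source": "instagram"},
    {"user_pseudo_id": "u3", "ga_session_id": 200, "event_timestamp": 5, "source": None},
]


def sessions_per_source(rows: list[dict]) -> Counter:
    """Count distinct sessions per source (None = no source on any event)."""
    # Key sessions by (user_pseudo_id, ga_session_id): ga_session_id alone can
    # collide across users, as u1 and u2 do above.
    by_session = defaultdict(list)
    for row in rows:
        by_session[(row["user_pseudo_id"], row["ga_session_id"])].append(row)

    counts: Counter = Counter()
    for session_rows in by_session.values():
        # Assumption: the session's source is the first non-null source
        # seen in event_timestamp order.
        session_rows.sort(key=lambda r: r["event_timestamp"])
        source: Optional[str] = next(
            (r["source"] for r in session_rows if r["source"] is not None), None
        )
        counts[source] += 1
    return counts


print(sessions_per_source(events))
```

    Grouping on ga_session_id alone would have merged u1's and u2's sessions into one, which is an easy way to lose sessions in a tally like this.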

    There’s this weird thing though: how close the numbers are depends on the referrer. The ‘app’ source was way off in BQ, but Instagram was almost spot-on. And the total session counts matched up; it’s just that BigQuery had a lot of sessions with a null referrer.
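
    One way to check that pattern is to diff the two sets of counts per source and see whether the shortfall in the named sources equals the surplus in the null bucket. A minimal sketch, assuming the GA4 report shows direct sessions as "(direct)" (folded into None here) and using made-up numbers rather than the real ones from this site:

```python
from typing import Optional

Counts = dict[Optional[str], int]


def source_deltas(ga4_report: Counts, bq_counts: Counts) -> Counts:
    """Per-source session difference, BigQuery minus GA4 report.

    GA4's UI labels direct sessions "(direct)"; that row is folded into None
    here so it lines up with a null source in the BigQuery results.
    """
    ga4: Counts = {}
    for label, n in ga4_report.items():
        key = None if label == "(direct)" else label
        ga4[key] = ga4.get(key, 0) + n
    keys = set(ga4) | set(bq_counts)
    return {k: bq_counts.get(k, 0) - ga4.get(k, 0) for k in keys}


# Made-up numbers for illustration only, not from the thread.
ga4_report = {"app": 500, "instagram": 300, "(direct)": 200}
bq_counts = {"app": 350, "instagram": 298, None: 352}

deltas = source_deltas(ga4_report, bq_counts)
named_shortfall = -sum(d for k, d in deltas.items() if k is not None)
print(deltas[None], named_shortfall)  # 152 152: the null bucket absorbs the gap
```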

    So I’m scratching my head wondering whether sessions that GA4 attributes to ‘app’ (and a few other sources) are ending up with a null source in BigQuery. I even tried using a regex to pull the utm_source out of page_location myself, but the same discrepancy showed up.
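
    For the page_location step, a query-string parser is a bit safer than a hand-written regex: it URL-decodes values and won't match keys like "not_utm_source". A minimal Python sketch using the standard library's urllib.parse (the example URLs are hypothetical):

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def utm_source_from_url(page_location: str) -> Optional[str]:
    """Return the decoded utm_source query parameter, or None if absent or empty."""
    query = urlsplit(page_location).query
    # parse_qs drops blank values by default, so "utm_source=" yields no entry.
    values = parse_qs(query).get("utm_source")
    return values[0] if values else None


print(utm_source_from_url("https://shop.example.com/item?id=1&utm_source=instagram&utm_medium=social"))
print(utm_source_from_url("https://shop.example.com/?utm_source=my%20app"))
print(utm_source_from_url("https://shop.example.com/cart"))
```

    Values are returned as-is (no lowercasing), so "Instagram" and "instagram" stay distinct.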

    Do you have any idea why this might be happening, or how I can get around it? I could really use some advice here. Thanks in advance!

  • Jack

    Member
    16 June 2023 at 6:44 am

    The discrepancies you’re seeing between your GA4 report and your BigQuery results could come from the way each side handles null or missing referrers. GA4 might be attributing some sessions with a missing referrer to sources like ‘app’ when it builds its reports, whereas the raw BigQuery export leaves those sessions as null. The two may also differ in how session data is defined and assigned. The fact that Instagram matched almost exactly suggests that sessions from some referrers are more likely than others to be reclassified or to lose their source. Extracting utm_source directly from page_location can also be inaccurate if your UTM parameters aren’t used or formatted consistently. To address this, double-check how you’re setting your UTM parameters, and make sure your BigQuery queries extract and categorize referrer information correctly.
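
    One mechanism that would produce exactly this pattern is last non-direct click: my understanding is that GA4 reports credit a direct session to the user's previous non-direct source, while the raw export keeps that session's source null. Please verify that against Google's attribution documentation. The sketch below is a simplified illustration, not GA4's actual logic: it works on one user's sessions in time order and ignores the lookback window. The session sequence is made up.

```python
from collections import Counter
from typing import Optional


def last_non_direct(raw_sources: list[Optional[str]]) -> list[Optional[str]]:
    """Credit each direct (None) session to the user's most recent non-direct source.

    raw_sources: one user's session sources in chronological order, as they
    appear in the raw export. Simplification: no lookback window is applied.
    """
    attributed: list[Optional[str]] = []
    last_seen: Optional[str] = None
    for source in raw_sources:
        if source is not None:
            last_seen = source
        # A direct session inherits last_seen; a session before any
        # non-direct visit stays None (direct).
        attributed.append(last_seen)
    return attributed


# Hypothetical: one user arrives via 'app', returns directly twice, then
# arrives from Instagram and returns directly once more.
raw = ["app", None, None, "instagram", None]
print(Counter(raw))                   # what a raw per-session count sees
print(Counter(last_non_direct(raw)))  # what a last-non-direct report would show
```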

  • Ava

    Member
    17 June 2023 at 5:42 pm

    The discrepancy you are experiencing could have several causes. One possibility is that sessions GA4 reports under ‘app’ and other sources appear with a null source in BigQuery. That can happen when a referrer doesn’t pass referrer information to your website: the event then carries no source, so it is recorded as direct traffic in GA4 or as a null referrer in BigQuery. GA4 and BigQuery may also handle bot and spam sessions differently, which could add to the discrepancy.
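
    To see why a missing referrer ends up as null, here is a hypothetical fallback chain: use utm_source if the landing URL has one, otherwise the host of an external referrer, otherwise nothing (direct). It assumes your web events carry a page_referrer value, and own_host is an illustrative parameter; this is not how GA4 itself assigns sources.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def infer_source(page_location: str, page_referrer: Optional[str], own_host: str) -> Optional[str]:
    """Hypothetical source inference: UTM first, then external referrer host, else None."""
    utm = parse_qs(urlsplit(page_location).query).get("utm_source")
    if utm:
        return utm[0]
    if page_referrer:
        host = urlsplit(page_referrer).hostname
        # Internal navigation (same host) is not a traffic source.
        if host and host != own_host:
            return host
    # No UTMs and no usable referrer: nothing left to attribute, i.e. direct/null.
    return None


HOST = "shop.example.com"
print(infer_source("https://shop.example.com/?utm_source=app", None, HOST))
print(infer_source("https://shop.example.com/item", "https://l.instagram.com/", HOST))
print(infer_source("https://shop.example.com/item", None, HOST))
```

    The last call is the case described above: an app that strips the referrer and a link without UTMs leave the session with no source at all.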

    Another important factor to consider is the timing of your export from GA4 to BigQuery. If you use the streaming (real-time) export, recent data may not yet be complete on both sides, so the numbers can differ because of data latency. It is better to compare GA4 and BigQuery over a past period for which both datasets are complete.
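
    A simple way to follow this is to compare only the daily export tables (named events_YYYYMMDD, as opposed to the streaming export's events_intraday_YYYYMMDD tables) over a window that ends a few days back, and to pick the same dates in the GA4 report. A minimal sketch; the three-day buffer is a hypothetical safety margin, not an official GA4 figure:

```python
from datetime import date, timedelta


def comparison_tables(today: date, days: int = 28, buffer_days: int = 3) -> list[str]:
    """Daily export table names for a `days`-long window ending `buffer_days` ago.

    buffer_days is a hypothetical margin for late-arriving data, not an
    official GA4 figure.
    """
    end = today - timedelta(days=buffer_days)
    start = end - timedelta(days=days - 1)
    return [
        "events_" + (start + timedelta(days=i)).strftime("%Y%m%d")
        for i in range(days)
    ]


tables = comparison_tables(date(2023, 6, 17), days=7)
print(tables[0], tables[-1])  # events_20230608 events_20230614
```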

    If you are extracting the UTM parameters from the URL, keep in mind that the page_location parameter holds the full URL of the page on which the event fired. Not every session necessarily starts on a page with UTMs in the URL, and the UTMs may be dropped on subsequent pages within the same session. As a result, the number of sessions per campaign in GA4 can differ from what you get by extracting the UTMs from page_location in BigQuery.
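
    One way to make the page_location approach consistent is to read the UTMs only from each session's landing page (its earliest page_view), not from whichever event you happen to hit. A minimal sketch, assuming the page_view rows have been reduced to (session_key, event_timestamp, page_location) tuples; the sample rows are hypothetical:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def utm_source(url: str) -> Optional[str]:
    """utm_source from a URL's query string, or None."""
    values = parse_qs(urlsplit(url).query).get("utm_source")
    return values[0] if values else None


def landing_page_sources(page_views) -> dict:
    """Map each session to the utm_source of its landing page.

    page_views: iterable of (session_key, event_timestamp, page_location).
    The landing page is taken to be the earliest page_view in the session.
    """
    landing = {}
    for key, ts, url in page_views:
        if key not in landing or ts < landing[key][0]:
            landing[key] = (ts, url)
    return {key: utm_source(url) for key, (_, url) in landing.items()}


# Hypothetical page_views: session ("u1", 1) lands with UTMs that are gone
# on the next page; session ("u2", 7) lands on a page without UTMs.
page_views = [
    (("u1", 1), 10, "https://shop.example.com/?utm_source=app"),
    (("u1", 1), 20, "https://shop.example.com/cart"),
    (("u2", 7), 30, "https://shop.example.com/item"),
]
print(landing_page_sources(page_views))
```

    Reading the URL off the /cart event instead would leave session ("u1", 1) with no source, which is one way sessions drift into the null bucket.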

    It’s also worth checking that the event parameters you use in your SQL query actually contain the referrer data you need. Use DebugView (in your GA4 property’s Admin section, or in Firebase for app data) to see events and their parameters as they arrive, and make sure the parameters in your query are correct.
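
    On the BigQuery side, a quick coverage check shows which event types actually carry a given parameter. A minimal sketch on hypothetical rows shaped like the export's repeated event_params record (each entry has a key and a value struct with typed fields); in SQL the same check would go through UNNEST(event_params):

```python
from collections import defaultdict

# Hypothetical events shaped like the export's repeated event_params record.
events = [
    {"event_name": "session_start", "event_params": [
        {"key": "source", "value": {"string_value": "app", "int_value": None}}]},
    {"event_name": "page_view", "event_params": [
        {"key": "source", "value": {"string_value": "app", "int_value": None}}]},
    {"event_name": "page_view", "event_params": [
        {"key": "page_location", "value": {"string_value": "https://shop.example.com/cart", "int_value": None}}]},
    {"event_name": "purchase", "event_params": []},
]


def param_coverage(rows, key: str) -> dict:
    """Share of events, per event_name, that carry a non-null `key` param."""
    total = defaultdict(int)
    with_key = defaultdict(int)
    for ev in rows:
        name = ev["event_name"]
        total[name] += 1
        for param in ev["event_params"]:
            if param["key"] == key and any(v is not None for v in param["value"].values()):
                with_key[name] += 1
                break
    return {name: with_key[name] / total[name] for name in total}


print(param_coverage(events, "source"))
```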

    Finally, remember that the way GA4 and BigQuery treat and record data is fundamentally different. GA4 preprocesses and aggregates data for you, while BigQuery provides the raw, event-level data. Discrepancies can arise simply because of these inherent differences. It’s always a good idea to thoroughly understand the data schema of both GA4 and BigQuery, and know exactly what each field contains and how to use it.
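
    One schema detail that often causes silent nulls: each event parameter's value is stored in a typed field (string_value, int_value, float_value or double_value), and, as far as I know, ga_session_id comes through as int_value, so reading its string_value returns null. A minimal sketch of a lookup helper on a hypothetical event, mirroring the usual (SELECT value.<field> FROM UNNEST(event_params) WHERE key = ...) pattern:

```python
from typing import Any, Optional

TYPED_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def get_param(event: dict, key: str, field: Optional[str] = None) -> Optional[Any]:
    """Look up one event parameter, roughly like
    (SELECT value.<field> FROM UNNEST(event_params) WHERE key = '<key>') in SQL.

    If `field` is given, only that typed field is read (as a SQL query would);
    otherwise the first non-null typed field is returned.
    """
    for param in event.get("event_params", []):
        if param["key"] == key:
            fields = (field,) if field else TYPED_FIELDS
            for f in fields:
                value = param["value"].get(f)
                if value is not None:
                    return value
            return None
    return None


# Hypothetical event: ga_session_id arrives as an integer, source as a string.
event = {
    "event_name": "page_view",
    "event_params": [
        {"key": "ga_session_id", "value": {"int_value": 1685100000}},
        {"key": "source", "value": {"string_value": "app"}},
    ],
}
print(get_param(event, "ga_session_id", "string_value"))  # None: wrong typed field
print(get_param(event, "ga_session_id", "int_value"))
```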
