Learn business growth with Google Analytics 4 Forums Google Analytics 4 The Discrepancy Between BigQuery and Google Analytics 4 for Page Views: Seeking Explanations

  • The Discrepancy Between BigQuery and Google Analytics 4 for Page Views: Seeking Explanations

    Posted by Vijay on 24 February 2023 at 8:54 pm

    “Hey folks, ran into a weird hiccup with my Google Analytics 4 (GA4) events table in Big Query. I’m using it to keep track of multiple websites, and querying for total pageviews per site. My problem? The numbers for pageviews reported for some sites are too high by hundreds on Big Query compared to GA4.

    It’s odd because:
    – Other events like sessions don’t show differences
    – The discrepancy only happens with some sites, not all of ’em

    I’m stumped as to why this is happening. I know GA4 uses unprocessed data and the way I query data differs from the GA4 interface processing, hence minor differences. But a few hundred off seems too much.

    I’ve tried looking at GA4 documentation and even debugging each site to make sure that the tags are firing correctly. Unfortunately, none of that shed any light on my problem.

    Is there something I’m missing or a new approach I can try? Would love to hear your thoughts!”

    Ethan replied 12 months ago 3 Members · 2 Replies
  • 2 Replies
  • John

    Member
    5 June 2023 at 2:00 am

    Sure, I can break it down more casually!

    So, the hiccup seems to be coming from the way you’re trying to count pageviews in the GA4 data on BigQuery. Here’s your code, right?

    select count(distinct case when event_name = 'page_view' then concat (user_pseudo_id, cast(event_timestamp as string)) end) as pageviews

    At a guess, I’d say it’s got its knickers in a knot because you’re asking it to count distinct combination of ‘user_pseudo_id’ and ‘event_timestamp,’ which isn’t always unique. To spin it another way, imagine you’re trying to identify distinct visitors by their hats and coats. The problem pops up if two folks show up wearing the same hat and coat combo – they get counted as one.

    So, to avoid this wardrobe confusion, you could add the ‘session_id’ to the mix, like adding different colored boots to your hat and coat tracking. That’ll help to distinguish each visitor more accurately and should hopefully get your data in line with what GA4’s showing. Give it a try and see how it goes!

  • Ethan

    Member
    5 June 2023 at 7:22 am

    There could be several reasons for the discrepancy you are encountering. One possible explanation might relate to the scope of your query. Make sure that both sets of data are pulling from the same time range and include all necessary event parameters and user properties.

    There could also potentially be issues with bots and spam can cause outliers in web traffic data. You would need to filter out this bot traffic.

    Another possibility could be the differences in the data collection between GA4 and Big Query. GA4 data is usually sampled while Big Query exports are raw. Therefore, the way you’re querying data in Big Query might not match how GA4 processes and presents data on the GA4 interface.

    Lastly, data processing latency can be a factor to consider. BigQuery might have more up-to-date information than what’s shown in your GA4 dashboard, especially if you’re checking it in real time.

    You might want to check if there are any duplicate pageviews occurring, this can be caused by duplicate tracking code implementation, website redirects, or instant reloads.

    If none of the above work, contacting Google support might be your last resort. Although the issue is quite complex, I hope this would guide you closer to the answer.

Log in to reply.