-
What could be causing abnormal retention rates in bigquery compared to GA4?
Hey fellow data wranglers, I’m in a pickle and could really use your help!
Here’s the deal: I’ve been feeding Google Analytics 4 (GA4) data into BigQuery. Things are going decently, but I’ve hit a snag with the SQL query I crafted for the retention table. On Day 0, the number of users appears as you’d expect. It’s when we look at days after that, things get wonky – the numbers are ridiculously high!
The weird thing is that I know this doesn’t jive because when I compare it to the retention table GA4 itself kicks out or the one in AppMetrica, my numbers are off. Plus, I’ve used this very same SQL query for a couple of other apps and it worked like a charm, lining up neatly with what GA4 and AppMetrica showed. So, I don’t think this query is the issue.
What’s really bugging me is that Day 0 is alright. So I suspect something fishy is going on when we crunch the numbers with varying dates.
Here’s my SQL query, I wonder if any of you ninja coders see something I am missing or could suggest some tweaks?
WITH table1 AS ( SELECT user_pseudo_id AS user_pseudo_id, MIN(event_date) AS birthday, FROM table WHERE event_name = 'first_open' GROUP BY user_pseudo_id ) , table2 AS ( SELECT r.birthday AS birthday, l.event_date AS event_date, DATE_DIFF(l.event_date, r.birthday, DAY) AS diff, l.user_pseudo_id AS user_pseudo_id FROM table AS l JOIN table1 AS r ON l.user_pseudo_id = r.user_pseudo_id ) , table3 AS ( SELECT birthday, diff, COUNT(DISTINCT user_pseudo_id) as count_users, FIRST_VALUE(CASE WHEN diff = 0 THEN COUNT(DISTINCT user_pseudo_id) END) OVER (PARTITION BY birthday ORDER BY diff ASC) AS users_day0 FROM table2 GROUP BY birthday, diff ) SELECT birthday, diff, count_users, users_day0, count_users/users_day0 AS retention FROM table3
If any of you have faced something similar or have ideas on what could be causing this, I could really use your input! I’m all ears.
Log in to reply.