Learn business growth with Google Analytics 4 Forums Google Analytics 4 Identifying Unique Records in BigQuery GA4 Events Data

  • Identifying Unique Records in BigQuery GA4 Events Data

    Posted by Liam on 8 February 2023 at 5:26 pm

    Hey there!

    I was just working on the BigQuery GA4 events data and ran into a bit of a puzzle. Maybe you can help?

    Here’s the deal: I’m pulling out data from event_params, user_properties, and items fields. The fields have nested json and I want to make sure all related records stick together.

    Can I take my hat off to you, Minhaz? Thanks for your input earlier.

    I’ve got a picture here that shows the results when I checked for multiple rows using the four columns you suggested – event_name, event_timestamp, user_pseudo_id, event_bundle_sequence_id.

    Can you spot the twist? There are several rows with the same column values, but totally different event_parms values. Is this a case of mixing event_parms data from multiple rows into one?

    Maybe I was assuming too much but I thought all related event_parms values should show up within one row with the same vent_name, event_timestamp, user_pseudo_id, and event_bundle_sequence_id record. Does that make sense or am I off track?

    Here’s the SQL query I ran and the results I got. Does anything jump out at you?

    [SQL Query and Results Images]

    Looking forward to your take on this. Let’s solve this puzzle together!

    Chill and cheers!

    Oliver replied 11 months ago 3 Members · 2 Replies
  • 2 Replies
  • James

    14 March 2023 at 6:41 pm

    Sure, let’s take a look at these fields together. As stated in the Google Analytics Documentation:

    For the user fields, these provide unique information about the user who is associated with the event. For example, we have ‘user_id’ which is the user ID set via the setUserId API and ‘user_pseudo_id’, which is the pseudonymous ID (like a specific app instance ID) referencing the user. Another interesting one is the ‘user_first_touch_timestamp’, which tells you the exact time (down to microseconds) when the user first opened the app or visited the site.

    Now, let’s dive right into those event fields. These provide unique information for each specific event. ‘event_date’ marks the exact date (in YYYYMMDD format) when the event was logged in the registered timezone of your app, while ‘event_timestamp’ indicates the precise time (in microseconds, UTC) when the event was logged on the client. The ‘event_name’ – as you probably guessed – just indicates the name of the event.

    A noteworthy field is ‘event_value_in_usd’ where you can find out the event’s “value” parameter, converted into USD. And then, there’s ‘event_bundle_sequence_id’ – this is the sequential ID of the bundle these events were part of when they were uploaded. Finally, ‘event_server_timestamp_offset’ marks the timestamp offset between the actual collection time and the upload time in microseconds.

    I hope this clarifies a lot about the different fields!

  • Oliver

    1 May 2023 at 7:14 am

    The combination of the fields event_name, event_timestamp, user_pseudo_id, and event_bundle_sequence_id are used to uniquely identify each record in the GA4 BigQuery event export table. However, it seems from your results that there are instances of multiple rows having the same values in these columns but different event parameters. What could be happening is that the event_params field, which is nested and JSON-formatted, may contain varying information across multiple events even if the identifying fields are the same. The assumption that related event_params values would always appear in one row with the same identifying record could be incorrect. There might be a mix-up of event_params data from different rows into one. A potential workaround for this could be to add a hash or fingerprint of the entire event_params object to the four mentioned fields, although there currently seems to be no surefire way of identifying unique records as of April 2023.

Log in to reply.