Strategies for importing and extracting GA4 event data from BigQuery into SQL Server

  • Strategies for importing and extracting GA4 event data from BigQuery into SQL Server

    Posted by Oliver on 16 September 2022 at 9:44 pm

    Let’s break it down. We’ve been pulling daily event CSV files from BigQuery using Python. These files have a bunch of data columns, some of which hold JSON data that might have even more JSON data embedded inside it; it’s like a data Matryoshka doll situation. Each of these daily CSVs holds around 1 million records.
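
    To give a rough picture of what we mean (the column name, keys and values below are made up for illustration, not our actual schema), a single cell can need two rounds of decoding:

    ```python
    import json

    # Illustrative cell value only; the real export columns and keys differ.
    cell = '{"key": "items", "value": "[{\\"item_id\\": \\"SKU_123\\", \\"price\\": 9.99}]"}'

    outer = json.loads(cell)             # first decode: the column's JSON object
    inner = json.loads(outer["value"])   # second decode: a JSON string nested inside it
    print(inner[0]["item_id"])           # -> SKU_123
    ```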

    Using the Google GA4 Events Schema, we’ve started setting up corresponding tables in SQL Server to keep everything organised. The tables we’re working on include: ‘event’, ‘user’, ‘device’, ‘geo’, ‘app_info’, ‘traffic_source’, ‘stream and platform’, ‘ecommerce’ and ‘items’.

    Now, here’s where we’d love your advice: what’s the best approach to parse these CSVs, with their potential layers of nested JSON, into their respective SQL Server tables? We had a crack at creating an SSIS package that would import each CSV into a staging table, plus a script to parse the rows, but we quickly realised we’d end up with a very complex query full of CROSS APPLY statements, which got us worried about performance. Any ideas?

  • 3 Replies
  • Taylor

    Member
    13 October 2022 at 12:23 pm

    Hey there! It sounds like you have quite the Matryoshka doll of data on your hands! Just a heads up, the usual best practice here is to flatten the GA4 data in BigQuery before you export it. Basically, you simplify the JSON in your event parameter/custom dimension columns up front so you don’t have to worry about it later. A common approach is to create a separate column for each event parameter or custom dimension you care about, as in the sketch below.
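
    A rough sketch of that flattening query, run from the same Python job that pulls the daily export (this assumes the google-cloud-bigquery and pandas packages; the project/dataset name and the two example parameters are placeholders, so swap in your own):

    ```python
    from google.cloud import bigquery

    # Placeholder project/dataset and parameter names -- adjust to your setup.
    FLATTEN_SQL = """
    SELECT
      event_date,
      event_timestamp,
      event_name,
      user_pseudo_id,
      -- one column per event parameter / custom dimension you actually need
      (SELECT value.int_value    FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
      (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
    FROM `my-project.analytics_123456789.events_*`
    WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
    """

    client = bigquery.Client()
    df = client.query(FLATTEN_SQL).to_dataframe()  # flat columns, no nested JSON left
    df.to_csv("events_flat.csv", index=False)
    ```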

    Then, once everything’s all organized and simplified like that, you can go ahead and shift your data into the SQL Server tables. It’s a relatively straightforward move at that point. If you decide to transform the data any further, it’ll mostly depend on your specific requirements. Just keep in mind that ETL logic can get a bit hefty as you make modifications over time. But no worries, it’s all part of refining and improving your data, right?

    I hope that helps! Feel free to reach out if you have any other data dilemmas you need a hand with.

  • Abdul

    Member
    15 June 2023 at 8:15 pm

    One solution could be to use Python, which excels at handling and parsing JSON data, to handle the complex parsing process. First, you can write a script that reads the CSV file and the potentially nested JSON objects in each row. Python’s json module (json.loads) converts JSON data into Python dictionaries, which are much easier to work with. You can then transform and segregate the data as needed to match the target structure of your SQL Server tables. Lastly, use a Python SQL toolkit like SQLAlchemy to insert the data into the SQL Server tables. You can also handle data conversion, formatting, and error checking before loading, which makes this approach more robust. Of course, remember to test the process on a smaller scale before running it on the full 1-million-record data set.
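
    A minimal sketch of that flow, assuming pandas and SQLAlchemy are available and using placeholder names for the file, the connection string, the JSON column (‘device’) and the key column (‘event_id’):

    ```python
    import json
    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string; fast_executemany speeds up bulk inserts via pyodbc.
    engine = create_engine(
        "mssql+pyodbc://user:password@MYSERVER/GA4?driver=ODBC+Driver+17+for+SQL+Server",
        fast_executemany=True,
    )

    def parse_json_cell(cell):
        """Decode a JSON string cell; unwrap a second time if it is double-encoded."""
        if pd.isna(cell) or cell == "":
            return {}
        obj = json.loads(cell)
        return json.loads(obj) if isinstance(obj, str) else obj

    # Read the daily export in chunks so a million rows never sit in memory at once.
    for chunk in pd.read_csv("events_20220916.csv", chunksize=50_000):
        # Example: a 'device' column holding one JSON object per row (placeholder name).
        device = pd.json_normalize(chunk["device"].map(parse_json_cell).tolist())
        device["event_id"] = chunk["event_id"].values  # placeholder key to join back to 'event'

        # Append into the target table; SQLAlchemy parameterises the inserts.
        device.to_sql("device", engine, if_exists="append", index=False, chunksize=10_000)
    ```

    The same pattern repeats for the other JSON columns, and you can add validation or wrap each chunk in a transaction before to_sql if you need stricter error handling.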

  • Jeffrey

    Member
    29 October 2024 at 10:06 am

    Hello @Oliver,

    I think there are easier ways to handle this task, for example using a third-party tool such as Skyvia or Elastic. Such a tool can extract data from BigQuery, handle the nested JSON parsing, and load the results directly into SQL Server without custom Python scripts or manual ETL processes, so the approach is simpler and more automated.

    Hope this is useful.
