Strategies for importing and extracting GA4 event data from BigQuery into SQL Server
Let’s break it down. We’ve been pulling daily event CSV files from BigQuery using Python. These files have a bunch of data columns, some of which hold JSON that may itself contain more JSON embedded inside – it’s a data Matryoshka doll situation. Each daily CSV holds around 1 million records.
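To make the Matryoshka problem concrete, here’s a minimal sketch of what peeling those layers looks like in Python. The column names (`event_name`, `event_params`, `extra`) are illustrative, not the actual GA4 export schema – the point is just that each layer needs its own `json.loads`:

```python
import csv
import io
import json

# Build one sample row the way the export might look: the event_params
# column holds JSON, and one of its values is itself a JSON-encoded string.
inner = json.dumps({"ref": "google"})                  # the inner doll
outer = json.dumps({"page": "/home", "extra": inner})  # JSON holding JSON

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["event_name", "event_params"])
writer.writerow(["page_view", outer])
buf.seek(0)

# Parsing: peel the layers one json.loads at a time.
rows = []
for row in csv.DictReader(buf):
    params = json.loads(row["event_params"])       # first layer
    params["extra"] = json.loads(params["extra"])  # second doll
    rows.append({"event_name": row["event_name"], **params})

print(rows[0]["extra"]["ref"])  # prints: google
```

In the real files you’d want to probe whether a value is a JSON string before decoding it, since not every column is nested.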
Using the Google GA4 Events Schema, we’ve started setting up corresponding tables in SQL Server to keep everything organised. The tables we’re working on include: ‘event’, ‘user’, ‘device’, ‘geo’, ‘app_info’, ‘traffic_source’, ‘stream and platform’, ‘ecommerce’ and ‘items’.
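For context, the mapping we have in mind looks roughly like this – one parsed event record gets split into one row dict per target table. The column lists and field names below are trimmed-down placeholders, not the full GA4 schema:

```python
# Sketch of routing one parsed GA4 event into per-table row dicts.
# Keys here are illustrative; the real export schema has many more fields.
TABLE_COLUMNS = {
    "event":  ["event_date", "event_name", "event_timestamp"],
    "device": ["category", "operating_system"],
    "geo":    ["country", "city"],
}

def split_event(record: dict) -> dict:
    """Return {table_name: row_dict} for one flattened event record.

    Nested structs are assumed to arrive as sub-dicts keyed by table
    name (e.g. record["device"]["category"]), mirroring the GA4 layout.
    """
    out = {}
    for table, cols in TABLE_COLUMNS.items():
        source = record if table == "event" else record.get(table, {})
        out[table] = {c: source.get(c) for c in cols}
    return out

event = {
    "event_date": "20240101", "event_name": "page_view", "event_timestamp": 1,
    "device": {"category": "mobile", "operating_system": "Android"},
    "geo": {"country": "NZ", "city": "Auckland"},
}
rows = split_event(event)
print(rows["geo"]["country"])  # prints: NZ
```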
Now, here’s where we’d love your advice – what’s the best approach to parse these CSVs, with their potential layers of JSON, into their respective SQL Server tables? We had a crack at creating an SSIS package that would import the CSV into a staging table, plus a script to parse the rows. But we quickly realised we’d end up with a hugely complex query full of CROSS APPLY statements, which got us worried about performance. Any ideas?
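One alternative we’ve been weighing up, in case it helps frame the question: do the JSON flattening upstream in Python, emit one clean CSV per target table, and then bulk-load each with BULK INSERT or bcp – so SQL Server never has to CROSS APPLY through the JSON at all. A rough sketch of the per-table serialisation step (table and column names are placeholders):

```python
import csv
import io

def write_table_csv(rows: list[dict], columns: list[str]) -> str:
    """Serialise pre-flattened rows to CSV text for one staging table.

    In practice this would write to a file that BULK INSERT / bcp picks
    up; returning a string keeps the sketch self-contained.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Example: rows already routed to the 'geo' table by the parsing step.
geo_rows = [{"country": "NZ", "city": "Auckland"}]
csv_text = write_table_csv(geo_rows, ["country", "city"])
print(csv_text.splitlines()[0])  # prints: country,city
```

We’re unsure whether that upfront Python cost beats the CROSS APPLY query at ~1 million rows a day, which is really the heart of the question.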