
  • Optimizing Data Extraction from Google Analytics 4 API

    Posted by Isabella on 15 January 2023 at 11:32 pm

    Hey there, so here’s what’s happening. I’ve been pulling data from Google Analytics 4 API https://developers.google.com/analytics/devguides/reporting/data/v1/api-schema?hl=en. Don’t know if you’ve noticed this as well but the data I get from GA4 and UA seems to be having a wild party ’cause they’re not matching, quite different in fact. I’m starting to wonder if this “sampling” thing is the culprit and I’m interested in knowing how to scoop up data without sampling. Like taking the ice-cream, and skipping the sprinkles, you know?

    Here’s what I’ve been working with so far. I trail off after pulling 100,000 rows, let’s call ’em, “data nuggets” until the whole act flames out.

    # analytics_GA4, property_id, dimensions and metrics are defined earlier
    offset = 0  # start paging from the first row

    while True:
      print("offset: " + str(offset))
    
      request = {
        "requests": [
          {
            "dateRanges": [
              {
                "startDate": "180daysAgo",
                "endDate": "today"
              }
            ],
            "dimensions": [{'name': name} for name in dimensions],
            "metrics": [{'name': name} for name in metrics],
            "offset": offset,
            "limit": 100000
          }
        ]
      }
    
      # Make request
      response = analytics_GA4.properties().batchRunReports(property=property_id, body=request).execute()
    
      # Stop the loop once a page comes back with no rows (missing or empty)
      if not response.get("reports")[0].get("rows"):
        break
      else:
        offset = offset + 100000
    
  • 2 Replies
  • Brett

    Member
    1 March 2023 at 3:36 pm

    The differences in data you’re noticing between Google Analytics 4 (GA4) and Universal Analytics (UA) could indeed be due to data sampling. Sampling is a method Google uses to generate quicker reports. For large data sets, it only processes a portion of the data (sample) instead of the entire data set. One way to minimize the effect of sampling is to reduce the size of the data you are requesting in each API call. In your code, you are currently requesting to pull 100,000 rows per call. This could potentially cause sampling, as Google may decide to sample reports with large amounts of data to reduce processing time. Try reducing the limit value in your request to a lower number and see if that reduces the discrepancy between GA4 and UA data. Remember to adjust your offset accordingly, to ensure you are collecting the entire data set across multiple API calls.
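    Rather than guessing, it may help to check whether a given response was actually sampled. The sketch below assumes the report's `metadata` object carries a `samplingMetadatas` list with `samplesReadCount` and `samplingSpaceSize` entries when sampling was applied, which is how I read the Data API v1beta reference; treat those field names as an assumption and confirm them against your own responses. `sampling_summary` is a hypothetical helper, not part of any Google library.

```python
def sampling_summary(report):
    """Summarize sampling info for one report dict from the API response.

    Returns None when no sampling metadata is present, otherwise a list of
    (samples_read, sampling_space, fraction) tuples, one per entry.
    Field names are assumed from the Data API v1beta reference.
    """
    metadata = report.get("metadata", {})
    entries = metadata.get("samplingMetadatas", [])
    if not entries:
        return None

    summary = []
    for entry in entries:
        # Large integers may arrive as strings in JSON, so convert defensively.
        read = int(entry.get("samplesReadCount", 0))
        space = int(entry.get("samplingSpaceSize", 0))
        fraction = read / space if space else None
        summary.append((read, space, fraction))
    return summary
```

    With the loop from the original post, this would be called as `sampling_summary(response.get("reports")[0])` after each request; a result of `None` suggests that page was not sampled.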

  • Amelia

    Member
    22 March 2023 at 10:40 am

    As for the discrepancy between Google Analytics 4 (GA4) and Universal Analytics (UA): the two have fundamentally different data models (UA is built around sessions and pageviews, GA4 around events), so you should expect some differences in the results regardless of sampling. The difference can also affect how the data is sampled and presented.

    As for avoiding sampling: it generally kicks in on larger data sets so that queries return faster. One workaround with the GA4 API is to break your request into smaller chunks, for example requesting data day by day. Smaller requests are less likely to go through GA's sampling component and more likely to return unsampled data.
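    To illustrate the day-by-day idea, here is a minimal sketch that builds one `dateRanges` entry per day in `YYYY-MM-DD` form. `daily_date_ranges` is a hypothetical helper name; each returned dict would replace the single `"180daysAgo"` to `"today"` range in the request from the original post, with one request (and its own offset loop) per day.

```python
from datetime import date, timedelta


def daily_date_ranges(start, end):
    """Return one {"startDate", "endDate"} dict per day from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")

    ranges = []
    day = start
    while day <= end:
        iso = day.isoformat()  # YYYY-MM-DD
        ranges.append({"startDate": iso, "endDate": iso})
        day += timedelta(days=1)
    return ranges


# Example: the last 180 days, ending today (mirrors the original request window).
today = date.today()
last_180_days = daily_date_ranges(today - timedelta(days=180), today)
```

    The trade-off is many more API calls, so quota usage is worth keeping an eye on.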

    As for the code, the paging approach looks right for the Google Analytics API: it requests 100,000 rows per call and advances the offset by the same amount on each pass to fetch the next chunk. However, the API may cap the number of rows returned in a single request, so you might have to adjust your limit and offset accordingly.
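    One way to make the paging robust to such a cap is to advance the offset by the number of rows actually returned instead of a fixed 100,000. The sketch below assumes each report includes a `rowCount` field with the total number of rows in the full result (my reading of the API reference; if it is absent the loop simply stops on the first empty page). `fetch_all_rows` and `fetch_page` are hypothetical names; `fetch_page(offset, limit)` stands in for the `batchRunReports` call and should return a single report dict.

```python
def fetch_all_rows(fetch_page, limit=100000):
    """Collect every row by calling fetch_page(offset, limit) until exhausted."""
    rows = []
    offset = 0
    while True:
        report = fetch_page(offset, limit)
        page = report.get("rows") or []
        if not page:
            break  # nothing left to read

        rows.extend(page)
        # Advance by what was actually returned, in case the API caps the page size.
        offset += len(page)

        total = report.get("rowCount")  # assumed field: total rows in the full result
        if total is not None and offset >= int(total):
            break
    return rows
```

    In the original script, `fetch_page` would build the request body with the given `offset` and `limit`, call `batchRunReports(...).execute()`, and return `response.get("reports")[0]`.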
