
  • Enhancing URL Recognition for Foreign Languages in Google Analytics 4

    Posted by Henry on 6 August 2022 at 1:35 pm

    Hey there! I’m working with a client whose website is in Bulgarian, and GA4 seems to be having a tricky time recognising the URLs in that language. Got any tips for how to make these URLs more tractable?

    Let me show you what I mean with this screenshot. You’ll see Google’s translating the URLs in a strange way. Meanwhile, the browser shows the page names correctly in Bulgarian.

    Here’s the kicker: I need to analyze these URLs and feed the data to Looker Studio, but those unusual symbols in the URLs are messing up my filtering process. Can you think of a way to rewrite these URLs in the same style as they’re displayed in the browser? I appreciate any ideas you might have!

    Ashton replied 1 year ago 3 Members · 2 Replies
  • 2 Replies
  • Oscar

    Member
    20 June 2023 at 1:11 am

    Alright mate, here’s a quick method to get GA4 to play nice with the Bulgarian URLs.

    Read your Google Analytics data on the back end and decode the URLs there. In Java, the URLDecoder class can decode a percent-encoded URL. Replace the value of “encodedString” with the value of the dimension that contains the URL.

    Something like this…
    ```java
    import java.net.URLDecoder;
    import java.io.UnsupportedEncodingException;

    public class Main {
        public static void main(String[] args) {
            try {
                // Percent-encoded sample; swap in the URL dimension value from GA4.
                String encodedString = "https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FURL%23Syntax";
                // Interpret the %XX escapes as UTF-8 bytes.
                String decodedString = URLDecoder.decode(encodedString, "UTF-8");
                System.out.println(decodedString); // https://en.wikipedia.org/wiki/URL#Syntax
            } catch (UnsupportedEncodingException e) {
                System.err.println(e);
            }
        }
    }
    ```
    This should give you properly decoded URLs for your analysis. Give it a shot!
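
    One caveat worth sketching: URLDecoder follows HTML form rules, so it also turns a literal “+” into a space, which may not be what you want for page paths. The variant below protects “+” before decoding. The class name PagePathDecoder, the method decodePath and the sample path are made up for illustration, and the Charset overload of decode assumes Java 10 or newer.

    ```java
    import java.net.URLDecoder;
    import java.nio.charset.StandardCharsets;

    final class PagePathDecoder {

        // Decodes %XX escapes as UTF-8 while keeping any literal '+' intact.
        // URLDecoder would otherwise convert '+' into a space (form-encoding rule).
        static String decodePath(String encodedPath) {
            String plusProtected = encodedPath.replace("+", "%2B");
            return URLDecoder.decode(plusProtected, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            // Hypothetical GA4 page path: a Bulgarian word in percent-encoded UTF-8.
            String encoded = "/%D0%BD%D0%B0%D1%87%D0%B0%D0%BB%D0%BE";
            // Prints the path in Cyrillic if the console itself uses UTF-8.
            System.out.println(decodePath(encoded));
        }
    }
    ```

    Note that decode throws an IllegalArgumentException on a malformed escape such as a trailing “%”, so real data may need a fallback.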

  • Ashton

    Member
    30 June 2023 at 8:22 pm

    From what you’ve described, it seems Google Analytics 4 (GA4) is recording your URLs in their percent-encoded form because they contain non-Latin characters, which is causing problems for your data analysis. To address this, it might be helpful to implement URL rewriting rules on your website’s server. Web servers can rewrite requested URLs automatically before a page is served. On Apache this is done with the mod_rewrite module, and on NGINX with the rewrite module (ngx_http_rewrite_module).

    You’ll likely need to work with your webmaster or web hosting provider to create the URL rewriting rules. The goal is for URLs to appear in a more “friendly” format, ideally the same as displayed in the browser, so they are easier to recognize and filter in GA4 and Looker Studio. This process is closely related to “URL normalization,” where you map multiple variants of a resource’s address to a single canonical URL.
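
    The same normalization idea can also be applied on the reporting side, without touching the server. The sketch below is only an illustration under a few assumptions: the class UrlNormalizer and the method canonicalKey are invented names, the example.com URLs are placeholders, and dropping the query string and trailing slash is a choice you may not want. It relies on java.net.URI, whose getPath() returns the path with percent-escapes decoded.

    ```java
    import java.net.URI;
    import java.util.Locale;

    final class UrlNormalizer {

        // Builds one comparison key (host + decoded path) for URL variants that
        // differ only in percent-encoding, host casing, or a trailing slash.
        static String canonicalKey(String rawUrl) {
            URI uri = URI.create(rawUrl).normalize(); // also resolves "." and ".." segments
            String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
            String path = uri.getPath() == null ? "" : uri.getPath(); // escapes decoded as UTF-8
            if (path.length() > 1 && path.endsWith("/")) {
                path = path.substring(0, path.length() - 1);
            }
            if (path.isEmpty()) {
                path = "/";
            }
            return host + path; // query string and fragment are deliberately ignored
        }

        public static void main(String[] args) {
            // Hypothetical variants of the same Bulgarian page.
            String a = canonicalKey("https://Example.com/%D0%BD%D0%B0%D1%87%D0%B0%D0%BB%D0%BE/");
            String b = canonicalKey("https://example.com/\u043D\u0430\u0447\u0430\u043B\u043E");
            System.out.println(a.equals(b)); // true
        }
    }
    ```

    URI.create throws an IllegalArgumentException for strings it cannot parse (for example, ones containing spaces), so messy input would need a guard.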

    Please remember that making these changes could potentially impact your SEO, so make sure to proceed carefully, likely incorporating 301 redirects to maintain SEO rankings.

    Alternatively, if rewriting URLs at the source is not feasible, you can decode the URLs in code, either in your tracking setup before the data is sent to GA4 or afterwards on the exported data. This requires code changes that should be made by an experienced developer.
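
    As a rough sketch of the “after” option: once page paths and view counts have been pulled out of GA4, the paths can be decoded and rows that end up with the same decoded path can be merged. Everything here is hypothetical (the ExportCleaner class, the mergeByDecodedPath method, the sample rows), and it assumes Java 10 or newer. Values that fail to decode are kept as they are rather than dropped.

    ```java
    import java.net.URLDecoder;
    import java.nio.charset.StandardCharsets;
    import java.util.LinkedHashMap;
    import java.util.Map;

    final class ExportCleaner {

        // Decodes each raw page path and sums the views of rows that collapse
        // to the same decoded path. Insertion order of first appearance is kept.
        static Map<String, Long> mergeByDecodedPath(Map<String, Long> viewsByRawPath) {
            Map<String, Long> merged = new LinkedHashMap<>();
            for (Map.Entry<String, Long> row : viewsByRawPath.entrySet()) {
                String path;
                try {
                    // Protect '+' so it is not turned into a space.
                    path = URLDecoder.decode(row.getKey().replace("+", "%2B"), StandardCharsets.UTF_8);
                } catch (IllegalArgumentException malformedEscape) {
                    path = row.getKey(); // keep the raw value if it cannot be decoded
                }
                merged.merge(path, row.getValue(), Long::sum);
            }
            return merged;
        }

        public static void main(String[] args) {
            // Hypothetical exported rows: page path -> views.
            Map<String, Long> rows = new LinkedHashMap<>();
            rows.put("/%D0%BD%D0%B0%D1%87%D0%B0%D0%BB%D0%BE", 10L);
            rows.put("/\u043D\u0430\u0447\u0430\u043B\u043E", 5L);
            System.out.println(mergeByDecodedPath(rows).size()); // 1
        }
    }
    ```

    The cleaned result could then be written to whatever source Looker Studio reads from, so the filters work on readable paths.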

    As a final thought, for optimal compatibility with diverse systems, having URLs primarily in English (where feasible) is a generally recommended web development practice.
