This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Often as I am scraping web sites, some clever developer has put an invisible character (ASCII or Unicode) in the data which causes terrible trouble.
I've identified 89 instances of zero-width or non-zero-width glyphs that are not visible and/or Alteryx does not classify as whitespace. There are probably more, but Unicode is big y'all.
Unfortunately, the Trim() string function only removes 4 of these characters (Tab, Newline, Carriage Feed, and Space). REGEX_REPLACE with the \s option (which is what the Cleanse macro uses) is a little better but still only removes 20. And it removes all instances, not just leading and trailing.