Often, when I am scraping websites, I find that some clever developer has put an invisible character (ASCII or Unicode) in the data, which causes terrible trouble.
I've identified 89 zero-width or non-zero-width glyphs that are not visible and/or that Alteryx does not classify as whitespace. There are probably more, but Unicode is big, y'all.
Unfortunately, the Trim() string function removes only 4 of these characters (Tab, Newline, Carriage Return, and Space).
REGEX_REPLACE with \s (which is what the Cleanse macro uses) is a little better but still removes only 20. It also removes every instance, not just the leading and trailing ones.
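To make the desired behavior concrete, here is a minimal Python sketch (not Alteryx formula code, and not the attached workflow) of a trim that strips only leading and trailing invisible characters and leaves interior ones alone. The function names and the choice of Unicode general categories are my assumptions for illustration. This is not the list of 89 characters, and some visually blank characters fall outside these categories and would need to be added explicitly.

```python
import unicodedata

# Assumption: treat these Unicode general categories as "invisible".
#   Cc = control characters (tab, newline, carriage return, ...)
#   Cf = format characters (zero-width space, zero-width joiner, BOM, ...)
#   Zs / Zl / Zp = space, line, and paragraph separators
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def is_invisible(ch: str) -> bool:
    """Return True if the character falls in one of the assumed categories."""
    return unicodedata.category(ch) in INVISIBLE_CATEGORIES


def trim_invisible(text: str) -> str:
    """Strip invisible characters from both ends only; keep interior ones."""
    start, end = 0, len(text)
    while start < end and is_invisible(text[start]):
        start += 1
    while end > start and is_invisible(text[end - 1]):
        end -= 1
    return text[start:end]
```

For example, trim_invisible("\u200b hello\u00a0") returns "hello", while a zero-width joiner in the middle of a word or emoji sequence is left untouched, which matters for localized data.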
I've attached a workflow which demonstrates this issue.
@apolly: this is what I mentioned at GKO.
And I did see this post (https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Elegantly-remove-all-ASCII-characters-...), but it's too brute force. Especially as Alteryx is localized and more users need those Unicode characters.