community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

Robustly remove invisible characters and other whitespace

Often as I am scraping web sites, some clever developer has put an invisible character (ASCII or Unicode) in the data which causes terrible trouble.

I've identified 89 instances of zero-width or non-zero-width glyphs that are not visible and/or Alteryx does not classify as whitespace. There are probably more, but Unicode is big y'all.

Unfortunately, the Trim() string function only removes 4 of these characters (Tab, Newline, Carriage Feed, and Space).
REGEX_REPLACE with the \s option (which is what the Cleanse macro uses) is a little better but still only removes 20. And it removes all instances, not just leading and trailing.

I've attached a workflow which proves this issue.


@APolly: this is what I mentioned at GKO.

And I did see this post (https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Elegantly-remove-all-ASCII-characters-...), but it's too brute force. Especially as Alteryx is localized and more users need those Unicode characters.

5 Comments
Alteryx Certified Partner
Fireball
Fireball

Thanks for the link, @Ruud 

Alteryx Certified Partner

I've given you a star and linked back as this is a broader solution that'd solve my idea here 🙂 

https://community.alteryx.com/t5/Alteryx-Designer-Ideas/Remove-Zero-Width-Spaces-with-the-Data-Clean...

Alteryx Alumni (Retired)
Status changed to: Under Review

Thanks for submitting this idea. I'll investigate! 

Alteryx Partner

Heartily agree with this. I had a dataset that I had to bring into Excel and compare using the EXACT formula to see something was wrong which non of the trimming tools in Alteryx would correct, causing misleading values when categorizing data with the offending defect.