community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Designer Ideas

Share your Designer product ideas - we're listening!
Upgrade Alteryx Designer in 10 Steps

Debating whether or not to upgrade to the latest version of Alteryx Designer?

LEARN MORE

Robustly remove invisible characters and other whitespace

Often as I am scraping web sites, some clever developer has put an invisible character (ASCII or Unicode) in the data which causes terrible trouble.

I've identified 89 instances of zero-width or non-zero-width glyphs that are not visible and/or Alteryx does not classify as whitespace. There are probably more, but Unicode is big y'all.

Unfortunately, the Trim() string function only removes 4 of these characters (Tab, Newline, Carriage Feed, and Space).
REGEX_REPLACE with the \s option (which is what the Cleanse macro uses) is a little better but still only removes 20. And it removes all instances, not just leading and trailing.

I've attached a workflow which proves this issue.


@APolly: this is what I mentioned at GKO.

And I did see this post (https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Elegantly-remove-all-ASCII-characters-...), but it's too brute force. Especially as Alteryx is localized and more users need those Unicode characters.

4 Comments
Alteryx Certified Partner
Fireball
Fireball

Thanks for the link, @Ruud 

Alteryx Certified Partner

I've given you a star and linked back as this is a broader solution that'd solve my idea here Smiley Happy 

https://community.alteryx.com/t5/Alteryx-Designer-Ideas/Remove-Zero-Width-Spaces-with-the-Data-Clean...

Alteryx
Alteryx
Status changed to: Under Review

Thanks for submitting this idea. I'll investigate!