Alteryx Designer Desktop Ideas

Robustly remove invisible characters and other whitespace

Often when I am scraping websites, I find that some clever developer has put an invisible character (ASCII or Unicode) in the data, which causes terrible trouble.

I've identified 89 glyphs, some zero-width and some not, that are invisible and/or that Alteryx does not classify as whitespace. There are probably more, but Unicode is big y'all.

Unfortunately, the Trim() string function only removes 4 of these characters (Tab, Newline, Carriage Return, and Space).
REGEX_REPLACE with the \s character class (which is what the Cleanse macro uses) is a little better but still only removes 20. And it removes all instances, not just leading and trailing ones.
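
A minimal Python sketch of the behaviour being requested: strip invisible characters from the start and end of a string only, leaving the interior untouched. This is not how Trim() or the Cleanse macro work internally. The choice of Unicode general categories (Cc, Cf, Zs, Zl, Zp) and the small `EXTRA_INVISIBLE` set are assumptions made for illustration, and the helper names `is_invisible` and `trim_invisible` are hypothetical. The set is not claimed to match the 89 characters in the attached workflow.

```python
import unicodedata

# Assumption: characters in these general categories count as "invisible".
# Cc = control, Cf = format (zero-width space, BOM, word joiner, ...),
# Zs/Zl/Zp = space, line and paragraph separators.
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}

# Assumption: a few blank-looking characters that Unicode classifies as
# letters or symbols, so the category test alone would miss them.
EXTRA_INVISIBLE = {"\u2800", "\u3164", "\u115f", "\u1160", "\uffa0"}


def is_invisible(ch: str) -> bool:
    """Return True if the single character ch is treated as invisible."""
    return ch in EXTRA_INVISIBLE or unicodedata.category(ch) in INVISIBLE_CATEGORIES


def trim_invisible(text: str) -> str:
    """Remove leading and trailing invisible characters only."""
    start, end = 0, len(text)
    while start < end and is_invisible(text[start]):
        start += 1
    while end > start and is_invisible(text[end - 1]):
        end -= 1
    return text[start:end]


if __name__ == "__main__":
    raw = "\u200b\ufeff\u00a0 hello\u200bworld \t\u2060"
    # repr() makes the surviving interior zero-width space visible.
    print(repr(trim_invisible(raw)))
```

Trimming only at the ends is deliberate: some format characters, such as the zero-width joiner, carry meaning inside text (emoji sequences, for example), so removing every instance would be too aggressive for localized data.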

I've attached a workflow which demonstrates this issue.


@apolly: this is what I mentioned at GKO.

And I did see this post (https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Elegantly-remove-all-ASCII-characters-...), but it's too brute-force, especially as Alteryx is localized and more users need those Unicode characters.

9 Comments
Ruud
10 - Fireball
Hollingsworth
12 - Quasar

Thanks for the link, @Ruud 

Balders
11 - Bolide

I've given you a star and linked back as this is a broader solution that'd solve my idea here :) 

https://community.alteryx.com/t5/Alteryx-Designer-Ideas/Remove-Zero-Width-Spaces-with-the-Data-Clean...

RachelW
Alteryx Alumni (Retired)
Status changed to: Under Review

Thanks for submitting this idea. I'll investigate! 

wale_ilori
9 - Comet

Heartily agree with this. I had a dataset that I had to bring into Excel and compare using the EXACT formula to see that something was wrong. None of the trimming tools in Alteryx would correct it, which caused misleading values when categorizing data containing the offending characters.
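
As a sketch of how such hidden characters could be surfaced without a round trip through Excel, the snippet below lists the position, code point, and Unicode name of every suspect character in a string. The category set and the function name `describe_hidden` are assumptions for illustration, not part of any Alteryx tool.

```python
import unicodedata

# Assumption: these general categories are the ones worth flagging.
SUSPECT_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def describe_hidden(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each suspect character except a plain space."""
    found = []
    for index, ch in enumerate(text):
        if ch != " " and unicodedata.category(ch) in SUSPECT_CATEGORIES:
            # Control characters have no Unicode name, so supply a fallback.
            name = unicodedata.name(ch, "<unnamed>")
            found.append((index, f"U+{ord(ch):04X}", name))
    return found


if __name__ == "__main__":
    for hit in describe_hidden("ab\u200bc"):
        print(hit)
```

Two values that look identical on screen but fail an exact comparison will usually produce a non-empty list for one of them.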

SeanAdams
17 - Castor

This one is super-important, thank you for raising it @Hollingsworth. We cannot limit ourselves to ASCII because we have an international user base. At the same time, we need to deal with non-visible characters more robustly because we constantly work with platforms like Mainframe or various flavours of Unix/Linux that use different non-visible characters.

Super important update to a tool like the Data Cleanse tool and to the Trim formula!

KylieF
Alteryx Community Team
Status changed to: Not Planned

Thank you for your post! This idea is interesting to us; however, we've determined that we're unable to include it on our product road map at this point due to several outstanding issues and factors. Should we be able to return to your idea in the future, we will update the status back to Under Review.

SeanAdams
17 - Castor

@Hollingsworth - we may have to do a shared project instead to build this out in the Python SDK since it's not planned for the product.

SideOfRanch
8 - Asteroid

Did this project ever go anywhere? The two macros in the YXMD don't load for me anymore. It seems like a version compatibility issue.