Elegantly remove all ASCII characters outside the range 32 -126
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I'm working on a module that includes a geocoder tool. I'm running into an issue in which the data the users are submitting to the module are copied from a web page and sometimes include weird characters (e.g. ASCII 160). This causes the geocoder problems. And by problems, I mean that the geocoder can only find the zip code when without the odd characters, it can find the address.
What I'm looking for is the most elegant way to remove any characters from a text value that fall outside of the ASCII range of 32 -126. Any characters inside that range *shouldn't* cause the geocoder problems.
Extra cool points to anyone from Alteryx that will fix incorporate this fix into the Address tool. :-)
Cheers,
John Hollingsworth
Solved! Go to Solution.
- Labels:
- Interface Tools
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I am definately in need of some extra cool points, so here is my most elegant way to remove ASCII characters outside of the range of 32-126. I was able to remove the undesired characters using RegEx (you can use the RegEx tool or the Formula tool for this) using the following expression:
[^x20-x7e]+
Everything within the brackets represents a specific set of characters to match. The x switch matches an ASCII charater using hexidecimal representation. x20-x7e translates to ASCII characters 32-126. The ^ preceeding the range indicates that we want to match anything NOT in the specified range. The + outside the brackets indicates that we want to search for this pattern in our string 1 or more times.
Using the RegEx tool:
Using the Formula tool:
Let me know if you have any questions.
Todd Williamson
Client Services Rep
Cool Points: 7
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Whenever someone uses RegEx I think of this.
Thanks Todd!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
[^ -~]
Apparently there is no need to use hexidecimal values for the ASCII charaters - you can literally type them in (please note there is a space between the carat^ and the dash-.
Todd Williamson
Client Services Rep
"Everybody stand back! I know RegEx!"
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Todd,
I'm a little late with this...
I've been searching for an answer and your solution worked great! Thank you for the detailed post and for following up with a more elegant solution to your initial response.
sd

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Don't forget the backslashes in the expression, which are shown in the graphic of the RegEx box but not in the accompanying text. [^\x20-\x7e]+
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
You Sir, are a star.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I like your expression (having used it myself), but wanted to describe the request in English and to offer @jhollingsworth an alternative solution.
- Remove any characters from the incoming field that are outside of the range of ascii values from "space" to "~".
This expression can be written in a form that more closely matches these requirements as:
REGEX_Replace([Field1], "[^ -~]", '$1')
[...] is the set of values
[^..] is then, the values not in the set
[^ -~] uses (space) followed by a dash followed by a tilde is then, the values not in the range of (space) to ~ (tilde)
Cheers,
Mark
Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This worked great for the most part: [^ -~]
However, I think I want to keep the carriage returns (13) and new line (10) characters as well. I'm still trying to figure this regex out, and I'm not getting it. Any help?
