Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Elegantly remove all ASCII characters outside the range 32 -126

jhollingsworth
7 - Meteor
Hello Artisans,

I'm working on a module that includes a geocoder tool. I'm running into an issue in which the data the users are submitting to the module are copied from a web page and sometimes include weird characters (e.g. ASCII 160). This causes the geocoder problems. And by problems, I mean that the geocoder can only find the zip code when without the odd characters, it can find the address.

What I'm looking for is the most elegant way to remove any characters from a text value that fall outside of the ASCII range of 32 -126. Any characters inside that range *shouldn't* cause the geocoder problems.

Extra cool points to anyone from Alteryx that will fix incorporate this fix into the Address tool. :-)

Cheers,

John Hollingsworth
14 REPLIES 14
ToddW
Alteryx Alumni (Retired)

I am definately in need of some extra cool points, so here is my most elegant way to remove ASCII characters outside of the range of 32-126.  I was able to remove the undesired characters using RegEx (you can use the RegEx tool or the Formula tool for this) using the following expression:


[^x20-x7e]+


Everything within the brackets represents a specific set of characters to match.  The x switch matches an ASCII charater using hexidecimal representation.  x20-x7e translates to ASCII characters 32-126.  The ^ preceeding the range indicates that we want to match anything NOT in the specified range.  The + outside the brackets indicates that we want to search for this pattern in our string 1 or more times.

Using the RegEx tool:

rtaImage.jpg

Using the Formula tool:

rtaImage (1).jpg

Let me know if you have any questions.

Todd Williamson
Client Services Rep
Cool Points: 7

jhollingsworth
7 - Meteor
Outstanding! That was exactly what I needed.

Whenever someone uses RegEx I think of this.

Thanks Todd!
ToddW
Alteryx Alumni (Retired)
Glad it worked out!  Since posting my last response, a couple of my colleagues shared a simpler expression that accomplishes the same task:

[^ -~]


Apparently there is no need to use hexidecimal values for the ASCII charaters - you can literally type them in (please note there is a space between the carat^ and the dash-.

Todd Williamson
Client Services Rep
"Everybody stand back!  I know RegEx!"
sd
7 - Meteor

Todd,

 

I'm a little late with this... Smiley Happy

 

I've been searching for an  answer and your solution worked great!  Thank you for the detailed post and for following up with a more elegant solution to your initial response.

 

sd

Inactive User
Not applicable

Don't forget the backslashes in the expression, which are shown in the graphic of the RegEx box but not in the accompanying text. [^\x20-\x7e]+

NotQuiteClueless
7 - Meteor

You Sir, are a star.

MarqueeCrew
20 - Arcturus
20 - Arcturus

@ToddW,

 

I like your expression (having used it myself), but wanted to describe the request in English and to offer @jhollingsworth an alternative solution.

 

  • Remove any characters from the incoming field that are outside of the range of ascii values from "space" to "~".

This expression can be written in a form that more closely matches these requirements as:

REGEX_Replace([Field1], "[^ -~]", '$1')

[...] is the set of values

[^..] is then, the values not in the set

[^ -~] uses (space) followed by a dash followed by a tilde is then, the values not in the range of (space) to ~ (tilde)

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
MarqueeCrew
20 - Arcturus
20 - Arcturus

@ToddW,

 

.... just read a latter post showing the same.

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Jordken
7 - Meteor

This worked great for the most part: [^ -~]

 

However, I think I want to keep the carriage returns (13) and new line  (10) characters as well. I'm still trying to figure this regex out, and I'm not getting it. Any help?

Labels