community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Elegantly remove all ASCII characters outside the range 32 -126

Hello Artisans,

I'm working on a module that includes a geocoder tool. I'm running into an issue in which the data the users are submitting to the module are copied from a web page and sometimes include weird characters (e.g. ASCII 160). This causes the geocoder problems. And by problems, I mean that the geocoder can only find the zip code when without the odd characters, it can find the address.

What I'm looking for is the most elegant way to remove any characters from a text value that fall outside of the ASCII range of 32 -126. Any characters inside that range *shouldn't* cause the geocoder problems.

Extra cool points to anyone from Alteryx that will fix incorporate this fix into the Address tool. :-)

Cheers,

John Hollingsworth
Moderator
Moderator

I am definately in need of some extra cool points, so here is my most elegant way to remove ASCII characters outside of the range of 32-126.  I was able to remove the undesired characters using RegEx (you can use the RegEx tool or the Formula tool for this) using the following expression:


[^x20-x7e]+


Everything within the brackets represents a specific set of characters to match.  The x switch matches an ASCII charater using hexidecimal representation.  x20-x7e translates to ASCII characters 32-126.  The ^ preceeding the range indicates that we want to match anything NOT in the specified range.  The + outside the brackets indicates that we want to search for this pattern in our string 1 or more times.

Using the RegEx tool:

rtaImage.jpg

Using the Formula tool:

rtaImage (1).jpg

Let me know if you have any questions.

Todd Williamson
Client Services Rep
Cool Points: 7

Outstanding! That was exactly what I needed.

Whenever someone uses RegEx I think of this.

Thanks Todd!
Moderator
Moderator
Glad it worked out!  Since posting my last response, a couple of my colleagues shared a simpler expression that accomplishes the same task:

[^ -~]


Apparently there is no need to use hexidecimal values for the ASCII charaters - you can literally type them in (please note there is a space between the carat^ and the dash-.

Todd Williamson
Client Services Rep
"Everybody stand back!  I know RegEx!"
Alteryx Partner

Todd,

 

I'm a little late with this... :smileyhappy:

 

I've been searching for an  answer and your solution worked great!  Thank you for the detailed post and for following up with a more elegant solution to your initial response.

 

sd

Don't forget the backslashes in the expression, which are shown in the graphic of the RegEx box but not in the accompanying text. [^\x20-\x7e]+

You Sir, are a star.

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

@ToddW,

 

I like your expression (having used it myself), but wanted to describe the request in English and to offer @jhollingsworth an alternative solution.

 

  • Remove any characters from the incoming field that are outside of the range of ascii values from "space" to "~".

This expression can be written in a form that more closely matches these requirements as:

REGEX_Replace([Field1], "[^ -~]", '$1')

[...] is the set of values

[^..] is then, the values not in the set

[^ -~] uses (space) followed by a dash followed by a tilde is then, the values not in the range of (space) to ~ (tilde)

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Alteryx Certified Partner
Alteryx Certified Partner

@ToddW,

 

.... just read a latter post showing the same.

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Atom

This worked great for the most part: [^ -~]

 

However, I think I want to keep the carriage returns (13) and new line  (10) characters as well. I'm still trying to figure this regex out, and I'm not getting it. Any help?

Labels