Alteryx Designer Desktop Discussions

EvansM · ‎03-25-2022

Hello,

I need some help in how to identify character in a number field:

example

I want to identify and remove it

expected result should be like from 063799541A ---------> 063799541

Thanks

grossal · ‎03-25-2022

Hi @EvansM,

I'd try the following Regex:

Before:

After:

Why?

In the [] you can specify a character set and we use everything between a-z (all small letters) as well as A-Z (all capital letters). Combine with the Replace Method and the empty Replacement Text makes up removing all letters from the String.

Does that work in your case or are there edge cases that aren't covered?

Best

Alex

T_Willins · ‎03-25-2022

Absolutely agreeing with @grossal but showing that there are multiple ways to do things in Alteryx, you can also use a Data Cleanse tool:

Qiu · ‎03-25-2022

@EvansM
Another tool we can use is the Regex_Replace 😁

EvansM · ‎03-26-2022

@grossal Thanks a lot. That works. But I still want to know how many records have characters in [REFERENCE_CODE1] column before I removed them.

EvansM · ‎03-26-2022

@T_Willins Thanks a lot. That works. But I still want to know how many records have characters in [REFERENCE_CODE1] column before I removed them.

EvansM · ‎03-26-2022

@Qiu Thanks a lot. I'm total zero when comes to REGEX

grossal · ‎03-26-2022

Hi @EvansM,

I see various ways how to achieve that, reaching from very simple to a little bit more challenging.

Very simple:

Insert a Formula tool to copy your [REFERENCE_CODE1] column into a new one. Than use one of the three shown ways to delete the word characters. Afterwards you can use a simple Formula "[REFERENCE_CODE1] != [Copy]" and set it to Bool. It will show True when there was a character and False if there wasn't.

Regex method:

There is a method called Regex_CountMatches (inside Formula tool) which we can utilizes to count how many word-characters we have.

It would look like this:

REGEX_CountMatches([REFERENCE_CODE1], '[a-zA-Z]')

This will return 0 if no word-character is in the field or the according number in case there are one or many word-characters.

I'd also like to use this chance to explain the difference between the three solutions you have seen in this thread.

My approach:

I used the Regex-Tool because I already saw that you had flagged / tagged this post with Regex and therefore wanted to go that way. However, why didn't I use the way @Qiu showed?

I think the Regex-Tool has one great advantage compared to the formula and that's:

When you start with Regex and it seems like you are doing that, it can be very beneficial to have these listed next to the expression you are writing. I'd also recommend to have a look at Regex-101. They provide a visual editor and will highly what your expression is matching. Here is an example:

We can see that the expression is highlighting the A in the beginning of the first text, as well as the A in end of the second text. We also have nice text explanation on the right.

DataCleansing (Tim's approach)

In 99 out of 100 cases, I probably would have suggested that one too, only your Regex-Tag kept me away from it. The DataCleansing tool is 100% the easiest approach - you don't need to know any Regex to use it. There is only one single downside to it: It is slow. The DataCleansing tool is known for being one of the slowest tools in the Alteryx toolset, but if your dataset is not too big and/or performance doesn't matter much, than I'd definitely go with it. It's also the easiest to read for everyone else and easier to maintain.

Formula Tool (Qui's approach)

When I use the Regex-Delete function, I usually go with the Formula-Tool as well, because you can add multiple of it and that's what I usually have to do. E.g. you need to perform multiple delete action, that would require either 1 Formula tool or multiple Regex-Tools. I simply used the Regex-Tool because of the nicer lookup for you. But when using it on my own, I prefer to go with the Formula as well.

Have a great weekend :-)

Best

Alex

Alteryx Designer Desktop Discussions

How to identify character in a number field?