Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Pulling numeric information from data with multiple formats into new column

Highlighted
5 - Atom

Hi experts,

New to Alteryx and have been tasked with cleaning up some data that is more complicated than I expected.

We have Type recorded in a string of user details and I need to pull out the type value (4 or 5 digits) into a new column. The problem is that Type has been entered with absolutely no consistency in the field.

 

data.jpg

 

As you can see, sometimes it is Type with no space, Type:, Type# or even my favorite, Type #= .

I've been trying to go about it with RegEx but I'm not getting anywhere :)

 

thanks in advance!

Highlighted
14 - Magnetar
14 - Magnetar

Hey @CraigF 

 

Use a normal Formula tool and create a new Int field with this expression:

TONUMBER(iif(regex_match([Field1], ".*?TYPE.*(\d\d\d\d\d).*"),REGEX_Replace([Field1], ".*?TYPE.*?(\d\d\d\d\d).*", "$1"),REGEX_Replace([Field1], ".*?TYPE.*?(\d\d\d\d).*", "$1")))

 

Basically, it is saying if there are 5 digits in a row after the word TYPE, use those 5 digits, otherwise use the 4 digits after the word TYPE.

 

Hope this helps! 

Highlighted
17 - Castor
17 - Castor

hi @CraigF 

 

Great solution from @Kenda.

 

Mine's a little different in that it looks for "Type" followed non-digits followed by digits

(Type[^\d]+[\d]+)

I then use another Regex to pull the digits from this.  I know that there is a way to search within the previous results, and combine the 2 regex tools into one, but my regex-FU is not strong enough.  Maybe @Thableaus can help here.

 

WF.png

 

Result.png

 

Dan

Highlighted
7 - Meteor

Hi,

 

This solution may help you fetch the type from the unstructured text you posted.

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

Hi @danilang - my help here:

 

I think REGEX tool with Parse Method would work with this single expression:

 

.*?(?:\btype\W*)(\d+).*

 

This would bring any sequence of digits after the word type (uppercased or not since we are considering case insensitivity here) and a non-word character 0 or more times (that would be all of the symbols or spaces that the author of the topic mentioned).

 

Cheers,

Highlighted
5 - Atom

quick and dirty, I like it, thank you).

Going to play with everyone elses solution later when I have more time as well to continue my learning!

 

Labels