Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Pulling numeric information from data with multiple formats into new column

CraigF
5 - Atom

Hi experts,

New to Alteryx and have been tasked with cleaning up some data that is more complicated than I expected.

We have Type recorded in a string of user details and I need to pull out the type value (4 or 5 digits) into a new column. The problem is that Type has been entered with absolutely no consistency in the field.

 

data.jpg

 

As you can see, sometimes it is Type with no space, Type:, Type# or even my favorite, Type #= .

I've been trying to go about it with RegEx but I'm not getting anywhere :)

 

thanks in advance!

5 REPLIES 5
Kenda
16 - Nebula
16 - Nebula

Hey @CraigF 

 

Use a normal Formula tool and create a new Int field with this expression:

TONUMBER(iif(regex_match([Field1], ".*?TYPE.*(\d\d\d\d\d).*"),REGEX_Replace([Field1], ".*?TYPE.*?(\d\d\d\d\d).*", "$1"),REGEX_Replace([Field1], ".*?TYPE.*?(\d\d\d\d).*", "$1")))

 

Basically, it is saying if there are 5 digits in a row after the word TYPE, use those 5 digits, otherwise use the 4 digits after the word TYPE.

 

Hope this helps! 

danilang
19 - Altair
19 - Altair

hi @CraigF 

 

Great solution from @Kenda.

 

Mine's a little different in that it looks for "Type" followed non-digits followed by digits

(Type[^\d]+[\d]+)

I then use another Regex to pull the digits from this.  I know that there is a way to search within the previous results, and combine the 2 regex tools into one, but my regex-FU is not strong enough.  Maybe @Thableaus can help here.

 

WF.png

 

Result.png

 

Dan

neeraj_778
7 - Meteor

Hi,

 

This solution may help you fetch the type from the unstructured text you posted.

Thableaus
17 - Castor
17 - Castor

Hi @danilang - my help here:

 

I think REGEX tool with Parse Method would work with this single expression:

 

.*?(?:\btype\W*)(\d+).*

 

This would bring any sequence of digits after the word type (uppercased or not since we are considering case insensitivity here) and a non-word character 0 or more times (that would be all of the symbols or spaces that the author of the topic mentioned).

 

Cheers,

CraigF
5 - Atom

quick and dirty, I like it, thank you).

Going to play with everyone elses solution later when I have more time as well to continue my learning!

 

Labels
Top Solution Authors