Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Pulling numeric information from data with multiple formats into new column

Atom

Hi experts,

New to Alteryx and have been tasked with cleaning up some data that is more complicated than I expected.

We have Type recorded in a string of user details and I need to pull out the type value (4 or 5 digits) into a new column. The problem is that Type has been entered with absolutely no consistency in the field.

 

data.jpg

 

As you can see, sometimes it is Type with no space, Type:, Type# or even my favorite, Type #= .

I've been trying to go about it with RegEx but I'm not getting anywhere :)

 

thanks in advance!

Highlighted
Magnetar
Magnetar

Hey @CraigF 

 

Use a normal Formula tool and create a new Int field with this expression:

TONUMBER(iif(regex_match([Field1], ".*?TYPE.*(\d\d\d\d\d).*"),REGEX_Replace([Field1], ".*?TYPE.*?(\d\d\d\d\d).*", "$1"),REGEX_Replace([Field1], ".*?TYPE.*?(\d\d\d\d).*", "$1")))

 

Basically, it is saying if there are 5 digits in a row after the word TYPE, use those 5 digits, otherwise use the 4 digits after the word TYPE.

 

Hope this helps! 

Highlighted
Castor
Castor

hi @CraigF 

 

Great solution from @Kenda.

 

Mine's a little different in that it looks for "Type" followed non-digits followed by digits

(Type[^\d]+[\d]+)

I then use another Regex to pull the digits from this.  I know that there is a way to search within the previous results, and combine the 2 regex tools into one, but my regex-FU is not strong enough.  Maybe @Thableaus can help here.

 

WF.png

 

Result.png

 

Dan

Highlighted
Meteor

Hi,

 

This solution may help you fetch the type from the unstructured text you posted.

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

Hi @danilang - my help here:

 

I think REGEX tool with Parse Method would work with this single expression:

 

.*?(?:\btype\W*)(\d+).*

 

This would bring any sequence of digits after the word type (uppercased or not since we are considering case insensitivity here) and a non-word character 0 or more times (that would be all of the symbols or spaces that the author of the topic mentioned).

 

Cheers,

Highlighted
Atom

quick and dirty, I like it, thank you).

Going to play with everyone elses solution later when I have more time as well to continue my learning!

 

Labels