Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Extracting numbers from a string

ZoeM
8 - Asteroid

Hello Alteryx Experts!

I know you will be able to help me find a solution to my issue. 

I have very dirty data, very dirty data! But there is hope at the end of the tunnel.

I have a text field where I need to extract 3 numbers which occur together. Th first number will always be preceded with an "S", the next number by "O" and the last by "R".

The typical different formats will be:

There is text before the needed fields S:2 O:2 O:3 and there is text after the needed fields

There is text before the needed fields S: 2 O: 2 O: 3 and there is text after the needed fields

There is text before the needed fields S:2, O:2, O:3 and there is text after the needed fields

There is text before the needed fields S: 2, O: 2, O: 3, and there is text after the needed fields

There is text before the needed fields (S:2 O:2 O:3) and there is text after the needed fields

There is text before the needed fields (S: 2 O: 2 O: 3) and there is text after the needed fields

 

But there will always be a letter than a number. 

 

Somebody please help me.

 

Thanks in advance :)

 

11 REPLIES 11
CharlieS
17 - Castor
17 - Castor

I suggest using the power of RegEx in the Formula tool to eliminate everything that isn't a numeric character. Try this formula:

 

REGEX_Replace([Field],"[^\d]","")

 

This expression will replace everything that isn't a numeric [0-9] character with nothing ("")

afv2688
16 - Nebula
16 - Nebula

Hello @ZoeM ,

 

If this are the only numbers in the filed you can use the regex tool with this expression:

 

(.*)(\d)(.*)(\d)(.*)(\d)(.*)

 

and for the replace the following

 

$2$4$6

 

This will give you the 3 numbers toghether in a field

 

Cheers

Thableaus
17 - Castor
17 - Castor

Hi @ZoeM 

@CharlieS  nailed it. This works like a charm.

This would also work: REGEX_Replace([Field],"[^0-9]","")

 

Cheers,

ZoeM
8 - Asteroid

Excellent recommendation.

That was my first stab and my formula was:

 

REGEX_Replace([Summary], "[\D]", "")

 

The problem is that there are also instances like:

 

There is text 123 before the necessary fields S: 2 S:2 O:3 and the rest 345 is also there...

 

So there would be numbers too in the text but only want the ones focused on S, O, and R

 

afv2688
16 - Nebula
16 - Nebula

Maybe this is a little too much, but it works on the regex tool:

 

Reg expression:

(.*\:)(\s)?(\d)(.*\:)(\s)?(\d)(.*\:)(\s)?(\d)(.*)

Replacement text

$3$6$9

 

PD:

If there are more letters with the same parameters you can also use:

(.*S\:)(\s)?(\d)(.*O\:)(\s)?(\d)(.*R\:)(\s)?(\d)(.*)

 

Cheers

ZoeM
8 - Asteroid

Tried that in the RegEx Tool and dint get the desired result.

Is there a way to specify the below:

 

Search for "S:" and get subsequent numerical value, "O:" and numerical value and "R:" and subsequent numerical value?

danilang
19 - Altair
19 - Altair

Hi @ZoeM 

 

I think this gives what you're looking for

 

WF.png

 

The regex is set to parse with this expression

 

.*(S(:|: )(\d+)).*(O(:|: )(\d+)).*(R(:|: )(\d+)).*

 

Giving each result in it's own field

 

Results.png

 

Dan

CharlieS
17 - Castor
17 - Castor

I added one step to @danilang's solution: replacing any occurrence of "S:", "O:", or "R:" with a common "#:" so the sequence of "S","O", or "R" doesn't matter. This appears to work in all scenarios.

 

20190517-RegExParse.png

ZoeM
8 - Asteroid

You must be from Louisiana because this is straight up magic!!!!!

i so so appreciate all of everyone's input!

 

The community succeeds again...

Labels