Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Extracting numbers from a string

ZoeM
8 - Asteroid

Hello Alteryx Experts!

I know you will be able to help me find a solution to my issue. 

I have very dirty data, very dirty data! But there is hope at the end of the tunnel.

I have a text field where I need to extract 3 numbers which occur together. Th first number will always be preceded with an "S", the next number by "O" and the last by "R".

The typical different formats will be:

There is text before the needed fields S:2 O:2 O:3 and there is text after the needed fields

There is text before the needed fields S: 2 O: 2 O: 3 and there is text after the needed fields

There is text before the needed fields S:2, O:2, O:3 and there is text after the needed fields

There is text before the needed fields S: 2, O: 2, O: 3, and there is text after the needed fields

There is text before the needed fields (S:2 O:2 O:3) and there is text after the needed fields

There is text before the needed fields (S: 2 O: 2 O: 3) and there is text after the needed fields

 

But there will always be a letter than a number. 

 

Somebody please help me.

 

Thanks in advance :)

 

11 REPLIES 11
CharlieS
17 - Castor
17 - Castor

I suggest using the power of RegEx in the Formula tool to eliminate everything that isn't a numeric character. Try this formula:

 

REGEX_Replace([Field],"[^\d]","")

 

This expression will replace everything that isn't a numeric [0-9] character with nothing ("")

afv2688
16 - Nebula
16 - Nebula

Hello @ZoeM ,

 

If this are the only numbers in the filed you can use the regex tool with this expression:

 

(.*)(\d)(.*)(\d)(.*)(\d)(.*)

 

and for the replace the following

 

$2$4$6

 

This will give you the 3 numbers toghether in a field

 

Cheers

Thableaus
17 - Castor
17 - Castor

Hi @ZoeM 

@CharlieS  nailed it. This works like a charm.

This would also work: REGEX_Replace([Field],"[^0-9]","")

 

Cheers,

ZoeM
8 - Asteroid

Excellent recommendation.

That was my first stab and my formula was:

 

REGEX_Replace([Summary], "[\D]", "")

 

The problem is that there are also instances like:

 

There is text 123 before the necessary fields S: 2 S:2 O:3 and the rest 345 is also there...

 

So there would be numbers too in the text but only want the ones focused on S, O, and R

 

afv2688
16 - Nebula
16 - Nebula

Maybe this is a little too much, but it works on the regex tool:

 

Reg expression:

(.*\:)(\s)?(\d)(.*\:)(\s)?(\d)(.*\:)(\s)?(\d)(.*)

Replacement text

$3$6$9

 

PD:

If there are more letters with the same parameters you can also use:

(.*S\:)(\s)?(\d)(.*O\:)(\s)?(\d)(.*R\:)(\s)?(\d)(.*)

 

Cheers

ZoeM
8 - Asteroid

Tried that in the RegEx Tool and dint get the desired result.

Is there a way to specify the below:

 

Search for "S:" and get subsequent numerical value, "O:" and numerical value and "R:" and subsequent numerical value?

danilang
19 - Altair
19 - Altair

Hi @ZoeM 

 

I think this gives what you're looking for

 

WF.png

 

The regex is set to parse with this expression

 

.*(S(:|: )(\d+)).*(O(:|: )(\d+)).*(R(:|: )(\d+)).*

 

Giving each result in it's own field

 

Results.png

 

Dan

CharlieS
17 - Castor
17 - Castor

I added one step to @danilang's solution: replacing any occurrence of "S:", "O:", or "R:" with a common "#:" so the sequence of "S","O", or "R" doesn't matter. This appears to work in all scenarios.

 

20190517-RegExParse.png

ZoeM
8 - Asteroid

You must be from Louisiana because this is straight up magic!!!!!

i so so appreciate all of everyone's input!

 

The community succeeds again...

Labels