community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Extracting numbers from a string

Asteroid

Hello Alteryx Experts!

I know you will be able to help me find a solution to my issue. 

I have very dirty data, very dirty data! But there is hope at the end of the tunnel.

I have a text field where I need to extract 3 numbers which occur together. Th first number will always be preceded with an "S", the next number by "O" and the last by "R".

The typical different formats will be:

There is text before the needed fields S:2 O:2 O:3 and there is text after the needed fields

There is text before the needed fields S: 2 O: 2 O: 3 and there is text after the needed fields

There is text before the needed fields S:2, O:2, O:3 and there is text after the needed fields

There is text before the needed fields S: 2, O: 2, O: 3, and there is text after the needed fields

There is text before the needed fields (S:2 O:2 O:3) and there is text after the needed fields

There is text before the needed fields (S: 2 O: 2 O: 3) and there is text after the needed fields

 

But there will always be a letter than a number. 

 

Somebody please help me.

 

Thanks in advance :)

 

Alteryx Certified Partner

I suggest using the power of RegEx in the Formula tool to eliminate everything that isn't a numeric character. Try this formula:

 

REGEX_Replace([Field],"[^\d]","")

 

This expression will replace everything that isn't a numeric [0-9] character with nothing ("")

Alteryx Partner

Hello @ZoeM ,

 

If this are the only numbers in the filed you can use the regex tool with this expression:

 

(.*)(\d)(.*)(\d)(.*)(\d)(.*)

 

and for the replace the following

 

$2$4$6

 

This will give you the 3 numbers toghether in a field

 

Cheers

Alteryx Certified Partner
Alteryx Certified Partner

Hi @ZoeM 

@CharlieS  nailed it. This works like a charm.

This would also work: REGEX_Replace([Field],"[^0-9]","")

 

Cheers,

Asteroid

Excellent recommendation.

That was my first stab and my formula was:

 

REGEX_Replace([Summary], "[\D]", "")

 

The problem is that there are also instances like:

 

There is text 123 before the necessary fields S: 2 S:2 O:3 and the rest 345 is also there...

 

So there would be numbers too in the text but only want the ones focused on S, O, and R

 

Alteryx Partner

Maybe this is a little too much, but it works on the regex tool:

 

Reg expression:

(.*\:)(\s)?(\d)(.*\:)(\s)?(\d)(.*\:)(\s)?(\d)(.*)

Replacement text

$3$6$9

 

PD:

If there are more letters with the same parameters you can also use:

(.*S\:)(\s)?(\d)(.*O\:)(\s)?(\d)(.*R\:)(\s)?(\d)(.*)

 

Cheers

Asteroid

Tried that in the RegEx Tool and dint get the desired result.

Is there a way to specify the below:

 

Search for "S:" and get subsequent numerical value, "O:" and numerical value and "R:" and subsequent numerical value?

Nebula

Hi @ZoeM 

 

I think this gives what you're looking for

 

WF.png

 

The regex is set to parse with this expression

 

.*(S(:|: )(\d+)).*(O(:|: )(\d+)).*(R(:|: )(\d+)).*

 

Giving each result in it's own field

 

Results.png

 

Dan

Alteryx Certified Partner

I added one step to @danilang's solution: replacing any occurrence of "S:", "O:", or "R:" with a common "#:" so the sequence of "S","O", or "R" doesn't matter. This appears to work in all scenarios.

 

20190517-RegExParse.png

Highlighted
Asteroid

You must be from Louisiana because this is straight up magic!!!!!

i so so appreciate all of everyone's input!

 

The community succeeds again...

Labels