Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Clean up the phone number

msantoso
8 - Asteroid

hi everyone, 

 

I need to clean up a phone number column which contains phone numbers but not only! :)

 

for the following cases, I need some help

 

- the name of the contact is preceeding the phone number. The length of the name is always different. the only similar case is that the phone number is preceeded with a blank space. 

          ex: Ms Marple 0176/123456789

 

- two phone numbers in the same cell separated with a dash or a single letter 

 

thank you very much for your help

Myriam

 

11 REPLIES 11
DavidP
17 - Castor
17 - Castor

Hi Myriam,

 

A regular expression (Regex), like this should do the trick

 

phone number parse.png

DavidP
17 - Castor
17 - Castor

Sorry, typo in the regex. It should look like this: 

 

(.+\s)(\d+)[/\s](\d+)

DavidP
17 - Castor
17 - Castor

One more correction - I misread the original post, if the numbers are separated by a '/' or a letter (not a space), the regex should look like this:

 

(.+\s)(\d+)[/[:alpha:]](\d+)

msantoso
8 - Asteroid

Hi David, 

 

 

there is a little mistake in your regex: a ( or a ) is missing but I cannot see where :(

 

I wished I could write regex!!

If you have any tutorial to advise, please do so! :)

 

thank you

Myriam

DavidP
17 - Castor
17 - Castor

Hi Myriam,

 

Attached is the Alteryx workflow. With the Regex parse tool, using the parse Output method I created 3 columns (each set of brackets represent a column). The logic works like this:

 

(.+\s)  .+ means one or more of any character and \s is a space, so if you put them together it's a string of characters that ends in a space.

 

(\d+) means a sequence of one or more numbers

 

From your description, we want to exclude the separator character between the 2 numbers which can be a / or a letter, so you write that like this:

 

[/[:alpha:]]  - the outer [] represents a set which is either a / or a letter which in this case is represented by [:alpha:]. Since there are no (), we're telling the parse tool to find this but not to parse it to a column.

 

Finally, we want the 2nd number parsed, which is done by (\d+)

 

There are some great tutorials and training material on the Alteryx community regarding regex (just type in regular expressions in the community search bar), and the down arrow on the right hand side of the regular expression input window expands to a list of regex commands.

 

The best thing to do is to play around with this yourself and you'll figure it out in no time.

msantoso
8 - Asteroid

Hi David,

 

thanks a lot for this regex. 

 

This regex is splitting the filed into 3 parts 

(i) it splits the name of the contact

(ii) then it splits the digits before the /

(iii) and copies the ending digits into the last field. 

 

So that is fine when the content is "Ms Marple 0176/123456789"

but when the cell contains 2 phone numbers separated with a letter, then the regex is putting the first phone number + the letter into RegexOut1.

 

How can I manage to have the regex working only when the cell contains a name? 

thanks a lot

myriam

 

DavidP
17 - Castor
17 - Castor

Hi Myriam,

 

I don't get the same problem where the letter is part of RegExOut1 - please see below. Could your data contain something else in between the 2 numbers? In the formula tool you can use Regex_Match to test if the string starts with a name and based on the result, go one of 2 ways.

 

Please find workflow attached.

 

phone number parse.png

msantoso
8 - Asteroid

Hi David,

 

thank you so much for this recipe. It works for almost all the use cases I have in my file. 

 

the strings that are still annoying me are:

 

0162/2870461 o. 0173/6456482

09071-54111 oder 0172/2470400

0151/12139881 - 0170/2257314

 

in these 3 cases I only retreive the second phone number, since the first one is considered as a name. 

thank you 

myriam

DavidP
17 - Castor
17 - Castor

For those 3 strings I would use a regex like this:  (\d+)[/-](\d+)\s.+\s(\d+)[/[:alpha:]](\d+)

 

The new workflow then looks like this:

 

phone number parse.png

Labels
Top Solution Authors