community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Upgrade Alteryx Designer in 10 Steps

Debating whether or not to upgrade to the latest version of Alteryx Designer?

LEARN MORE

Parsing phone numbers and emails from dirty data!

Alteryx Certified Partner

Hi All,

 

So I've had an interesting few days parsing data out from customer spreadsheets and sending it to other formats. Living la vida loca.

 

I thought I would share the product of these days with the community and see if they can help me improve what I currently have/save themselves some time in the future if they have a similar job

 

Email_Parsing

https://gallery.alteryx.com/#!app/email_parse/56b1a4b3a248970f909ad76e

Quite a straight forward macro, basically grab your favourite email regex and stick it in a short workflow. I'm happy with how mine performs currently, I think I stole it from the http://www.regular-expressions.info/ website a while ago and it has been my go to ever since.

 

I currently have the macro outputting all the emails it finds in one row, concatenated by whatever you specify as the deliminator. I have also included options for the original input and residules to help with debugging.

 

Phone Number Parsing

https://gallery.alteryx.com/#!app/phone_number_parse/56b1a5173df7da0740bc597c

You may be able to tell by looking into this macro that it was built to parse phone numbers from very (VERY) dirty data. Unfortunately you cannot just go and grab a regex for this job as people like to format their phone numbers in an insanely varied number of ways(+1.800.1332444 or 1 800 133 2444 or +(1) 800 13 32 44 44 and on and on).

 

My solution to this is to do some cleansing. In my case removing availability that was seen to be written with phone numbers, i.e. 07xxxxxxxxx(24/7) or 07xxxxxxxxx(10:00am-11PM). I also try and parse out any extensions that might have been listed, so if you're contacting a customer though a direct phone you don't obliterate that information. When the phone numbers are reconstructed I do so with a format of +000000000000x 000 where x 000 is the extension if it exists.

 

Your milage may vary with this macro but for the dataset I had it seems to work consistently. I would be happy to hear any improvements though. Particularly if someone has a set of rules for interational numbers that they would be like to share.

 

 

Hope these can be of use to someone!

 

Ben - BIPB

 

 

Sr. Community Content Manager
Sr. Community Content Manager

Awesome macros - and thanks for sharing for everyone to use! Nice companions to the cleanse macro.

Asteroid

Hi Ben,

Thank you for posting this macro- I believe I will find it very useful for a few projects that I have. Thanks again.


@BCarley wrote:

Hi All,

 

So I've had an interesting few days parsing data out from customer spreadsheets and sending it to other formats. Living la vida loca.

 

I thought I would share the product of these days with the community and see if they can help me improve what I currently have/save themselves some time in the future if they have a similar job

 

Email_Parsing

https://gallery.alteryx.com/#!app/email_parse/56b1a4b3a248970f909ad76e

Quite a straight forward macro, basically grab your favourite email regex and stick it in a short workflow. I'm happy with how mine performs currently, I think I stole it from the http://www.regular-expressions.info/ website a while ago and it has been my go to ever since.

 

I currently have the macro outputting all the emails it finds in one row, concatenated by whatever you specify as the deliminator. I have also included options for the original input and residules to help with debugging.

 

Phone Number Parsing

https://gallery.alteryx.com/#!app/phone_number_parse/56b1a5173df7da0740bc597c

You may be able to tell by looking into this macro that it was built to parse phone numbers from very (VERY) dirty data. Unfortunately you cannot just go and grab a regex for this job as people like to format their phone numbers in an insanely varied number of ways(+1.800.1332444 or 1 800 133 2444 or +(1) 800 13 32 44 44 and on and on).

 

My solution to this is to do some cleansing. In my case removing availability that was seen to be written with phone numbers, i.e. 07xxxxxxxxx(24/7) or 07xxxxxxxxx(10:00am-11PM). I also try and parse out any extensions that might have been listed, so if you're contacting a customer though a direct phone you don't obliterate that information. When the phone numbers are reconstructed I do so with a format of +000000000000x 000 where x 000 is the extension if it exists.

 

Your milage may vary with this macro but for the dataset I had it seems to work consistently. I would be happy to hear any improvements though. Particularly if someone has a set of rules for interational numbers that they would be like to share.

 

 

Hope these can be of use to someone!

 

Ben - BIPB

 

 


 

Labels