Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to remove generic phone numbers

Xiahyl1
6 - Meteoroid

Hi everyone,

 

I need to clean up a phone number column, below is an example of the data. I need to remove spaces, dashes, leading and trailing spaces, letters, etc, which I did with the data cleansing tool just fine, but I also need to remove generic phone numbers such as 666666666 or 000000000, but not sure how to do that I tried to use regex but I am a newbie and could not figure out how to do it. Any Idea? 

Thanks.

 

Data example:

 

Client IdMobile Phone
1646-285-6758
2666666666
3508-787-1000
4212-207-2175
59-011-972-545-
6914-844-6711, 
7505- 692- 9967
8000000000
9917000000
10860  248  1457
9 REPLIES 9
AndrewS
11 - Bolide

hi @Xiahyl1 

 

I've attached a workflow which should do what you are looking for. I googled the regex and modified it.

 

 

Xiahyl1
6 - Meteoroid

Thank you AndrewS, that helped me, it removed the generic phone number formatting, but it doesn't remove for example 947000000 which is not a valid phone number, it removed the zeros but not the 947. It might sound a dumb question, but I am new to using regex and cannot figure it out.

 

Thanks again for your help.

neilgallen
12 - Quasar

This boils down to how to determine what a 'valid' phone number is.

 

In the situation above, you could parse out the area code from the remainder of the phone number and put the remaining 7 digit number through the same check as before, looking for a single digit repeatedly.

 

Keep in mind that this solution would only remove numbers that are "11111" "22222222" etc. Were someone to give you a pattern such as "123456789" it would pass as valid since the numbers don't repeat. You'd have to create a new scenario for that one.

AndrewS
11 - Bolide

Hi @Xiahyl1 , I agree with @neilgallen , you need to define a rule around what is considered a valid number.

 

I've amended the workflow with two additional options.

 

The first will tokenize the data and then provide a max count on the number of consecutive characters. You can then set your tolerance with a filter.

 

The second will strip the first characters out as mentioned by @neilgallen and then run the same regex against the substring.

 

Hope this helps.

Xiahyl1
6 - Meteoroid

Thanks very much for your ideas; it helped me figure out what I needed.

Using the example below, I was able to create 3 regexes that do what I need.

I would like to know if there's a way to combine the 3 to have just one; I tried but it does not work.

 

1st regex: \d{10,} (remove numbers with more than 9 digits)

2nd regex: ^(\d){0,8}$ (remove numbers with less than 9 digits)

3rd regex: (\d)\1{8,} (remove generic numbers like 666666666 for example)

 

Data example:

Client IdMobile Phone
1646-285-6758
2666666666
3508-787-1000
4212-207-2175
59-011-972-545-
6914-844-6711, 
7505- 692- 9967
8000000000
9917000000
10860  248  1457
Jonathan-Sherman
15 - Aurora
15 - Aurora

Hi @Xiahyl1,

 

You can put you're regex expressions inside a Regex_Match function in a filter tool to remove rows that meet each expression. All expressions could be pushed into one filter tool, however i would separate them out to try and reduce the load. If your first filter removes 100 rows this means your next filter has less work to do and so on. Therefore is more efficient in the long run, especially as string testing / manipulation can be very memory intensive. The most effective (removing the most records) should be the first filter, leaving others less and less to do so you may want to change the order of the filter tools.

 

image.png

 

If this solves your issue please mark the answer as correct, if not let me know! I've attached my workflow for you to download if needed.

 

Regards,

Jonathan

Xiahyl1
6 - Meteoroid

Hi Jonathan,

If I understand correctly, your workflow will remove the records, but I don't need to remove the record, I need to keep the record with the phone number space blanc.

For example, if I have: 

Client IdMobile Phone  Name
1646-285-758XYZ
2666666666ABC

 

It should return:

Client IdMobile Phone Name
1646-285-758XYZ
2 ABC

 

I need to keep the record.

 

Thanks a lot 

Jonathan-Sherman
15 - Aurora
15 - Aurora

Hi @Xiahyl1,

 

Sorry i misunderstood, i've attached an amended workflow to use the REGEX_Replace function. Which replaces any matched functions with an empty cell.

 

image.png

If this solves your issue please mark the answer as correct, if not let me know! I've attached my workflow for you to download if needed.

 

Regards,

Jonathan

Xiahyl1
6 - Meteoroid

Thank you very much Jonathan, it worked perfectly. 

Labels