Alteryx Designer Desktop Discussions

ScottC1971 · ‎08-11-2021

Afternoon,

I am still fairly new to Alteryx but need some help as I have two issues that are kind of the same - I think.

My data set is my company's wireless data which means there is a phone number, device ID, user name, device type, device model, etc. The usual data points one would expect when working with wireless data.

Issue #1

My issue is that I need to correct the phone numbers that are showing up as they are showing up as a number such as:

Raw Data	Corrected Data
2,065,551,212	206-555-1212
8,175,551,212	817-555-1212
9,405,551,212	940-555-1212

Issue #2

Some device types are encoded differently and I need to normalize the column of data.

Raw Data	Corrected Data
iPhone xR	iPhone XR
iPhone X R	iPhone XR
iPhone XR	iPhone XR
iPhone RX	iPhone XR

Can you please give me some advice on how I might be able to do this?

Thanks in advance!

Scott

apathetichell · ‎08-11-2021

For item 1 - assuming your raw data is a string (mine is) - use regex_replace(replace([Raw Data],",",""),"(\d{3})(\d{3})(\d{3,})","$1-$2-$3") - if not sub the replace command input with a tostring([rawdata])

I think you'll need to provide more examples for number 2 and probably have to use some fuzzy match strategy - you can hard code a structure for just the XR but it won't work... Ideally you'd use something that extracts the characters following the first space, but how your organize those letters doesn't really follow a pattern I believe in phone types.

jrgo · ‎08-11-2021

Hi @ScottC1971

I can help on the first item. The formula expression below does correct the raw data you listed in your sample

REGEX_REPLACE(
  REGEX_REPLACE([Raw Data], '\D', '')
  , '^(\d{3})(\d{3})(\d{4})$'
  , '$1-$2-$3'
)

The inner REGEX removes all characters that aren't numeric and the outer REGEX looks for and create 3 groups which are then included in the replacement value adding the dashes between each group.

This will NOT work if the raw data contains more or less than 10 numbers.

For the second issue is not as easy of a solution since I assume there's likely a wide range of strings that may have slight variations. You can create an initial correction table with common errors and then use a Find/Replace tool to update those. Find/Replace can be set to be case insensitive and/or be set to match all or part of the field.

For the remaining values that weren't corrected, Fuzzy match is probably going to be your best bet.

Hope this helps!

Jimmy

Alteryx Designer Desktop Discussions

How to Fix a Column of Data

Re: How to select columns dynamically using number...

Re: Issue when using Block Until Done and Power BI...

Example workflow for setting up a custom list to u...

Re: Firm names parse

Re: Help with Multi-Row formula