Afternoon,
I am still fairly new to Alteryx but need some help as I have two issues that are kind of the same - I think.
My data set is my company's wireless data which means there is a phone number, device ID, user name, device type, device model, etc. The usual data points one would expect when working with wireless data.
Issue #1
My issue is that I need to correct the phone numbers that are showing up as they are showing up as a number such as:
Raw Data | Corrected Data |
2,065,551,212 | 206-555-1212 |
8,175,551,212 | 817-555-1212 |
9,405,551,212 | 940-555-1212 |
Issue #2
Some device types are encoded differently and I need to normalize the column of data.
Raw Data | Corrected Data |
iPhone xR | iPhone XR |
iPhone X R | iPhone XR |
iPhone XR | iPhone XR |
iPhone RX | iPhone XR |
Can you please give me some advice on how I might be able to do this?
Thanks in advance!
Scott
Solved! Go to Solution.
For item 1 - assuming your raw data is a string (mine is) - use regex_replace(replace([Raw Data],",",""),"(\d{3})(\d{3})(\d{3,})","$1-$2-$3") - if not sub the replace command input with a tostring([rawdata])
I think you'll need to provide more examples for number 2 and probably have to use some fuzzy match strategy - you can hard code a structure for just the XR but it won't work... Ideally you'd use something that extracts the characters following the first space, but how your organize those letters doesn't really follow a pattern I believe in phone types.
Hi @ScottC1971
I can help on the first item. The formula expression below does correct the raw data you listed in your sample
REGEX_REPLACE(
REGEX_REPLACE([Raw Data], '\D', '')
, '^(\d{3})(\d{3})(\d{4})$'
, '$1-$2-$3'
)
The inner REGEX removes all characters that aren't numeric and the outer REGEX looks for and create 3 groups which are then included in the replacement value adding the dashes between each group.
This will NOT work if the raw data contains more or less than 10 numbers.
For the second issue is not as easy of a solution since I assume there's likely a wide range of strings that may have slight variations. You can create an initial correction table with common errors and then use a Find/Replace tool to update those. Find/Replace can be set to be case insensitive and/or be set to match all or part of the field.
For the remaining values that weren't corrected, Fuzzy match is probably going to be your best bet.
Hope this helps!
Jimmy