Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to Fix a Column of Data

ScottC1971
7 - Meteor

Afternoon,

 

I am still fairly new to Alteryx but need some help as I have two issues that are kind of the same - I think.

 

My data set is my company's wireless data which means there is a phone number, device ID, user name, device type, device model, etc.  The usual data points one would expect when working with wireless data.

 

Issue #1

 

My issue is that I need to correct the phone numbers that are showing up as they are showing up as a number such as:

 

Raw DataCorrected Data
2,065,551,212206-555-1212
8,175,551,212817-555-1212
9,405,551,212940-555-1212

 

Issue #2

 

Some device types are encoded differently and I need to normalize the column of data.

 

Raw DataCorrected Data
iPhone xRiPhone XR
iPhone X RiPhone XR

iPhone XR

iPhone XR
iPhone RXiPhone XR

 

Can you please give me some advice on how I might be able to do this?

 

Thanks in advance!

 

Scott

2 REPLIES 2
apathetichell
19 - Altair

For item 1 - assuming your raw data is a string (mine is) - use regex_replace(replace([Raw Data],",",""),"(\d{3})(\d{3})(\d{3,})","$1-$2-$3") - if not sub the replace command input with a tostring([rawdata])

 

I think you'll need to provide more examples for number 2 and probably have to use some fuzzy match strategy - you can hard code a structure for just the XR but it won't work... Ideally you'd use something that extracts the characters following the first space, but how your organize those letters doesn't really follow a pattern I believe in phone types.

 

 

jrgo
14 - Magnetar

Hi @ScottC1971 

 

I can help on the first item. The formula expression below does correct the raw data you listed in your sample

REGEX_REPLACE(
  REGEX_REPLACE([Raw Data], '\D', '')
  , '^(\d{3})(\d{3})(\d{4})$'
  , '$1-$2-$3'
)

The inner REGEX removes all characters that aren't numeric and the outer REGEX looks for and create 3 groups which are then included in the replacement value adding the dashes between each group.

jrgo_0-1628719075317.png

This will NOT work if the raw data contains more or less than 10 numbers.

 

For the second issue is not as easy of a solution since I assume there's likely a wide range of strings that may have slight variations. You can create an initial correction table with common errors and then use a Find/Replace tool to update those. Find/Replace can be set to be case insensitive and/or be set to match all or part of the field.

 

For the remaining values that weren't corrected, Fuzzy match is probably going to be your best bet.

 

Hope this helps!

Jimmy

Labels