community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Data Cleansing - Identify numbers, date, street address from the data field.

Meteoroid

I have a mix of relevant and irrelevant values in my field. I want to identify numbers, street addresses, date and store names and add a corresponding value in the new field "Clean store".

 

My actual data:

Raw Store
   283 FALLS CHURCH V.A 22041
   4820 DUBLIN BLVD, DUBLIN, CA 94568
   ABT GLENVIEW IL 60025
   AMAZN
   AMERICAN APPLIANCE
   BED BAT AND BEYOND
   EST BUY
7/2/219
6/7/298
45645456365
90908

 

Expected output:

Raw StoreClean store
   283 FALLS CHURCH V.A 22041UNKNOWN
   4820 DUBLIN BLVD, DUBLIN, CA 94568UNKNOWN
   ABT GLENVIEW IL 60025UNKNOWN
   AMAZN   AMAZON
   AMERICAN APPLIANCE   AMERICAN APPLIANCE
   BED BAT AND BEYOND   BED BATH & BEYOND
   EST BUY   BEST BUY
7/2/219UNKNOWN
6/7/298UNKNOWN
45645456365UNKNOWN
90908UNKNOWN

 

How can I identify the irrelevant store values? Once that is done, I am pretty sure about how I can add a new field and mark them as "unknown".

Magnetar
Magnetar

I assume that you have a separate process for helping determine what the true store name should be... but RegEx is going to be your friend, I think, for helping identify which values you don't want. Try the following expressions to determine if the value matches (could use a formula that checks for each of these conditions and assigns a status of "UNKNOWN" if the expression = -1):

 

To find records that appear to be addresses:

REGEX_Match([Raw Store],".*\s\d{5}$") <-- This looks for fields that end with a space followed by 5 digits... you'd want to modify this if you were going to potentially have addresses that had 9-digit zip codes or zip codes that contain letters such as those in Canada & abroad.

 

To find records that appear to be dates:

REGEX_Match([Raw Store],"\d+\/\d+\/\d+") <-- Looks for a pattern of numbers separated by / to determine if it's a date.

 

To find records that appear to be numbers:

REGEX_Match([Raw Store],"^\d+$") <-- Looks for fields that only contain numbers.

 

RegEx can be really powerful for finding patterns like these in your data - check out Regex101.com to learn more and play around with different expressions. Hope this helps!

 

Cheers,

NJ

Highlighted
Meteoroid

Thanks. This works.

Labels