community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Upgrade Alteryx Designer in 10 Steps

Debating whether or not to upgrade to the latest version of Alteryx Designer?

LEARN MORE
SOLVED

Data Cleansing - Identify numbers, date, street address from the data field.

Meteoroid

I have a mix of relevant and irrelevant values in my field. I want to identify numbers, street addresses, date and store names and add a corresponding value in the new field "Clean store".

 

My actual data:

Raw Store
   283 FALLS CHURCH V.A 22041
   4820 DUBLIN BLVD, DUBLIN, CA 94568
   ABT GLENVIEW IL 60025
   AMAZN
   AMERICAN APPLIANCE
   BED BAT AND BEYOND
   EST BUY
7/2/219
6/7/298
45645456365
90908

 

Expected output:

Raw StoreClean store
   283 FALLS CHURCH V.A 22041UNKNOWN
   4820 DUBLIN BLVD, DUBLIN, CA 94568UNKNOWN
   ABT GLENVIEW IL 60025UNKNOWN
   AMAZN   AMAZON
   AMERICAN APPLIANCE   AMERICAN APPLIANCE
   BED BAT AND BEYOND   BED BATH & BEYOND
   EST BUY   BEST BUY
7/2/219UNKNOWN
6/7/298UNKNOWN
45645456365UNKNOWN
90908UNKNOWN

 

How can I identify the irrelevant store values? Once that is done, I am pretty sure about how I can add a new field and mark them as "unknown".

I assume that you have a separate process for helping determine what the true store name should be... but RegEx is going to be your friend, I think, for helping identify which values you don't want. Try the following expressions to determine if the value matches (could use a formula that checks for each of these conditions and assigns a status of "UNKNOWN" if the expression = -1):

 

To find records that appear to be addresses:

REGEX_Match([Raw Store],".*\s\d{5}$") <-- This looks for fields that end with a space followed by 5 digits... you'd want to modify this if you were going to potentially have addresses that had 9-digit zip codes or zip codes that contain letters such as those in Canada & abroad.

 

To find records that appear to be dates:

REGEX_Match([Raw Store],"\d+\/\d+\/\d+") <-- Looks for a pattern of numbers separated by / to determine if it's a date.

 

To find records that appear to be numbers:

REGEX_Match([Raw Store],"^\d+$") <-- Looks for fields that only contain numbers.

 

RegEx can be really powerful for finding patterns like these in your data - check out Regex101.com to learn more and play around with different expressions. Hope this helps!

 

Cheers,

NJ

Meteoroid

Thanks. This works.

Labels