I have a mix of relevant and irrelevant values in my field. I want to identify numbers, street addresses, date and store names and add a corresponding value in the new field "Clean store".
My actual data:
Raw Store |
283 FALLS CHURCH V.A 22041 |
4820 DUBLIN BLVD, DUBLIN, CA 94568 |
ABT GLENVIEW IL 60025 |
AMAZN |
AMERICAN APPLIANCE |
BED BAT AND BEYOND |
EST BUY |
7/2/219 |
6/7/298 |
45645456365 |
90908 |
Expected output:
Raw Store | Clean store |
283 FALLS CHURCH V.A 22041 | UNKNOWN |
4820 DUBLIN BLVD, DUBLIN, CA 94568 | UNKNOWN |
ABT GLENVIEW IL 60025 | UNKNOWN |
AMAZN | AMAZON |
AMERICAN APPLIANCE | AMERICAN APPLIANCE |
BED BAT AND BEYOND | BED BATH & BEYOND |
EST BUY | BEST BUY |
7/2/219 | UNKNOWN |
6/7/298 | UNKNOWN |
45645456365 | UNKNOWN |
90908 | UNKNOWN |
How can I identify the irrelevant store values? Once that is done, I am pretty sure about how I can add a new field and mark them as "unknown".
Solved! Go to Solution.
I assume that you have a separate process for helping determine what the true store name should be... but RegEx is going to be your friend, I think, for helping identify which values you don't want. Try the following expressions to determine if the value matches (could use a formula that checks for each of these conditions and assigns a status of "UNKNOWN" if the expression = -1):
To find records that appear to be addresses:
REGEX_Match([Raw Store],".*\s\d{5}$") <-- This looks for fields that end with a space followed by 5 digits... you'd want to modify this if you were going to potentially have addresses that had 9-digit zip codes or zip codes that contain letters such as those in Canada & abroad.
To find records that appear to be dates:
REGEX_Match([Raw Store],"\d+\/\d+\/\d+") <-- Looks for a pattern of numbers separated by / to determine if it's a date.
To find records that appear to be numbers:
REGEX_Match([Raw Store],"^\d+$") <-- Looks for fields that only contain numbers.
RegEx can be really powerful for finding patterns like these in your data - check out Regex101.com to learn more and play around with different expressions. Hope this helps!
Cheers,
NJ
Thanks. This works.