Good morning folks,
I'm using address data in a project and I need to extract only the numbers from the given address. In other software packages I was able to use the INCLUDE or EXCLUDE function to allow me to arrive at a field with 'just numbers' or 'just letters'. I cannot seem to replicate this action in Alteryx and I have very little experience using REGEX. Any help is appreciated. The table below shows my address field and Address Key is my desired result.
Thanks,
George
Address Field | AddressKey |
830 New Jersey Avenue, Brooklyn NY 11207 | 83011207 |
2344 Lorillard Place, Apt 3, Bronx NY 10458 | 2344310458 |
3333 101st Street, Corona NY 11368 | 333310111368 |
1415 Ave O, Brooklyn NY 11230 | 141511230 |
Solved! Go to Solution.
@gfisch13 Glad you're staying engaged with the community. It is a great place!
One thing I wondered about the Data Cleanse tool... I know your output example turned "3333 101st Street, Corona NY 11368" into "333310111368" but would the spirit of what you're after also want to exclude the street name regardless if it's a number-name or not? In this case "101st". By removing just characters and leaving numbers, it includes a portion of the street name (the 101). Feels inconsistent to have that included? Not being critical at all of @Mathias_Nielsen as I do love that option; but I continue to wonder if that's consistent with what your ultimate goal is...
Anyway, just some additional musings. -Jay
Good morning Jay, I certainly appreciate the conversation and I'm always looking for ways to improve my workflows. I've been using this technique for many years. Matching addresses is tough due to all the variations of words, street names, etc. I was at a fraud conference and did a breakout session with ACL and they taught this technique that leaves just the numbers to create a key. The biggest issue I've faced over the years is the variations in address formats. This method has allowed me to get past that while improving my match rate. It's probably not 100% accurate but I know it will work for my data as everything is in the same format coming out of our source systems. I wanted my company to buy the address normalization tool set but they wouldn't buy, so here I am. Happy to listen if you have additional ideas. Have a great day!!
Hey! I’d use the REGEX tool for this. Something like REGEX_Replace([Address], "[^\d]", "") will strip out everything that’s not a number and leave you with just the digits. Super handy for generating an Address Key.
User | Count |
---|---|
106 | |
85 | |
76 | |
54 | |
40 |