The data contains a field for address, but the complete address has been placed into 1 column. I want to simply parse the address so that we can assign individual columns for: street address, city, and state.
I've been able to parse out the State by adding in a comma, but cannot find a way to parse the city as cities can be 2+ words (ex. BATON ROGUE). Ultimately, I need insights into how to best parse a city from the complete address to separate columns in a way to handle all potential cities.
Start:
As you can see above, if you attempted to parse the address on a space, then record 3 and 5 would not work since the city would be separated.
Desired Output:
STREET ADDRESS | CITY |
4364 ULYSSES DR | SACRAMENTO |
5031 HARTCELL CT | ST CLOUD |
without having a better delimiter already in the value, like where the comma should normally go, the only ways i can think to solve this would be to have some kind of reference list of cities in the USA, which would not be fun. or the other way would be to make a list of all the values a street might be called, such as STREET, ST, CT, RD, DR. you'll have to add yet another check for when none of those is present (like your 46 ROLLINS address).
that's not going to be fun either, but probably easier than the first method. then you could find the position of that string up until that ST, DR, RD etc and substring everything after it, leaving only city.
@jackf
This is a tricky one, I believe.
And so far, the best solution we can come up is the using the City list and do a matching.
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Regex-to-Parse-City-State/td-p/670515