Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Regex or extraction?

JDong
8 - Asteroid

Hi Team,

 

I want to extract City information from below data. The data is really not consistent but below are some samples.

 

Street from 13, 64546 City1
New 25673 City2 Windowstreet Main 6
Nelson-Marker-Street 45/Market 4 67, 2123 City3
Pm 78426 City4 Sweep/ Hang Torre
WiegelSty. 3 42329 City5
Doorstreet 471-6, 71923 City6
Childrenstreet.78-5, 72810 City7
Williamstreet 89, 19220 City8
Emeli-Mary-Street o., 71920 City9
43921 City10, NTP Gallery Allee 7
West-Red-Block 12-424; 27292 City11 City11

 

There may be more use cases but these seems the most likely ones. I want to extract them into a new column. After this I want to do a fuzzy logic or something on those lines to rename the cities as the cities names are also not consistent for example

 

Dallas

D_aLL_a_s

dalla

allas

 

Something on these lines.

 

Any help on this is highly appreciated! Thanks

1 REPLY 1
FinnCharlton
13 - Pulsar

Hi @JDong , this is quite a hard problem, both for Alteryx and for other data processing methods. Computers just aren't that great with words. To use RegEx, you'll need to isolate a pattern, or a couple of possible patterns, that define where the city name comes in the string. For example, in your examples the city name is always after a 5 digit postcode, so you could use that pattern to isolate your city name. This can run into problems though, for example with 2 word cities, 5 digit street addresses, exceptions to this pattern etc.

 

Your best bet may be to use a dictionary of city names, which you should be able to find online. You can split each address into individual words and join them to the dictionary, which should extract the city from the address. The misspelled city names will make this tricky though, you can try to use the Fuzzy Match tool in Merge mode: give this thread a read for more advice:

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Can-I-make-a-fuzzy-join-on-two-data-so...

 

Good luck!

Labels