Hi,
I have data containing US addresses, but the parts of each address are in no consistent order. Because there are no delimiters in the address, there is no clear way to tell which part is the city and which is the street name.
The state is at the end, given as a two-letter code such as TX for Texas. The city comes just before it, but it is not always a single word; it can be more than one word.
How can I extract the city name from this?
Example:- 1234 XXXXX El Paso TX
11 XXX XXXX Garland TX
Hi @RAJ_12
Without an exact pattern, it's hard to find the city. One thing you can do is build a lookup table of cities and run a find against each address.
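Outside of the workflow tool, the lookup idea could be sketched in Python roughly like this. It is only a sketch under assumptions: `CITIES` is a tiny hypothetical list standing in for a real city table, `find_city` is a made-up helper name, and it assumes every address ends with a two-letter uppercase state code.

```python
import re

# Hypothetical lookup table; a real one would be loaded from a full city list.
CITIES = ["El Paso", "Garland", "Paso"]


def find_city(address, cities):
    """Return the longest known city sitting right before the state code, or None."""
    # Split off the trailing two-letter state code (assumed to always be present).
    m = re.match(r"^(.*\S)\s+([A-Z]{2})$", address.strip())
    if not m:
        return None
    body = m.group(1).lower()
    # Try longer names first so "El Paso" wins over a shorter entry like "Paso".
    for city in sorted(cities, key=len, reverse=True):
        name = city.lower()
        if body == name or body.endswith(" " + name):
            return city
    return None


print(find_city("1234 XXXXX El Paso TX", CITIES))
print(find_city("11 XXX XXXX Garland TX", CITIES))
```

Checking only the end of the address (instead of searching anywhere in it) avoids picking up a street that happens to be named after a city.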
Hope this helps 🙂
I just split the text into columns to get a start on a result, but maybe something can be done using regex?
The remaining problem is identifying cities with more than one word. Happy to see more responses.
Sometimes brute force is beautiful.
I downloaded a list of world cities and used it to solve the puzzle.
Awesome work on finding the cities dataset @Qiu 😎👍
If possible, could you share the link where you got the dataset?
@atcodedog05
Actually, I took it from a previous reply of mine.
It may be this one:
https://github.com/datasets/world-cities/blob/master/data/world-cities.csv
The main thing is that there has to be a delimiter before the city. You should use regex and keep adding to the OR statement I started for you. It is the part with the pipes:
(?<houseNumber>\d+) (?<streetName>.* (?:Rd|Blvd|Ave|St|Street|Avenue|Place|Pl|Parkway|Pkwy)) (?<city>[\w ]*) (?<state>[A-Z]{2})
I name my capture groups, which helps break up the expression for legibility.
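If you want to try the same pattern outside the workflow, here is a rough Python sketch. Python spells named groups as `(?P<name>...)` rather than `(?<name>...)`, so the expression is adjusted for that. The `parse_address` helper, the use of `fullmatch`, and the sample addresses are assumptions for illustration, not part of the original data.

```python
import re

# Same expression as above, split across lines, with Python's (?P<name>...) syntax.
ADDRESS_RE = re.compile(
    r"(?P<houseNumber>\d+) "
    r"(?P<streetName>.* (?:Rd|Blvd|Ave|St|Street|Avenue|Place|Pl|Parkway|Pkwy)) "
    r"(?P<city>[\w ]*) "
    r"(?P<state>[A-Z]{2})"
)


def parse_address(address):
    """Return a dict of the named groups, or None if the address does not fit."""
    # fullmatch (an assumption here) requires the whole string to fit the pattern.
    m = ADDRESS_RE.fullmatch(address.strip())
    return m.groupdict() if m else None


# Made-up sample addresses that contain a street suffix from the list.
print(parse_address("1234 Main St El Paso TX"))
print(parse_address("11 Oak Hill Blvd Garland TX"))
# No recognised street suffix, so the pattern cannot locate the city.
print(parse_address("1234 XXXXX El Paso TX"))
```

The last call shows why the suffix list matters: an address with no listed suffix gives the pattern nothing to anchor on, so it does not match until that suffix is added to the alternation.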
Let me know how it goes!