Hi everyone,
I have a field called CountryData and want to be able to extract the corresponding countries from these strings. Here are some examples of the data:
xxxx CA xxxx
xxxxxCA
xxxx(CA)xxxx
xxxx Canada xxxx
xxxxx CAN xxxx
I decided to use the RegEx tool to parse the country codes and then do a lookup using http://www.nationsonline.org/oneworld/country_code_list.htm:
\W([[:alpha:]][[:alpha:]])\W
Abcd.co.uk (ca)
My expression here picks up co, uk and ca
Thanks!
Solved! Go to Solution.
If your data is too dissimilar, Regex may not work as your solution. If you believe that the country code will be present late in the field, using a ReverseString() may be the easiest way to get those characters into the parse. Otherwise, to handle both 2 and 3 digit codes, you could create one list with all the possibly codes and their equivalent countries and join.
Thanks pcatterson!
The code can appear anywhere in the string with equal chance.
The RegEx you provided is a bit too indiscriminate and is capturing whole words.
For example:
xxx xxx CA xxx xx
If I provide a list of counties/2 and 3 iso code and corresponding countries is there a way I can search my strings if these countries or codes appear within and if so allocate a code?
For example my list is:
Within String, Corresponding Country
CA, Canada
CAN, Canada,
Canada, Canada
And if the 'Within String' column is matched with something in the text then allocate the country? Same rules need to apply, there should be a \W character before and after the portion of interest
I compiled a master list and did a join. I'm facing a match issue
xxxxxx CANADA does not match for some reason with \W([[:alpha:]]+)\W
@Amy_C here's my take on it - I can't think of a way to get xxxx CA xxxx type values to match Abcd.co.uk (ca), so I did a 2 part workflow. First part addresses the Regex, and then how to match back to get a "Country" field (you could probably do it more elegantly with a series of Joins).
Here's my RegEx
^[x\W]+(\w+)[x\W]*$
For the Abcd.co.uk (ca) value (and any other potential values that don't match the RegEx), I did a Find and Replace with the option to "Replace Multiple Found Items", then Joined back. This will allow you to get a list of matches so that you can then decide which works best.
Results:
Abcd.co.uk (ca) | Colombia |
Abcd.co.uk (ca) | Canada |