Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Can't figure out what is probably an easy RegEx parse

Amy_C
7 - Meteor

Hi everyone,

 

I have a field called CountryData and want to be able to extract the corresponding countries from these strings. Here are some examples of the data:

 

xxxx CA xxxx

xxxxxCA

xxxx(CA)xxxx

xxxx Canada xxxx

xxxxx CAN xxxx

 

I decided to use the RegEx tool to parse the country codes and then do a lookup using http://www.nationsonline.org/oneworld/country_code_list.htm:

 

\W([[:alpha:]][[:alpha:]])\W

 

  • However I don’t know how I would search for 2 OR 3 letter codes.
  • Another problem I had was in the following instance:

Abcd.co.uk (ca)

 

My expression here picks up co, uk and ca

 

  • I’d like to be able to view all of these output fields and manually select which is the correct one.
  • I don’t know how I would feed in a list of countries into this flow also as my data also contains country names as well as country codes.
  • I’d also like to be able to input a file where some fields have been manually assigned and include this in the final output overriding any expressions

Thanks!

6 REPLIES 6
pcatterson
11 - Bolide

If your data is too dissimilar, Regex may not work as your solution.  If you believe that the country code will be present late in the field, using a ReverseString() may be the easiest way to get those characters into the parse.  Otherwise, to handle both 2 and 3 digit codes, you could create one list with all the possibly codes and their equivalent countries and join.

Amy_C
7 - Meteor

Thanks pcatterson!

 

The code can appear anywhere in the string with equal chance.

 

The RegEx you provided is a bit too indiscriminate and is capturing whole words.

 

For example:

 

xxx xxx CA xxx xx

 

If I provide a list of counties/2 and 3 iso code and corresponding countries is there a way I can search my strings if these countries or codes appear within and if so allocate a code?

 

For example my list is:

 

Within String, Corresponding Country

CA, Canada

CAN, Canada,

Canada, Canada

 

And if the 'Within String' column is matched with something in the text then allocate the country? Same rules need to apply, there should be a \W character before and after the portion of interest

Amy_C
7 - Meteor

I compiled a master list and did a join. I'm facing a match issue

 

xxxxxx CANADA does not match for some reason with \W([[:alpha:]]+)\W

pcatterson
11 - Bolide

The trouble seems to be that there isn't a group after the marked group.  Therefore it doesn't parse it correctly.  You could add a random non-letter character to the end of the string so that it identifies it.

jrgo
14 - Magnetar

@Amy_C,

 

See if the attached helps get you in the right direction. The first RegEx tool is removing character strings where the same character is repeating 3 or more times. The second RegEx tool is simply removing characters until it hits a space (or \s).

 

Hope this helps!

 

JGo

SophiaF
Alteryx
Alteryx

@Amy_C here's my take on it - I can't think of a way to get xxxx CA xxxx type values to match Abcd.co.uk (ca), so I did a 2 part workflow. First part addresses the Regex, and then how to match back to get a "Country" field (you could probably do it more elegantly with a series of Joins).

 

Here's my RegEx

^[x\W]+(\w+)[x\W]*$

 

 

For the Abcd.co.uk (ca) value (and any other potential values that don't match the RegEx), I did a Find and Replace with the option to "Replace Multiple Found Items", then Joined back. This will allow you to get a list of matches so that you can then decide which works best.

 

Results:

Abcd.co.uk (ca)  Colombia
Abcd.co.uk (ca)  Canada

 

Sophia Fraticelli
Senior Solutions Architect
Alteryx, Inc.
Labels