Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex replace cases - translation help

pociechtoa
5 - Atom

Hi everyone

 

I am quite new to alteryx and I have an existing workflow (built by someone else) which uses regex_replace in a couple of instances. Even though I viewed the Regex 101 and the Regex Coach, I still fail to grasp what the individual characters do and, most importantly, how to fix errors which occur sometimes. 

 

Long story short, I have a couple of fields which contain city and country for different events (one event can have more than once city and therefore more than one country as well). I am trying to concatenate them to get all the cities together, and all the countries together, the separator used is ;  

 

The formula used is trim(regex_replace([_CurrentField_],"\b(.+);\s(?=.*\b\1;?)","")) 

 

Errors which I found: 

 

1. Mexico City;Panama City 

They get concatenated to MexicoPanama City (so I assume the "City" part of Mexico is omitted as repetitive)

 

2. cities:  Edinburgh; Fort Lauderdale; Hong Kong; London; New York; San Mateo; Singapore; St. Petersburg; Toronto

countries: Hong Kong; United KingdomSingapore; United States; Canada

As you can see, United Kingdom and Sinapore are missing a space in between. I guess this could be because of different cities from the same country (which I do not want repeated) but I still cannot fathom why it glues together UK and Singapore, and not any of the others. 

 

Could someone help me translate this to layman's terms? Thank you!

I solemnly swear to analyse the answers and come out of this as a smarter human being. 

8 REPLIES 8
Jean-Balteryx
16 - Nebula
16 - Nebula

Hi @pociechtoa ,

 

Can you share your workflow ?

Ben_H
11 - Bolide

Hi @pociechtoa,

 

I couldn't replicate the United KingdomSingapore issue but I did get the same result for Mexico City; Panama City

 

It seems like it's intended to remove duplication. I don't think it needs the parts in red, although it might depend on the data. It seems to work if you remove them though (at least for the Mexico City; Panama City example)

 

\b(.+);\s(?=.*\b\1;?)

 

To my understanding (which may not be quite right) it says -

 

\b     - starting from a word boundary

(       - start new marked group - i.e. something we want to be able to reference

.+      - match one or more of any kind of character

)        - close marked group - we can now reference this group later using \1 (1 as it's the first group we've made)

;\s      - the match one or more characters part will continue until it finds a ; followed by a space

(?=    - this part is a positive look ahead - it does another match but doesn't include it in the result

.*\b    - .* means match zero or more of any character, then a boundary

\1;     - is refering back to our original group followed by ; - so does it find "City" again in the Panama example.

?       - the final question mark means match as few characters as possible (within this lookahead group).

)       - closes the look ahead.

 

Hopefully this helps a bit,

 

Regards,

 

Ben

 

atcodedog05
22 - Nova
22 - Nova

Hi @pociechtoa 

 

Sorry for being late to the party. @Ben_H has already done a great job in the explanation. This link https://regexr.com/  might be helpful to you and also you play around to understand more. You can type your regex and it tells you whats happening.

 

atcodedog05_0-1628788170903.png

 

- At side there is a cheatsheet about regex

- At the top you can enter your regex

- At bottom its giving explanation of your regex.

 

If you provide more data and expected output we can take a look into it and help you in arriving to the solution 🙂

 

Hope this helps : )

pociechtoa
5 - Atom

Hi Jean-Balteryx, atcodedog05 and Ben_H!

 

Thank you guys for the help so far. I'm attaching an excerpt of my workflow, maybe this will shed some more light on the issue. I included the troublesome examples and one or two that work as well. 

atcodedog05
22 - Nova
22 - Nova

Hi @pociechtoa 

 

Can you provide us an expected output. That way we can quickly think of logic give you a solution.

atcodedog05
22 - Nova
22 - Nova

Hi @pociechtoa 

 

I got the requirement you want to only single occurrence.

 

Apporach -1: Using regex

atcodedog05_0-1628840784813.png

But its still facing the city being removed issue.

 

Apporach -2: Summarize tool method

atcodedog05_1-1628840874840.png

 

This was able to over come city issue.

 

Hope this helps : )

pociechtoa
5 - Atom

Whoa, atcodedog05, that was fast! Thanks! I will mull over this 👩‍💻

atcodedog05
22 - Nova
22 - Nova

Happy to help : ) @pociechtoa 

Cheers and have a nice day!

Labels