Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex replace cases - translation help

pociechtoa
5 - Atom

Hi everyone

 

I am quite new to alteryx and I have an existing workflow (built by someone else) which uses regex_replace in a couple of instances. Even though I viewed the Regex 101 and the Regex Coach, I still fail to grasp what the individual characters do and, most importantly, how to fix errors which occur sometimes. 

 

Long story short, I have a couple of fields which contain city and country for different events (one event can have more than once city and therefore more than one country as well). I am trying to concatenate them to get all the cities together, and all the countries together, the separator used is ;  

 

The formula used is trim(regex_replace([_CurrentField_],"\b(.+);\s(?=.*\b\1;?)","")) 

 

Errors which I found: 

 

1. Mexico City;Panama City 

They get concatenated to MexicoPanama City (so I assume the "City" part of Mexico is omitted as repetitive)

 

2. cities:  Edinburgh; Fort Lauderdale; Hong Kong; London; New York; San Mateo; Singapore; St. Petersburg; Toronto

countries: Hong Kong; United KingdomSingapore; United States; Canada

As you can see, United Kingdom and Sinapore are missing a space in between. I guess this could be because of different cities from the same country (which I do not want repeated) but I still cannot fathom why it glues together UK and Singapore, and not any of the others. 

 

Could someone help me translate this to layman's terms? Thank you!

I solemnly swear to analyse the answers and come out of this as a smarter human being. 

8 REPLIES 8
Jean-Balteryx
16 - Nebula
16 - Nebula

Hi @pociechtoa ,

 

Can you share your workflow ?

Ben_H
11 - Bolide

Hi @pociechtoa,

 

I couldn't replicate the United KingdomSingapore issue but I did get the same result for Mexico City; Panama City

 

It seems like it's intended to remove duplication. I don't think it needs the parts in red, although it might depend on the data. It seems to work if you remove them though (at least for the Mexico City; Panama City example)

 

\b(.+);\s(?=.*\b\1;?)

 

To my understanding (which may not be quite right) it says -

 

\b     - starting from a word boundary

(       - start new marked group - i.e. something we want to be able to reference

.+      - match one or more of any kind of character

)        - close marked group - we can now reference this group later using \1 (1 as it's the first group we've made)

;\s      - the match one or more characters part will continue until it finds a ; followed by a space

(?=    - this part is a positive look ahead - it does another match but doesn't include it in the result

.*\b    - .* means match zero or more of any character, then a boundary

\1;     - is refering back to our original group followed by ; - so does it find "City" again in the Panama example.

?       - the final question mark means match as few characters as possible (within this lookahead group).

)       - closes the look ahead.

 

Hopefully this helps a bit,

 

Regards,

 

Ben

 

atcodedog05
22 - Nova
22 - Nova

Hi @pociechtoa 

 

Sorry for being late to the party. @Ben_H has already done a great job in the explanation. This link https://regexr.com/  might be helpful to you and also you play around to understand more. You can type your regex and it tells you whats happening.

 

atcodedog05_0-1628788170903.png

 

- At side there is a cheatsheet about regex

- At the top you can enter your regex

- At bottom its giving explanation of your regex.

 

If you provide more data and expected output we can take a look into it and help you in arriving to the solution 🙂

 

Hope this helps : )

pociechtoa
5 - Atom

Hi Jean-Balteryx, atcodedog05 and Ben_H!

 

Thank you guys for the help so far. I'm attaching an excerpt of my workflow, maybe this will shed some more light on the issue. I included the troublesome examples and one or two that work as well. 

atcodedog05
22 - Nova
22 - Nova

Hi @pociechtoa 

 

Can you provide us an expected output. That way we can quickly think of logic give you a solution.

atcodedog05
22 - Nova
22 - Nova

Hi @pociechtoa 

 

I got the requirement you want to only single occurrence.

 

Apporach -1: Using regex

atcodedog05_0-1628840784813.png

But its still facing the city being removed issue.

 

Apporach -2: Summarize tool method

atcodedog05_1-1628840874840.png

 

This was able to over come city issue.

 

Hope this helps : )

pociechtoa
5 - Atom

Whoa, atcodedog05, that was fast! Thanks! I will mull over this 👩‍💻

atcodedog05
22 - Nova
22 - Nova

Happy to help : ) @pociechtoa 

Cheers and have a nice day!

Labels