Hi everyone
I am quite new to alteryx and I have an existing workflow (built by someone else) which uses regex_replace in a couple of instances. Even though I viewed the Regex 101 and the Regex Coach, I still fail to grasp what the individual characters do and, most importantly, how to fix errors which occur sometimes.
Long story short, I have a couple of fields which contain city and country for different events (one event can have more than once city and therefore more than one country as well). I am trying to concatenate them to get all the cities together, and all the countries together, the separator used is ;
The formula used is trim(regex_replace([_CurrentField_],"\b(.+);\s(?=.*\b\1;?)",""))
Errors which I found:
1. Mexico City;Panama City
They get concatenated to MexicoPanama City (so I assume the "City" part of Mexico is omitted as repetitive)
2. cities: Edinburgh; Fort Lauderdale; Hong Kong; London; New York; San Mateo; Singapore; St. Petersburg; Toronto
countries: Hong Kong; United KingdomSingapore; United States; Canada
As you can see, United Kingdom and Sinapore are missing a space in between. I guess this could be because of different cities from the same country (which I do not want repeated) but I still cannot fathom why it glues together UK and Singapore, and not any of the others.
Could someone help me translate this to layman's terms? Thank you!
I solemnly swear to analyse the answers and come out of this as a smarter human being.
Solved! Go to Solution.
Hi @pociechtoa,
I couldn't replicate the United KingdomSingapore issue but I did get the same result for Mexico City; Panama City
It seems like it's intended to remove duplication. I don't think it needs the parts in red, although it might depend on the data. It seems to work if you remove them though (at least for the Mexico City; Panama City example)
\b(.+);\s(?=.*\b\1;?)
To my understanding (which may not be quite right) it says -
\b - starting from a word boundary
( - start new marked group - i.e. something we want to be able to reference
.+ - match one or more of any kind of character
) - close marked group - we can now reference this group later using \1 (1 as it's the first group we've made)
;\s - the match one or more characters part will continue until it finds a ; followed by a space
(?= - this part is a positive look ahead - it does another match but doesn't include it in the result
.*\b - .* means match zero or more of any character, then a boundary
\1; - is refering back to our original group followed by ; - so does it find "City" again in the Panama example.
? - the final question mark means match as few characters as possible (within this lookahead group).
) - closes the look ahead.
Hopefully this helps a bit,
Regards,
Ben
Hi @pociechtoa
Sorry for being late to the party. @Ben_H has already done a great job in the explanation. This link https://regexr.com/ might be helpful to you and also you play around to understand more. You can type your regex and it tells you whats happening.
- At side there is a cheatsheet about regex
- At the top you can enter your regex
- At bottom its giving explanation of your regex.
If you provide more data and expected output we can take a look into it and help you in arriving to the solution 🙂
Hope this helps : )
Hi @pociechtoa
Can you provide us an expected output. That way we can quickly think of logic give you a solution.
Hi @pociechtoa
I got the requirement you want to only single occurrence.
Apporach -1: Using regex
But its still facing the city being removed issue.
Apporach -2: Summarize tool method
This was able to over come city issue.
Hope this helps : )
Whoa, atcodedog05, that was fast! Thanks! I will mull over this 👩💻
Happy to help : ) @pociechtoa
Cheers and have a nice day!