Hello Team,
I would like split the column below based on Pipe only so if there is "space | space" I would like to keep it as only one column.
Below an example
input
C1|C2|C3
A|"A | b | C"|D
Output
c1 | C2 | C3 |
A | A | b | C | D |
I'm curose to see how we can do this using regex :)
Thanks in advance :)
Regards,
Solved! Go to Solution.
Hi @messi007,
Not the most elegant solution but please see the example workflow below.
I replaced the "space | space" with anither special charater usig regex then parsed on the "|".
Hope this helps.
Dan
Given your input data, you can achieve this very simply with the text to columns tool. Just check the option to ignore delimiters in quotes. This doesn't remove the quotes, but those would be easy to remove in a later step.
If the data is in an input file, you can use the CSV import functionality to Ignore delimiters in quotes.
If the data is part of a stream within your workflow, this RegEx discussion may help:
https://stackoverflow.com/questions/6462578/regex-to-match-all-instances-not-inside-quotes
Chris
@Christina_H's approach is the one I would recommend - but as a learning exercise I tried to find the RegEx which would tokenise just what you want.
This is what I came up with:
"?(.*?\S)(?:"?\||$)
So taking everything after a quote (if it's there) until it finds a pipe which doesn't have a space (and optionally a quote) immediately before it or it gets to the end of the string
Dear, nice done with Non-Capturing Groups and split by ignore in quote. 👍
Dear Alteryx Team,
All solutions works but I really appreciate @OllieClarke's solution :)
Thanks for your help !
Regards,
@messi007 as I say, @Christina_H's solution will be more performant and robust - so could you mark it as a solution. The RegEx option was just as a learning exercise.
@OllieClarke as you can see on the question. From the beginning I was looking for the RegEx option. However why @Christina_H's solution is more performant then @ChrisTX or your solution.
@messi007 so RegEx is a computationally intensive function which means that it's slow. If I test the two approaches (after some duplication so we don't just have 2 rows of data) we can see that 54% of the workflow runtime was the RegEx tool, whereas 38% was the text to columns + multi-field formula approach. The outcomes of these 2 approaches is identical.
As @ChrisTX is basically the same approach as @Christina_H but with input I've grouped it with @Christina_H's
Hope that helps,
Ollie