Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Split based on pipe |

messi007
15 - Aurora
15 - Aurora

Hello Team,

 

I would like split the column below based on Pipe only so if there is "space | space" I would like to keep it as only one column.

Below an example

 

input 

 

C1|C2|C3

A|"A | b | C"|D

 

Output

c1 C2 C3
A A | b | C D

 

I'm curose to see how we can do this using regex :)

 

Thanks in advance :)

 

Regards,

9 REPLIES 9
DanielMS
Alteryx
Alteryx

Hi @messi007,

 

Not the most elegant solution but please see the example workflow below.

 

I replaced the "space | space" with anither special charater usig regex then parsed on the "|".

 

Hope this helps.

 

Dan

Christina_H
14 - Magnetar

Given your input data, you can achieve this very simply with the text to columns tool.  Just check the option to ignore delimiters in quotes.  This doesn't remove the quotes, but those would be easy to remove in a later step.

Christina_H_1-1663237175289.png

 

ChrisTX
15 - Aurora

If the data is in an input file, you can use the CSV import functionality to Ignore delimiters in quotes.

 

If the data is part of a stream within your workflow, this RegEx discussion may help:

https://stackoverflow.com/questions/6462578/regex-to-match-all-instances-not-inside-quotes

 

 

ChrisTX_0-1663236867159.png

 

Chris

OllieClarke
15 - Aurora
15 - Aurora

@Christina_H's approach is the one I would recommend - but as a learning exercise I tried to find the RegEx which would tokenise just what you want.

This is what I came up with:

"?(.*?\S)(?:"?\||$)

So taking everything after a quote (if it's there) until it finds a pipe which doesn't have a space (and optionally a quote) immediately before it or it gets to the end of the string

OllieClarke_0-1663256764986.png

 

flying008
14 - Magnetar

Hi, @OllieClarke  @Christina_H 

 

Dear, nice done with Non-Capturing Groups and split by ignore in quote. 👍

messi007
15 - Aurora
15 - Aurora

Dear Alteryx Team,

 

All solutions works but I really appreciate @OllieClarke's solution :)

 

Thanks for your help !

 

Regards,

OllieClarke
15 - Aurora
15 - Aurora

@messi007 as I say, @Christina_H's solution will be more performant and robust - so could you mark it as a solution. The RegEx option was just as a learning exercise.

messi007
15 - Aurora
15 - Aurora

@OllieClarke  as you can see on the question. From the beginning I was looking for the RegEx option. However why @Christina_H's solution is more performant then @ChrisTX or your solution.

OllieClarke
15 - Aurora
15 - Aurora

@messi007 so RegEx is a computationally intensive function which means that it's slow. If I test the two approaches (after some duplication so we don't just have 2 rows of data) we can see that 54% of the workflow runtime was the RegEx tool, whereas 38% was the text to columns + multi-field formula approach. The outcomes of these 2 approaches is identical. 

OllieClarke_0-1663332799725.png
As @ChrisTX is basically the same approach as @Christina_H  but with input I've grouped it with @Christina_H's

Hope that helps,

Ollie

 

 

Labels