Hi all I have a simple problem. I have a text column which has values like below
Text
-------------------------------
30% off HG Baselayer
2 for $30 Graphic T's
BOGO 50% Graphic T's
30% off Velocity 1/4 Zip
$19.99 ColdGear Baselayer Tops
40% off Mo'Money Short
40% off
Graphic T's
I need to split this text into something like
Column 1 Column 2
30% off HG Baselayer
BOGO 50% Graphic T's
2 for $30 Graphic T's
30% off Velocity 1/4 Zip
$19.99 ColdGear Baselayer Tops
40% off null
null Graphic T's
The rule which i am looking to implement is
Split the text if you encounter 'off' or any $/d (number starting with dollar sign) also if none is there col 1 is null but still col 2 can be not null
Any help is much appreciated
Thank you
Solved! Go to Solution.
Hi @akshatkumar87,
You could try to write one big REGEX string for this, but it would be easier to split the lines up depending on the obvious differences. A huge advantage of Alteryx is being able to split data using the Filter tool and deal with it separately before bringing it back together.
So, that being said, the different categories I see are:
So, I would start with a couple of filters to separate the above into 3 datasets.
then dealing with the first stream is via a Parse REGEX option (as you have 2 named ranges, so Tokenize won't work).
In the first named range, you are looking for any combination of digits, or symbols ($,%) followed by a space then the word off
In the second named range, you are looking for everything else before the end of the line (symbolised by the $)
The second stream will be anything that has a digit in it and as it's so inconsistent I would look for combinations of 3 or more numbers, symbols etc and then add a delimiter after that to split on
which will add a pipe after each combination that is found.
The rest should be pretty simple to deal with
^(.*?(\$[0-9\.]+|[0-9\.]+%)( off)?)? ?(.*?)$
This looks for a number block starting with a $ or ending with a %. There can be text before (e.g. BOGO of 2 for) and there can be an ' off' afterwards. Breaking it down:
You get a couple of extra matching blocks to delete bu can do this with a select
Sample attached
Great answers @jdunkerley79 and @KaneG,
I would humbly suggest the following RegEx to avoid the select tool:
^(.*?(?:\$[.0-9]+|[.0-9]+%)(?: off)?)?\s?(.*?)$
It's completely borrowed from James' solution -so all credits go to him honestly- but it doesn't keep track of unneeded groups (through ?:) like
(\$[.0-9]+|[.0-9]+%)
or
( off)
meaning that only 2 columns are left as requested.
Thanks,
Paul Noirel
Sr Customer Support Engineer, Alteryx
@PaulN nice enhancement - I always forget about those!
Nice work to @jdunkerley79 & @PaulN, awesomely concise. The power of REGEX!!!
That gives a few options, whether you want it explained and easier for someone else to edit (taking the performance penalty) or whether you would like it nice, simple and efficient with minimal tools. Testament to the community and how there is multiple ways to solve every problem, but there is always a way in Alteryx.
@KaneG + @jdunkerley79 + @PaulN,
Those were some consumate RegEx recommendations. But I also feel that they were egregious in that they demand RegEx expertise. The following single formula applies the rules necessary to parse the incoming text following these rules:
IF Contains([Text],"off") THEN Trim(Replace([Text], "off", 'off|')) ELSEIF Contains([Text],"$") THEN //Find first space after $ symbol Left([Text],ToNumber(FindString(Substring([Text],FindString([Text], "$"))," "))+ ToNumber(FindString([Text], "$"))) + "|" + Substring([Text],ToNumber(FindString(Substring([Text],FindString([Text], "$")), " "))+1) ELSEIF Contains([Text],"% ") THEN Replace([Text],"% ","%|") Else "|"+[Text] ENDIF
Then you can split the Output using "|" as a delimiter.
Cheers,
Mark
Thank you so much for the solution it worked out pretty well except few exceptions where the text has stuff like
19.99 for Graphic T or $34.99/$45.99 for graphic T which i figured. All the solutions are right so thank you everyone