Alteryx Designer Desktop Discussions

akshatkumar87 · ‎10-22-2018

Hi all I have a simple problem. I have a text column which has values like below

Text

-------------------------------

30% off HG Baselayer
2 for $30 Graphic T's
BOGO 50% Graphic T's
30% off Velocity 1/4 Zip
$19.99 ColdGear Baselayer Tops
40% off Mo'Money Short

40% off

Graphic T's

I need to split this text into something like

Column 1 Column 2

30% off HG Baselayer

BOGO 50% Graphic T's

2 for $30 Graphic T's

30% off Velocity 1/4 Zip

$19.99 ColdGear Baselayer Tops

40% off null

null Graphic T's

The rule which i am looking to implement is

Split the text if you encounter 'off' or any $/d (number starting with dollar sign) also if none is there col 1 is null but still col 2 can be not null

Any help is much appreciated

Thank you

KaneG · ‎10-22-2018

Hi @akshatkumar87,

You could try to write one big REGEX string for this, but it would be easier to split the lines up depending on the obvious differences. A huge advantage of Alteryx is being able to split data using the Filter tool and deal with it separately before bringing it back together.

So, that being said, the different categories I see are:

x%/$x off product
Data with no numbers
The rest

So, I would start with a couple of filters to separate the above into 3 datasets.

then dealing with the first stream is via a Parse REGEX option (as you have 2 named ranges, so Tokenize won't work).

([\d%$]+ off)(.*?)$

In the first named range, you are looking for any combination of digits, or symbols ($,%) followed by a space then the word off

In the second named range, you are looking for everything else before the end of the line (symbolised by the $)

The second stream will be anything that has a digit in it and as it's so inconsistent I would look for combinations of 3 or more numbers, symbols etc and then add a delimiter after that to split on

Replace: ([\%\$\d\.]{3,} ) with $1|

which will add a pipe after each combination that is found.

Everything in the square brackets are the search terms
The part in the squiggly brackets means 3 or more.
$1 means everything in the normal brackets; first marked group

The rest should be pretty simple to deal with

jdunkerley79 · ‎10-23-2018

^(.*?(\$[0-9\.]+|[0-9\.]+%)( off)?)? ?(.*?)$

This looks for a number block starting with a $ or ending with a %. There can be text before (e.g. BOGO of 2 for) and there can be an ' off' afterwards. Breaking it down:

^ - start of text
.*? - non greedy match of text at beginning
(\$[0-9\.]+|[0-9\.]+%) - match number with either a $ at start or % at end
( off)? - match off if present

You get a couple of extra matching blocks to delete bu can do this with a select

Sample attached

PaulN · ‎10-23-2018

Great answers @jdunkerley79 and @KaneG,

I would humbly suggest the following RegEx to avoid the select tool:

^(.*?(?:\$[.0-9]+|[.0-9]+%)(?: off)?)?\s?(.*?)$

It's completely borrowed from James' solution -so all credits go to him honestly- but it doesn't keep track of unneeded groups (through ?:) like

(\$[.0-9]+|[.0-9]+%)

or

( off)

meaning that only 2 columns are left as requested.

Thanks,

Paul Noirel

Sr Customer Support Engineer, Alteryx

jdunkerley79 · ‎10-23-2018

@PaulN nice enhancement - I always forget about those!

KaneG · ‎10-23-2018

Nice work to @jdunkerley79 & @PaulN, awesomely concise. The power of REGEX!!!

That gives a few options, whether you want it explained and easier for someone else to edit (taking the performance penalty) or whether you would like it nice, simple and efficient with minimal tools. Testament to the community and how there is multiple ways to solve every problem, but there is always a way in Alteryx.

MarqueeCrew · ‎10-23-2018

@KaneG + @jdunkerley79 + @PaulN,

Those were some consumate RegEx recommendations. But I also feel that they were egregious in that they demand RegEx expertise. The following single formula applies the rules necessary to parse the incoming text following these rules:

If you see "OFF", take the left most characters (including Off) as the first token and everything following as the second.
- Otherwise, if you see a $ symbol then get everything up to the next space as the first token and everything following that space as the second.
  - Otherwise, if you see a % symbol then take the left most characters (including the %) as the first token and Trim anything following as the second.
    - Lastly make the whole text the second.

IF
	Contains([Text],"off")	THEN Trim(Replace([Text], "off", 'off|'))	ELSEIF
	Contains([Text],"$")	THEN //Find first space after $ symbol
		Left([Text],ToNumber(FindString(Substring([Text],FindString([Text], "$"))," "))+
		ToNumber(FindString([Text], "$"))) + "|" +
		Substring([Text],ToNumber(FindString(Substring([Text],FindString([Text], "$")),
			" "))+1)						ELSEIF
	Contains([Text],"% ")	THEN Replace([Text],"% ","%|")
Else    "|"+[Text]
ENDIF

Then you can split the Output using "|" as a delimiter.

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

akshatkumar87 · ‎10-23-2018

Thank you so much for the solution it worked out pretty well except few exceptions where the text has stuff like

19.99 for Graphic T or $34.99/$45.99 for graphic T which i figured. All the solutions are right so thank you everyone

Alteryx Designer Desktop Discussions

Regex Tokenize | RegEx split