Parse question... I am trying to capture the text between > < tags. I am using the expression >([^<]+) and I also tried >(.*?)<. Both work on Regex101, but neither approach worked in Alteryx. Any ideas? Workflow example attached.
Thanks.
Why is it so cumbersome to take a screenshot in Windows? Why can't it be as simple as Shift, Cmd, 3 or 4? Thank you Alteryx for forcing me to buy a PC (yuck).
Hi,
I think you are having this issue because of the white space between "> <": the regex matches it and returns it.
Your expression seems to work on the lines that do not have these extra "> <" characters.
I am not sure it meets your expectations, but you can apply Data Cleansing to remove the white space. Note that this will also remove white space from the desired regex output.
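To illustrate the white-space point outside Alteryx, here is a minimal Python sketch. The sample HTML string is made up for illustration, and Python's `re` module is not the engine Alteryx uses, so treat it as a demonstration of the pattern's behaviour rather than of the RegEx tool itself.

```python
import re

# Hypothetical sample: tags separated by single spaces, as in the "> <" case.
html = "<tr> <td>Alpha</td> <td>Beta Gamma</td> </tr>"

# The original pattern captures every run of non-'<' characters after a '>',
# including runs that are nothing but white space.
raw = re.findall(r">([^<]+)", html)

# Option 1: keep the pattern and drop white-space-only captures afterwards.
cleaned = [text.strip() for text in raw if text.strip()]

# Option 2: require the capture to start with a non-white-space character,
# so the "> <" gaps never match in the first place.
strict = [text.strip() for text in re.findall(r">\s*([^<\s][^<]*)", html)]

print(raw)
print(cleaned)
print(strict)
```

Stripping only the leading and trailing white space of each capture keeps the space inside "Beta Gamma", which a blanket white-space removal would lose.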
Let me know if this works for you. If not, maybe someone else has a better solution 🙂
Karolina
Hi @hellyars ,
I achieved this by using >.*?< in combination with Tokenise set to split to rows. This allows you to split on each instance, easily clean up the data in a single column, then pivot back.
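For anyone reading without the workflow, the same idea can be sketched in Python. This is only an approximation of the tokenise, clean up, pivot back steps: the record IDs and sample HTML are invented, and the Alteryx tools are stood in for by plain list and dict operations.

```python
import re
from collections import defaultdict

# Hypothetical input: one HTML string per record ID.
records = {
    1: "<tr> <td>Alpha</td> <td>Beta Gamma</td> </tr>",
    2: "<tr><td>Delta</td></tr>",
}

# Same expression as in the post. Note that '.' does not match newlines
# in Python unless re.DOTALL is set.
TOKEN = re.compile(r">.*?<")

# "Tokenise to rows": one (record_id, token) row per match.
rows = [(rid, tok) for rid, html in records.items() for tok in TOKEN.findall(html)]

# Clean up in a single column: drop the surrounding '>' and '<',
# trim white space, and discard the rows that end up empty.
cleaned = [(rid, tok[1:-1].strip()) for rid, tok in rows]
cleaned = [(rid, text) for rid, text in cleaned if text]

# "Pivot back": collect the surviving values per record.
pivoted = defaultdict(list)
for rid, text in cleaned:
    pivoted[rid].append(text)

print(dict(pivoted))
```

Splitting to rows first means the empty "> <" and "><" tokens become ordinary rows that can be filtered out, rather than blank columns of unpredictable number.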
Hope this helps.
M.
I gave this a try and got halfway before my head hurt with Regex....
Well played @mceleavey !
Also @hellyars, the Windows snipping tool shortcut is Windows Key, Shift, and S. This gives you a tool very similar to the one you are used to from Command, Shift, 4, but it copies to the clipboard rather than saving an image to the desktop. Might make things easier for you adjusting to the better OS 😉
@mceleavey 1) I did not even think of Tokenize, because... 2) It was too late and (the real reason) I did not know it could split to rows - the default is split to columns and I never noticed the split to rows option just below it. (The curse of 5K screens and tiny fonts). Maybe it should not default to either one and force you to see and pick.
Thanks!!
Hi @hellyars ,
No problem. I use Tokenise a lot and I find it really useful for this sort of thing. In my mind it's always a competition between Text to Columns and Tokenise 🙂
Glad I could help.
M.
@mceleavey Thank you again! This saves a few steps. The current HTML download comes in as a single-cell hot mess. I then use a Formula tool to insert a pipe delimiter, paired with a Text to Columns tool to split the HTML into rows, before trying to extract the relevant information. I can use this simple Regex/Tokenize/Split-to-Rows approach to avoid all that pain.