Alteryx Designer Desktop Discussions

SOLVED

Having Trouble Parsing Text

hellyars
13 - Pulsar

Parse question... I am trying to capture the text between > and < tags. I am using the expression >([^<]+) and I also tried >(.*?)< . Both work on Regex101, but neither worked in Alteryx. Any ideas? Workflow example attached.

 

Thanks. 

 

Why is it so cumbersome to take a screenshot in Windows? Why can't it be as simple as Shift, Cmd, 3 or 4? Thank you Alteryx for forcing me to buy a PC (yuck).

 

 

alteryx_screenshot_qstn.jpg

7 REPLIES
KarolinaRoza
11 - Bolide

Hi,

 

I think you are having this issue because of the white space between "> <": the Regex matches it and returns it as well.

Your formula seems to work on the lines that do not have these extra "> <" sequences.

 

I am not sure whether this meets your expectations, but you can apply Data Cleansing to remove the white space. Note that this will also remove white space from the desired Regex output.
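
To illustrate the white space point, here is a minimal sketch in Python. It assumes Python's re module behaves like the Alteryx regex engine for this simple pattern, and the sample string is made up (the real data is only in the attached workflow). It shows the pattern returning the space between "> <" as its own match, and one way to drop such matches without touching the other values.

```python
import re

# Hypothetical sample: two tagged values with a single space between "> <".
sample = "<td>Alpha</td> <td>Beta</td>"

# The first expression from the question: capture everything after ">" up to the next "<".
raw = re.findall(r">([^<]+)", sample)

# Drop matches that are only white space and trim the rest.
kept = [m.strip() for m in raw if m.strip()]

print(raw)   # ['Alpha', ' ', 'Beta']
print(kept)  # ['Alpha', 'Beta']
```

Filtering after the match, rather than cleaning the whole field first, leaves spaces inside the captured values alone.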

Let me know if it is fine, if not maybe someone else has better solution 🙂

 

Karolina

mceleavey
17 - Castor

Hi @hellyars ,

 

 

I achieved this by using >.*?< in combination with Tokenise set to split to rows. This lets you split on each instance, easily clean up the data in a single column, and then pivot back:

 

mceleavey_0-1620894940070.png

 

mceleavey_1-1620894951594.png
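
For anyone who cannot see the screenshots, the tokenise, clean, and pivot-back steps can be sketched outside Alteryx roughly as follows. This is only an illustration in Python, not the actual workflow: the function names and the sample records are hypothetical, and Python's re module stands in for the Alteryx regex engine.

```python
import re

def tokenize_to_rows(records, pattern=r">.*?<"):
    """Emit one (record_id, token) row per regex match, like Tokenise split to rows."""
    rows = []
    for record_id, text in enumerate(records, start=1):
        for token in re.findall(pattern, text):
            rows.append((record_id, token))
    return rows

def clean(rows):
    """Strip the surrounding > and <, trim white space, and drop empty rows."""
    cleaned = []
    for record_id, token in rows:
        value = token.strip("><").strip()
        if value:
            cleaned.append((record_id, value))
    return cleaned

def pivot_back(rows):
    """Group the cleaned values back under their original record."""
    grouped = {}
    for record_id, value in rows:
        grouped.setdefault(record_id, []).append(value)
    return grouped

# Hypothetical input: one HTML fragment per record.
records = ["<td>Alpha</td> <td>Beta</td>", "<p>Gamma</p>"]
result = pivot_back(clean(tokenize_to_rows(records)))
print(result)  # {1: ['Alpha', 'Beta'], 2: ['Gamma']}
```

Because every match sits in its own row, the clean-up only has to be written once for a single column before the values are grouped back per record.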

 

Hope this helps.

 

M.




TheOC
15 - Aurora

I gave this a try and got half way before my head hurt with Regex....
Well played @mceleavey !


Also @hellyars, the Windows Snipping Tool shortcut is Windows Key, Shift, and S. This gives you a tool very similar to the one you are used to from Command, Shift, 4, but it copies the capture to the clipboard rather than saving it as an image on the desktop. Might make things easier for you adjusting to the better OS 😉


mceleavey
17 - Castor

@TheOC ,

 

Regex is my friend.




hellyars
13 - Pulsar

@mceleavey I did not even think of Tokenize, because 1) it was too late and 2) (the real reason) I did not know it could split to rows. The default is split to columns, and I never noticed the split to rows option just below it. (The curse of 5K screens and tiny fonts.) Maybe it should not default to either one and instead force you to see and pick.

 

Thanks!!

mceleavey
17 - Castor

Hi @hellyars ,

 

No problem. I use Tokenise a lot and I find it really useful for this sort of thing. In my mind it's always a competition between Text to Columns and Tokenise 🙂

 

Glad I could help.

 

M.




hellyars
13 - Pulsar

@mceleavey Thank you again! This saves a few steps. The current HTML download comes in as a single-cell hot mess. I then use a Formula tool to insert a pipe delimiter, paired with a Text to Columns tool to split the HTML into rows, before trying to extract the relevant information. I can use this simple Regex/Tokenize/Split-to-Rows approach to avoid all that pain.

 

 
