I'm trying to use the RegEx tool and I only want the text between <p> and </p>.
This expression removes everything up to and including <p>:
^[^_]*<p>
However, I still get </p> and all the other junk after it that I don't want. Can anyone help?
In case you don't want to go the regex route :)
Substring([YourData],
FindString([YourData], '<p>')+3,
FindString([YourData], '</p>')-FindString([YourData], '<p>')-3)
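For anyone who wants to check the offset arithmetic outside Alteryx, here is a rough Python sketch of the same idea. The function name `between_tags` and the sample string are made up for illustration, and `str.find` plus slicing only stands in for FindString and Substring. Unlike the formula above, it searches for the closing tag after the opening one and returns an empty string when either tag is missing.

```python
def between_tags(data: str, open_tag: str = "<p>", close_tag: str = "</p>") -> str:
    """Return the text between the first open_tag and the next close_tag."""
    start = data.find(open_tag)  # 0-based position, -1 if not found
    if start == -1:
        return ""
    content_start = start + len(open_tag)  # the "+3" in the formula for "<p>"
    # Assumption for this sketch: look for the closing tag only after the opening tag.
    end = data.find(close_tag, content_start)
    if end == -1:
        return ""
    return data[content_start:end]


# Hypothetical sample input
print(between_tags("junk <p>keep this</p> more junk"))  # keep this
```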
Hi, that gets rid of <p> and everything inside it! I want to keep the stuff inside it!
Hmm, sounds like you just need to change the RegEx tool's Output Method from Replace to Parse for my solution...
MSalvage
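To illustrate the difference between replacing and parsing, here is a small Python sketch. The pattern `<p>(.*?)</p>` is an assumption for illustration only (the expression actually used in this thread is not shown), and Python's `re` module only approximates what the RegEx tool does. Replacing the match throws away the tag and its contents, whereas parsing returns just the capture group.

```python
import re

# Assumed pattern: one capture group around the text between the tags.
# re.DOTALL lets "." match newlines in case the paragraph spans several lines.
P_CONTENT = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def parse_paragraph(data: str):
    """Roughly what Parse does: return the captured group, or None if no match."""
    match = P_CONTENT.search(data)
    return match.group(1) if match else None


sample = "header junk <p>keep this</p> trailing junk"  # hypothetical input

# Replace-style: the whole match, tags and contents, is removed.
print(repr(P_CONTENT.sub("", sample)))  # 'header junk  trailing junk'

# Parse-style: only the text inside the parentheses is kept.
print(parse_paragraph(sample))  # keep this
```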
Alright that worked. Thank you! I was not aware of Parse for Regex. Where do I learn about that?
Hi @Billbisco,
You can read more about Regex parsing here: https://help.alteryx.com/9.5/RegEx.htm
You could also have used the lazy pattern .*? for this task. Placed between two strings, it matches all the text in between them, and the match keeps the two strings themselves.
For example, from the following text:
"Regex is particularly useful for parsing chunks of text"
the pattern useful .*? chunks will return "useful for parsing chunks".
Set the RegEx tool to Tokenize for this.
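The example above can be tried in Python as a rough stand-in for Tokenize, with `re.findall` returning every match of the pattern. The helper name `tokenize` and the second sample string are made up for illustration. The second call shows why the `?` matters: without it, the greedy `.*` runs on to the last " chunks" in the text.

```python
import re


def tokenize(pattern: str, data: str) -> list:
    """Return every non-overlapping match of pattern (a loose analogue of Tokenize)."""
    return re.findall(pattern, data)


text = "Regex is particularly useful for parsing chunks of text"
print(tokenize(r"useful .*? chunks", text))  # ['useful for parsing chunks']

# Hypothetical input with two occurrences of "chunks" to contrast lazy and greedy
longer = "useful for parsing chunks and more chunks"
print(tokenize(r"useful .*? chunks", longer))  # ['useful for parsing chunks']
print(tokenize(r"useful .* chunks", longer))   # ['useful for parsing chunks and more chunks']
```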
For me, the live training has been very helpful:
https://community.alteryx.com/t5/Live-Training/Live-Training-Introduction-to-RegEx/m-p/66489#M116
Also, when I need a quick reference, I use the examples provided when you click on the RegEx icon on the toolbar