I have a regex question. I have many strings, some of which contain a varying number of <> tags embedded in the text. One example is (italics added by me):
<span style="">D</span><span style="font-family: sans-serif;">uring the last portoflio review we. </span> <br> <span style="font-family: sans-serif;"> The term loan we own for (TLB-6 USD 2 purchased at
I'd like all of the <> tags removed, including the text between the angle brackets. That would leave the above text as:
During the last portoflio review we. The term loan we own for (TLB-6 USD 2 purchased at
Some of the strings have no <> tags, others have quite a lot. Any solution would need to be able to manage this inconsistency in this free text field.
Thanks!
Hi @AndrewW
A formula tool with a simple REGEX_Replace seems to be working here. Try it out on your data and let me know if it works for you.
REGEX_Replace([Field1], "<.*?>", "")
This tells Alteryx to remove every <> pair together with whatever it contains, no matter how many pairs there are. The ? makes the .* lazy, so each match stops at the first closing > instead of running on to the last > in the string.
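To see why the ? matters, here is a minimal Python sketch of the same pattern run on the sample string from the question. It is only an illustration: Python's re module stands in for the Alteryx regex engine, strip_tags is a made-up helper name, and the whitespace clean-up at the end is my own assumption (the formula above does not do it, so the spaces that sat between tags would stay in the Alteryx output).

```python
import re

SAMPLE = (
    '<span style="">D</span><span style="font-family: sans-serif;">'
    'uring the last portoflio review we. </span> <br> '
    '<span style="font-family: sans-serif;"> '
    'The term loan we own for (TLB-6 USD 2 purchased at'
)

def strip_tags(text):
    # Lazy match: each "<.*?>" stops at the first ">" it finds,
    # so every tag is removed separately and the text between tags survives.
    no_tags = re.sub(r"<.*?>", "", text)
    # Assumption (not part of the original formula): collapse the leftover
    # runs of spaces that sat between tags, then trim the ends.
    return re.sub(r"\s+", " ", no_tags).strip()

# Greedy match for comparison: "<.*>" runs from the first "<" to the LAST ">",
# which swallows the text between the tags as well.
greedy = re.sub(r"<.*>", "", SAMPLE)

print(strip_tags(SAMPLE))
print(repr(greedy))
```

A string with no tags at all passes through unchanged, which covers the rows in the free text field that contain no <> tags.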
If it doesn't work, then I'd recommend adding a Record ID tool to mark each row with an identifier. Then use a RegEx tool in Parse mode, tokenizing into rows, with this as the expression: (^|.*?)(?:<.*?>). Follow it with a Summarize tool, where you group by the record ID and concatenate the text back together with no delimiter.
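The fallback idea (number the rows, break each one into pieces around the tags, then glue the pieces back together per row) can be sketched in Python as well. This is an analogy rather than a reproduction of the Alteryx tools: strip_by_tokenizing is a hypothetical helper, and re.split is swapped in for the tokenize step, so it also keeps any text after the last tag.

```python
import re

TAG = re.compile(r"<.*?>")  # same lazy tag pattern as the formula

def strip_by_tokenizing(rows):
    # Step 1 (Record ID): number each row so its pieces can be regrouped later.
    tokens = []
    for record_id, text in enumerate(rows, start=1):
        # Step 2 (tokenize to rows): split the text wherever a tag occurs,
        # producing one (record_id, piece) pair per fragment of plain text.
        for piece in TAG.split(text):
            tokens.append((record_id, piece))

    # Step 3 (Summarize): group by record ID and concatenate with no delimiter.
    joined = {}
    for record_id, piece in tokens:
        joined[record_id] = joined.get(record_id, "") + piece
    return [joined[i] for i in range(1, len(rows) + 1)]

print(strip_by_tokenizing(["<b>Hi</b> there", "plain"]))
```

Rows without any tags come back as a single piece, so they survive the round trip unchanged.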
Cheers!
Esther
Thanks @estherb47 , I hoped there would be a simple regex solution for this :-)
Thanks for the suggestion, Mark. Sadly it doesn't work for all scenarios, but I appreciate the response.