Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Another Regex Replace question

AndrewW
11 - Bolide

I have a regex question. I have many strings, some of which contain a varying number of <> tags embedded in the text. One example is (italics added by me):

 

<span style="">D</span><span style="font-family: sans-serif;">uring the last portoflio review we. </span>  <br>  <span style="font-family: sans-serif;"> The term loan we own for (TLB-6 USD 2 purchased at

 

I'd like all of the <> including the text between removing. That would leave the above text as:

 

During the last portoflio review we. The term loan we own for (TLB-6 USD 2 purchased at

 

Some of the strings have no <> tags, others have quite a lot. Any solution would need to be able to manage this inconsistency in this free text field.

 

Thanks!

4 REPLIES 4
MarqueeCrew
20 - Arcturus
20 - Arcturus
In your example, all of the good data comes after the tags. Did you know that you can handle this with string functions?

One way is to use reversestring() and replacefirst() as reversestring(replacefirst(reversestring([field]),">",'|'))

Then you could use a text to columns and parse the data.

Instead of replacefirst, you could use findstring to find the first (last) > sign and then use that value to take the right() most values.

Regex_replace([field],".*>(.*)",'$1')

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
estherb47
15 - Aurora
15 - Aurora

Hi @AndrewW 

A formula tool with a simple REGEX_Replace seems to be working here. Try it out on your data and let me know if it works for you.

 

REGEX_Replace([Field1], "<.*?>", "")

 

This tells Alteryx to remove everything contained in the <>, no matter how many there are. The ? keeps the text to be removed bound within the <>

 

If it doesn't work, then I'd recommend adding a Record ID tool to mark each row with an identifier. Then a Regex Parse, tokenizing into rows, using this as the expression: (^|.*?)(?:<.?>). Follow it with a Summarize tool, where you group by the record ID and concatenate the text back together with no delimiter


Cheers!

Esther

AndrewW
11 - Bolide

Thanks @estherb47 , I hoped there would be a simple regex solution for this :-)

AndrewW
11 - Bolide

Thanks for the suggestion Mark, sadly doesn't work for all scenarios, but appreciate the response

Labels