Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

how to remove words between special symbols

Highlighted
Asteroid

I have cells that contain lots of special symbols as below

<p>Radial Root Cyclone™ technology</p>
</li></ul><p><b> </b></p><p><b> </b></p><ul><li>Root Cyclone™
<lt/>p<gt/>Announce your love for freshness to the world-or to everyone who visits your home

 

What I want is a clean text information like this, so I need to remove the special symbols<>and charters between those special symbols/

Radial Root Cyclone™ technolog
Root Cyclone™
Announce your love for freshness to the world-or to everyone who visits your home

 

What would be the best way to do this? 

Thank you.  

Highlighted
Magnetar

Hi,

 

If you have a specific set of these that you need to remove, you could use the Find and Replace tool with a lookup list of them and replace with an empty string.


Another option would be using a formula like the below:


REGEX_Replace([Field1], '<[^>]*>', '')

This would remove any text contained in carats (<>).  Note that this particular REGEX does not solve for the third example with an escaped lt and gt and a P inside of it.  To accomplish that you may want to include an additional rule for this sort of case.

 

EDIT:

An option that may resolve the LT GT "P" example is below:

REGEX_Replace([Field1], '<(lt.*gt/|[^>]*)>', '')

 

This Regex looks for text inside of an open and closed carat <> OR text starting with <lt and ending with gt/> and removes it.

 

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

@ll1100000,

 

I ran into the error that @Claje 's formula produces.  I decided to fix it with 2 formulas.  The first formula trims (gets rid of spaces) the results of the captured group:

 

TRIM(REGEX_Replace([Field1], "<.*?>", ''))

The second addresses the problem with the 3rd record where a "p" character is outside of the <> braces.

I took the OUTPUT variable and did this:

REGEX_Replace([output], "^\l(\u.*)", '$1',0)

It is case sensitive and will remove a starting letter (just one) followed by an uppercase letter.

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

Of course.... @Claje is the world's first Expert Certified Alteryx user.  I do like the idea of not using a RegEx expression if possible.

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Highlighted
Magnetar

And I still learn something new every day :).

Now I'm testing out the RegEX syntax @MarqueeCrew set up with the <.*?> so that I understand it better as I've never used the ? operator before.


The Trim is a really good idea and gets you to your exact goal.

Asteroid

@MarqueeCrew

Thank you for your reply, is there a way that I can understand how to use those symbols or what do they mean when using Regex function?  

 "^\l(\u.*)"

 

Labels