Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Stripping away parts of text (reg ex?)

MsBindy
8 - Asteroid

I have a large file that has one field of text.  The text includes a description.  part of that description includes someones name and ID number.  I'm trying to remove that information from the description to result in a generic description.

 

I think RegEx is the answer,  but I'm finding this a little too complicated for me.

 

Here's a sample of what some of my data looks like and what my desired stripped description would be

Workflow 801 Initiated by MS BINDY (P0084430) Template Auto Selected.

Workflow Initiated Template Auto Selected

 

Edits for Preliminary Bill created by MS BINDY (P002294C)

Edits for Preliminary Bill created

 

 

Review/Approval Request Email sent to Assigned Approver MS BINDY (P0018670)

Review/Approval Request Email sent

 

I'm trying to remove the text after either of the words (to|by)  up through and including the close paren after the ID.  (I'm also going to want to remove the numeric after the word "Workflow" if it exists.) I substituted Ms Bindy for the name...it's not always going to be me.

 

My approach so far is to use Reg Ex with named and unnamed groups,  but I'm not having great success.

 

 

 

2 REPLIES 2
JohnJPS
15 - Aurora

Hi @MsBindy,

I also used groups, but rather than try to do it all at once, I used a pair of Regex_REPLACE:

REGEX_Replace(
  REGEX_Replace([fld1], "(.*\<workflow\>) [0-9]+(.*)", "$1$2"),
  "(.*) (\<to\>|\<by\>) (.*)(\))(.*)", "$1$5")

The inner one removes the numeric if it's there; the outer handles the to/by piece.

Hope that helps!

John

 

MsBindy
8 - Asteroid

That's great!  I'll have to study that this afternoon.  I ran it though my data and have to make a few tweaks.    The part where the number after the word 'workflow' is stripped out can be a number with a comma in it, and the replace didn't handle that quite right.  And I found another place in some of the data where I need to strip something out.

 

But,  this gets me 95% there,  and you saved me at least a day of trial and error!  Thank you!

Labels