Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

REGEX - How to capture each occurrence of an acronym that may appear in a text field?

hellyars
13 - Pulsar

REGEX QUESTION

 

I have a TITLE field.  It may or may not include an acronym.   It may include more than one acronym.  The acronym is always contained between an opening and closing parentheses.  

 

IDEXAMPLE TITLECLEANED TITLEACRONYMS

1

C-27

C-27 
2C-27 (JCA)C-27JCA
3C-XX (MED RANGE) AIRCRAFTC-XX AIRCRAFTMED RANGE
4AERIAL COMMON SENSOR (ACS) (MIP)AERIAL COMMON SENSORACS, MIP

 

The expression ^[^\(]*\s(\([A-Z].*\)).* works for examples 2 and 3.  The expression ^[^\(]*\s(\([A-Z].*\))\s(\(.*$) works for 4.  

 

GIVEN AN ACRONYM WILL ALWAYS APPEAR BETWEEN PARENTHESES, HOW DO I CREATE A SINGLE EXPRESSION THAT CAPTURES EACH OCCURRENCE OF AN ACRONYM & THEN REMOVES THEM?   

 

Thanks

 

5 REPLIES 5
JohnJPS
15 - Aurora

Hi @hellyars 

 

This works, in the RegEx tool, using replace and putting nothing in the replace.

 

(\([ A-Z]*\))

 

The parentheses on the outside signify a marked group. Everything inside them is just the expression you described. The replacement is replacing these "marked groups" with nothing.  PS, in a formula, this would be:

REGEX_Replace([EXAMPLE TITLE], "(\([ A-Z]*\))", "")

Hope that helps

John

JReid
9 - Comet

What John said for the Cleaned Title field

 

If you want to capture all acronyms you could use the regex tool and tokenize 

\(([A-Z]*)\)

with the # of Columns being the max number of Acronyms you would expect, and then just concatenate these fields.

geraldo
13 - Pulsar

Hi,

 

For this case I see a solution by creating 3 formulas:

 

° CLEANED TITLE
if Contains([EXAMPLE TITLE], '(') then
trim(Substring([EXAMPLE TITLE],0,FindString([EXAMPLE TITLE],'(')))
else [EXAMPLE TITLE]
endif

2° PREACRONYMS
if Contains([EXAMPLE TITLE], '(') then
trim(Substring([EXAMPLE TITLE],FindString([EXAMPLE TITLE],'(')))
else [EXAMPLE TITLE]
endif

3° ACRONYMS
if Contains([PREACRONYMS], ') (') then
Replace(Replace(Replace([PREACRONYMS], ') (', ','), '(', ''), ')', '')
elseif Contains([PREACRONYMS], '(') then Replace(Replace([PREACRONYMS], '(', ''), ')', '')
else [PREACRONYMS]
endif

estherb47
15 - Aurora
15 - Aurora

Hi @hellyars 

Just in case there are characters other than alpha inside the parenthesis, please try this formula in a formula tool:

 

REGEX_Replace([EXAMPLE TITLE], "\s*\(.+?\)", "")

 

This takes everything between parentheses, and replaces it with nothing. As this will result in extra spaces, I added a \s* in case there are spaces before the open parenthesis.

 

It doesn't matter where the parentheses are in the title. Plus the cool thing about Regex_Replace is it will replace ALL instances within a single field.

 

If there are no parentheses, the formula won't do anything.

 

Let me know if that helps!

 

Cheers,

Esther

hellyars
13 - Pulsar

@JReid  This is close enough.  It's simple.  It pulls all the acronyms (0-n) within a field.  I just need to concat the outputs and pair with a second expression to remove the acronyms from the original field.  

 

Thanks (everyone)

Labels