REGEX QUESTION
I have a TITLE field. It may or may not include an acronym. It may include more than one acronym. The acronym is always contained between an opening and closing parentheses.
ID | EXAMPLE TITLE | CLEANED TITLE | ACRONYMS |
1 | C-27 | C-27 | |
2 | C-27 (JCA) | C-27 | JCA |
3 | C-XX (MED RANGE) AIRCRAFT | C-XX AIRCRAFT | MED RANGE |
4 | AERIAL COMMON SENSOR (ACS) (MIP) | AERIAL COMMON SENSOR | ACS, MIP |
The expression ^[^\(]*\s(\([A-Z].*\)).* works for examples 2 and 3. The expression ^[^\(]*\s(\([A-Z].*\))\s(\(.*$) works for 4.
GIVEN AN ACRONYM WILL ALWAYS APPEAR BETWEEN PARENTHESES, HOW DO I CREATE A SINGLE EXPRESSION THAT CAPTURES EACH OCCURRENCE OF AN ACRONYM & THEN REMOVES THEM?
Thanks
Solved! Go to Solution.
Hi @hellyars
This works, in the RegEx tool, using replace and putting nothing in the replace.
(\([ A-Z]*\))
The parentheses on the outside signify a marked group. Everything inside them is just the expression you described. The replacement is replacing these "marked groups" with nothing. PS, in a formula, this would be:
REGEX_Replace([EXAMPLE TITLE], "(\([ A-Z]*\))", "")
Hope that helps
John
What John said for the Cleaned Title field
If you want to capture all acronyms you could use the regex tool and tokenize
\(([A-Z]*)\)
with the # of Columns being the max number of Acronyms you would expect, and then just concatenate these fields.
Hi,
For this case I see a solution by creating 3 formulas:
° CLEANED TITLE
if Contains([EXAMPLE TITLE], '(') then
trim(Substring([EXAMPLE TITLE],0,FindString([EXAMPLE TITLE],'(')))
else [EXAMPLE TITLE]
endif
2° PREACRONYMS
if Contains([EXAMPLE TITLE], '(') then
trim(Substring([EXAMPLE TITLE],FindString([EXAMPLE TITLE],'(')))
else [EXAMPLE TITLE]
endif
3° ACRONYMS
if Contains([PREACRONYMS], ') (') then
Replace(Replace(Replace([PREACRONYMS], ') (', ','), '(', ''), ')', '')
elseif Contains([PREACRONYMS], '(') then Replace(Replace([PREACRONYMS], '(', ''), ')', '')
else [PREACRONYMS]
endif
Hi @hellyars
Just in case there are characters other than alpha inside the parenthesis, please try this formula in a formula tool:
REGEX_Replace([EXAMPLE TITLE], "\s*\(.+?\)", "")
This takes everything between parentheses, and replaces it with nothing. As this will result in extra spaces, I added a \s* in case there are spaces before the open parenthesis.
It doesn't matter where the parentheses are in the title. Plus the cool thing about Regex_Replace is it will replace ALL instances within a single field.
If there are no parentheses, the formula won't do anything.
Let me know if that helps!
Cheers,
Esther
@JReid This is close enough. It's simple. It pulls all the acronyms (0-n) within a field. I just need to concat the outputs and pair with a second expression to remove the acronyms from the original field.
Thanks (everyone)