Hi All,
I got a bunch text with characters and words I want to clean up , I have come up with a Regex_Replace formula, it seems to only do part of what I want ( it doesn't remove anything after specific word) ...could someone help trouble shoot this formula? Thanks
find below item and replace with blanks:
1. remove all after period (.)
2. remove all after semicolon (:)
3 remove all after certain word (Respect) or (Behalf)
formula below:
trim(REGEX_Replace(
(if Contains([Line Items Description], ".")
then Left([Line Items Description], FindString([Line Items Description], "."))
elseif Contains([Line Items Description],"-")
then Left([Line Items Description], FindString([Line Items Description], "-"))
elseif Contains([Line Items Description]," : ")
then Left([Line Items Description], FindString([Line Items Description], " : "))
elseif Contains([Line Items Description],"RESPECT")
then Left([Line Items Description], FindString([Line Items Description], "RESPECT"))
elseif Contains([Line Items Description],"BEHALF")
then Left([Line Items Description], FindString([Line Items Description], "BEHALF"))
else [Line Items Description]
endif),"(.*)\-(.*)", "$1"))
Sample below:
FEE FOR PROFESSIONAL SERVICES RENDERED IN RESPECT OF AN APPLICATION FOR SHORT STAY WORK VISA . FEE APPROVED BY xxxxx ON -OCT-. AUD$. +GST CONVERTED TO USD - OUT OF SCOPE |
GOVERNMENT FEES PAID ON YOUR BEHALF. AUD . |
FOR PROFESSIONAL SERVICES RENDERED WITH RESPECT TO THE COORDINATION OF IMMIGRATION REQUIREMENTS - WORK PERMIT CANCELLATION - USD |
FOR PROFESSIONAL SERVICES RENDERED WITH RESPECT TO THE ASSESSMENT FOR WORK PERMIT USD, |
GOVERNMENT FEES PAID ON YOUR BEHALF - RESIDENCE PERMIT . CNY |
Solved! Go to Solution.
Hi @BosKev
Try this:
REGEX_Replace([Field1], "(^.*?)([\.:]|respect|behalf).*", "$1$2")
It removes everything after the first occurence of any of these ".", ":" "respect", "behalf"
Cheers,
Hi could you please show what you expect for each line?
Hi Thableaus,
This is works, very helpful in shorting my original formula.
Is it possible to explain how this works? I want to be able to update the conditions if anything changes.
for example if new words pops up or special character.
-Boskev
Glad it worked, @BosKev
Sure, I can explain shortly to you:
REGEX_Replace([Field1], "(^.*?)([\.:]|respect|behalf).*", "$1$2")
The expression we need to replace:
(^.*?) - ^ beginning of the sentence, .*? any character 0 or more times until it finds the next character in our expression (that's why we put the "?" character). We put between parenthesis for further reference in the replacement expression.
[\.:] - this means a dot "." or a semicolon ":". The dot is escaped (\) because it's a metacharacter.
| - this is an alternation character
respect and behalf are the exact words you're looking for
([\.:]|respect|behalf) - this whole expression means you can find either a dot, a semicolon, "respect" or "behalf". And it's between parenthesis because we need to reference that in our replacement expression.
.* - any character 0 or more times. This basically means anything after what we specified in our previous expression.
The replacement expression:
It's basically the first expression in parenthesis ($1) followed by the second ($2). So we get rid of everything we didn't reference and put between parenthesis.
If something is not clear, let me know.
Cheers,
Great!
just one follow up regarding comment below:
"The replacement expression:
... So we get rid of everything we didn't reference and put between parenthesis."
do you mean we keep everything not referenced because the goal is to remove any text string after :,-, or respect/behalf?
Basically we have this:
(^.*?)([\.:]|respect|behalf).*
(^.*?) - This means $1 - expression 1
([\.:]|respect|behalf) - This means $2 - expression 2
.* - not referenced, not between parenthesis.
We're replacing (^.*?)([\.:]|respect|behalf).* - the whole expression
for
$1$2 - which is basically (^.*?)([\.:]|respect|behalf)
Cheers,
Thank you!