I have below calculation which was working until I created a new formula before regex function,
if REGEX_Match([Payee Name],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
THEN 'ENT'
elseif REGEX_Match([Payee addr 1],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
then 'ENT'
elseif [Payee addr 1] != '' and REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 then 'MULT'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".JR.")) then 'IND JR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".SR.")) then 'IND SR'
ELSE 'IND'
ENDIF
The requirement was to identify certain type of keywords and then categorize them based on the results. Now before calling out that Regex function I created a formula which is :
if [Payee addr 1] = 'UNKNOWN' then null()
else [Payee addr 1]
endif
The result of the above is that where ever the unknown is being nulled the Regex calc is giving me unexpected result of ENT instead of IND. Results screenshot attached
Thanks in advance for your help
Solved! Go to Solution.
Yes. with no wildcards/regex code it is looking for an exact match. To find GAP INC you could use REGEX_Match([Field1],".*\sINC.*") which won't match princeton because it requires a space prior to INC. It will however match INCONTINENT because it will match anything after. The later match is because you could have a period, a comma, a space, or an end of line following INC
I'm using |\sINC/s| for the latter scenario of INCONTINENT
that matches space INC space - exactly . You will not match "HELLO INC" OR "TESLA INC" or anything other. you need ".*\sINC" to match "TESLA INC" but I can't guarantee it will match "'TESLA INC." which is why I have "\sINC.*" - which will match "TESLA INC.","TESLA INC ","TESLA INC!" but will also match " INCAPACITATED" it's a trade off. You have to figure out are you more worried about false positives or false negatives - and only you know that.
got it 🙂 appreciate your help so far.. Thank you! should it be |\sINC/s| or |\sINCs/| ? What is the correct syntax?
I'd do .*\sInc.* personally - but if you only want to match INC space on BOTH sides it's the former. later is matching "INCS|" along with whatever comes after it in your if statement.
@apathetichell Hope that you are doing well. Can you please help me understand the role of .* infront and the end of each keyword? Also how do I specify space infront and after the keywords ...(Court[SPACE] Wildcard)
if REGEX_Match([Field_name],".*\sLLC.*|.*CORP.*|.*DEPT.*|.*TRUSTEE.*|.*ESTATE.*|.*CLERK.*|.*\sINC|.*DEPARTMENT.*|.*CONSULTING.*|.*DBA.*|.*TREASURY.*|.*ASSOCIATES.*|.*AGENCY.*|.*SERVICE.*|.*TREASURY.*|.*LAW.*|.*REALTY.*|.*GROUP.*|.*COLLECTION.*|.*SOCIETY.*|.*STATE.*|.*DEPARTMENT.*|.*ASSOCIATION.*|.*COUNTY.*|.*CONSULATE.*|.*INSURANCE.*|.*DISTRICT.*|.*COUNCIL.*|.*PROGRAM.*|.*CLEANING.*|.*LTD.*|.*PROGR.*|.*TREASURER.*|.*RESORT.*|.*APARTMENTS.*|.*MANAGEMENT.*|.*CONDOMINIUM.*|.*NURSING.*|.*\sAPTS.*|IRS|ATT|VISA|WALGREENS|Under|Court.*|Under Court|Circle.*")
Hi,
so for regex the key is to actually match something. If you don't know what is showing up and where - you need to say that your expression starts with something (and ends with something) and in the middle is what you want. The "something" in this is case is any character any number of times ie .(any character) (* zero or more times).... hope that helps.
Did that regex_match for you?
