I have below calculation which was working until I created a new formula before regex function,
if REGEX_Match([Payee Name],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
THEN 'ENT'
elseif REGEX_Match([Payee addr 1],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
then 'ENT'
elseif [Payee addr 1] != '' and REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 then 'MULT'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".JR.")) then 'IND JR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".SR.")) then 'IND SR'
ELSE 'IND'
ENDIF
The requirement was to identify certain type of keywords and then categorize them based on the results. Now before calling out that Regex function I created a formula which is :
if [Payee addr 1] = 'UNKNOWN' then null()
else [Payee addr 1]
endif
The result of the above is that where ever the unknown is being nulled the Regex calc is giving me unexpected result of ENT instead of IND. Results screenshot attached
Thanks in advance for your help
Solved! Go to Solution.
Match is in the payee addr not the name - I transcribed and edited your relevant if statement (got rid of double pipes) and ran this:
So it's not Payee Name which is triggering the false match - at least if you clean up the code.
Removing the pipe does not work for me!
Sorry should have been clearer - I removed the pipes and redid the first clause of the if statement that gave me: if REGEX_Match([FIELD1],".*LLC.*|.*CORP.*|.*DEPT.*|.*TRUSTEE.*|.*ESTATE.*|.*CLERK.*|.*INC.*|.*DEPARTMENT.*|.*CONSULTING.*|.*DBA.*|.*TREASURY.*|.*ASSOCIATES.*|.*AGENCY.*|.*SERVICE.*|.*TREASURY.*|.*LAW.*|.*REALTY.*|.*GROUP.*|.*COLLECTION.*|.*SOCIETY.*|.*STATE.*|.*DEPARTMENT.*|.*ASSOCIATION.*|.*COUNTY.*|.*CONSULATE.*|.*INSURANCE.*|.*DISTRICT.*|.*COUNCIL.*|.*PROGRAM.*|.*CLEANING.*|.*LTD.*|.*PROGR.*|.*TREASURER.*|.*RESORT.*|.*APARTMENTS.*|.*MANAGEMENT.*|.*CONDOMINIUM.*")
THEN 'ENT' ELSE "OTHER" ENDIF - which gave me the results pictured.
Understood. But the payee addr 1 is using the same logic and looking for the same keywords. The payee addr1 is not having any of those keywords so I'm not sure why it is flagging as ENT. I can not post the data here...but I hope that you are understanding what I'm stating!
Can you post the payee addr 1 for the entry you posted? if not is it possible that these letters are found in the address?
LAW
CORP
RESORT
STATE
INC
like if the address is 170 STATE ST. that would flag, or 220 LAWTON would flag.
If you can't cut and paste - I'd recommend troubleshooting on your own by cutting and pasting parts to figure out where the error is occurring but as noted - you are using regex as kind of a natural language editor and it's not quite that. I'd say the approach you are using will never be perfect especially on larger data sets so you just kind of have to figure out how you are handling and dealing with the errors.
PRINCETON AVE
VINCENNES RD
AVE
so instead of using the wildcard all over.. how can I define "(space)LLC(Space)"
Yeah if you modify your code from ".*" to "\s" in all instances you should only find instances where space \keyword\ space are met. so "\sINC\s" should exclude Princeton. Again this won't help with STATE STREET.
got it. thank you. If I do just the |INC|, then is it looking for the values of [Field] = INC and if it has GAP INC then it will reject it?
