Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex_match for identifying multiple keywords in 2 columns

Idyllic_Data_Geek
8 - Asteroid

I have below calculation which was working until I created a new formula before regex function,

 

if REGEX_Match([Payee Name],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
THEN 'ENT'
elseif REGEX_Match([Payee addr 1],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
then 'ENT'
elseif [Payee addr 1] != '' and REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 then 'MULT'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".JR.")) then 'IND JR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".SR.")) then 'IND SR'
ELSE 'IND'
ENDIF

The requirement was to identify certain type of keywords and then categorize them based on the results. Now before calling out that Regex function I created a formula which is :

if [Payee addr 1] = 'UNKNOWN' then null()
else [Payee addr 1]
endif

The result of the above is that where ever the unknown is being nulled the Regex calc is giving me unexpected result of ENT instead of IND. Results screenshot attached

 

 

Thanks in advance for your help

26 REPLIES 26
apathetichell
20 - Arcturus

Match is in the payee addr not the name - I transcribed and edited your relevant if statement (got rid of double pipes) and ran this:

2021-06-24 (1).png

 So it's not Payee Name which is triggering the false match - at least if you clean up the code.

Idyllic_Data_Geek
8 - Asteroid

Removing the pipe does not work for me!

apathetichell
20 - Arcturus

Sorry should have been clearer  - I removed the pipes and redid the first clause of the if statement that gave me: if REGEX_Match([FIELD1],".*LLC.*|.*CORP.*|.*DEPT.*|.*TRUSTEE.*|.*ESTATE.*|.*CLERK.*|.*INC.*|.*DEPARTMENT.*|.*CONSULTING.*|.*DBA.*|.*TREASURY.*|.*ASSOCIATES.*|.*AGENCY.*|.*SERVICE.*|.*TREASURY.*|.*LAW.*|.*REALTY.*|.*GROUP.*|.*COLLECTION.*|.*SOCIETY.*|.*STATE.*|.*DEPARTMENT.*|.*ASSOCIATION.*|.*COUNTY.*|.*CONSULATE.*|.*INSURANCE.*|.*DISTRICT.*|.*COUNCIL.*|.*PROGRAM.*|.*CLEANING.*|.*LTD.*|.*PROGR.*|.*TREASURER.*|.*RESORT.*|.*APARTMENTS.*|.*MANAGEMENT.*|.*CONDOMINIUM.*")
THEN 'ENT' ELSE "OTHER" ENDIF - which gave me the results pictured.

 

Idyllic_Data_Geek
8 - Asteroid

Understood. But the payee addr 1 is using the same logic and looking for the same keywords. The payee addr1 is not having any of those keywords so I'm not sure why it is flagging as ENT. I can not post the data here...but I hope that you are understanding what I'm stating!

apathetichell
20 - Arcturus

Can you post the payee addr 1 for the entry you posted? if not is it possible that these letters are found in the address?

LAW

CORP

RESORT

STATE

INC

 

like if the address is 170 STATE ST. that would flag, or 220 LAWTON would flag.

apathetichell
20 - Arcturus

If you can't cut and paste - I'd recommend troubleshooting on your own by cutting and pasting parts to figure out where the error is occurring but as noted - you are using regex as kind of a natural language editor and it's not quite that.  I'd say the approach you are using will never be perfect especially on larger data sets so you just kind of have to figure out how you are handling and dealing with the errors.

Idyllic_Data_Geek
8 - Asteroid

PRINCETON AVE

VINCENNES RD

AVE

Idyllic_Data_Geek
8 - Asteroid

so instead of using the wildcard all over.. how can I define "(space)LLC(Space)"

apathetichell
20 - Arcturus

Yeah if you modify your code from ".*" to "\s" in all instances you should only find instances where space \keyword\ space are met. so "\sINC\s" should exclude Princeton. Again this won't help with STATE STREET.

Idyllic_Data_Geek
8 - Asteroid

got it. thank you. If I do just the |INC|, then is it looking for the values of [Field] = INC and if it has GAP INC then it will reject it?

Labels
Top Solution Authors