Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex_match for identifying multiple keywords in 2 columns

Idyllic_Data_Geek
8 - Asteroid

I have below calculation which was working until I created a new formula before regex function,

 

if REGEX_Match([Payee Name],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
THEN 'ENT'
elseif REGEX_Match([Payee addr 1],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
then 'ENT'
elseif [Payee addr 1] != '' and REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 then 'MULT'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".JR.")) then 'IND JR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".SR.")) then 'IND SR'
ELSE 'IND'
ENDIF

The requirement was to identify certain type of keywords and then categorize them based on the results. Now before calling out that Regex function I created a formula which is :

if [Payee addr 1] = 'UNKNOWN' then null()
else [Payee addr 1]
endif

The result of the above is that where ever the unknown is being nulled the Regex calc is giving me unexpected result of ENT instead of IND. Results screenshot attached

 

 

Thanks in advance for your help

26 REPLIES 26
apathetichell
20 - Arcturus

Yes. with no wildcards/regex code it is looking for an exact match. To find GAP  INC you could use REGEX_Match([Field1],".*\sINC.*") which won't match princeton because it requires a space prior to INC. It will however match INCONTINENT because it will match anything after. The later match is because you could have a period, a comma, a space, or an end of line following INC

Idyllic_Data_Geek
8 - Asteroid

I'm using |\sINC/s| for the latter scenario of INCONTINENT

apathetichell
20 - Arcturus

that matches space INC space - exactly . You will not match "HELLO INC" OR "TESLA INC" or anything other. you need ".*\sINC" to match "TESLA INC" but I can't guarantee it will match "'TESLA INC." which is why I have "\sINC.*" - which will match "TESLA INC.","TESLA INC ","TESLA INC!" but will also match " INCAPACITATED" it's a trade off. You have to figure out are you more worried about false positives or false negatives - and only you know that. 

Idyllic_Data_Geek
8 - Asteroid

got it 🙂 appreciate your help so far.. Thank you! should it be |\sINC/s| or |\sINCs/|   ? What is the correct syntax?

apathetichell
20 - Arcturus

I'd do .*\sInc.* personally - but if you only want to match INC space on BOTH sides it's the former. later is matching "INCS|" along with whatever comes after it in your if statement.

Idyllic_Data_Geek
8 - Asteroid

@apathetichell Hope that you are doing well. Can you please help me understand the role of .* infront and the end of each keyword? Also how do I specify space infront and after the keywords ...(Court[SPACE] Wildcard)

 

if REGEX_Match([Field_name],".*\sLLC.*|.*CORP.*|.*DEPT.*|.*TRUSTEE.*|.*ESTATE.*|.*CLERK.*|.*\sINC|.*DEPARTMENT.*|.*CONSULTING.*|.*DBA.*|.*TREASURY.*|.*ASSOCIATES.*|.*AGENCY.*|.*SERVICE.*|.*TREASURY.*|.*LAW.*|.*REALTY.*|.*GROUP.*|.*COLLECTION.*|.*SOCIETY.*|.*STATE.*|.*DEPARTMENT.*|.*ASSOCIATION.*|.*COUNTY.*|.*CONSULATE.*|.*INSURANCE.*|.*DISTRICT.*|.*COUNCIL.*|.*PROGRAM.*|.*CLEANING.*|.*LTD.*|.*PROGR.*|.*TREASURER.*|.*RESORT.*|.*APARTMENTS.*|.*MANAGEMENT.*|.*CONDOMINIUM.*|.*NURSING.*|.*\sAPTS.*|IRS|ATT|VISA|WALGREENS|Under|Court.*|Under Court|Circle.*")

apathetichell
20 - Arcturus

Hi,

 

so for regex the key is to actually match something. If you don't know what is showing up and where - you need to say that your expression starts with something (and ends with something) and in the middle is what you want. The "something" in this is case is any character any number of times ie .(any character) (* zero or more times).... hope that helps.

 

Did that regex_match for you?

 

 

Labels
Top Solution Authors