Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex_match for identifying multiple keywords in 2 columns

Idyllic_Data_Geek
8 - Asteroid

I have below calculation which was working until I created a new formula before regex function,

 

if REGEX_Match([Payee Name],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
THEN 'ENT'
elseif REGEX_Match([Payee addr 1],".LLC.||.CORP.||.DEPT.||.TRUSTEE.||.ESTATE.||.CLERK.||.INC.||.DEPARTMENT.||.CONSULTING.||.DBA.||.TREASURY.||.ASSOCIATES.||.AGENCY.||.SERVICE.||.TREASURY.||.LAW.||.REALTY.||.GROUP.||.COLLECTION.")
then 'ENT'
elseif [Payee addr 1] != '' and REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 then 'MULT'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".JR.")) then 'IND JR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".SR.")) then 'IND SR'
ELSE 'IND'
ENDIF

The requirement was to identify certain type of keywords and then categorize them based on the results. Now before calling out that Regex function I created a formula which is :

if [Payee addr 1] = 'UNKNOWN' then null()
else [Payee addr 1]
endif

The result of the above is that where ever the unknown is being nulled the Regex calc is giving me unexpected result of ENT instead of IND. Results screenshot attached

 

 

Thanks in advance for your help

26 REPLIES 26
apathetichell
18 - Pollux

can you post sample data? switch statements would definitely make things clearer.

DawnDuong
13 - Pulsar
13 - Pulsar

Hi @Idyllic_Data_Geek 

If everything else is working fine except the “else” then  i suggest you explicitly call out the null test 

If isnull(<fieldname>) then “IND”

elseif ... <the rest if your test conditions>
then “errors”

Endif

 

without knowing some sample data, that’s the best i can think of 

Dawn 

apathetichell
18 - Pollux

Double pipes are your problem. Get rid of the double pipes and turn them into single pipes and it works. The single pipe is your regex "or" operator - so having a double pipe means you are accepting null() in your set of things to match for ie || effectively means OR null(). See screenshots with single pipe and one set of double pipes.

2021-06-23 (1).png

2021-06-23.png

  

Idyllic_Data_Geek
8 - Asteroid

@apathetichell thank you .. it did the trick

Idyllic_Data_Geek
8 - Asteroid

Quesytion -- do you know what the * is doing in the below regex calc?

 

if REGEX_Match([Payee Name],".*LLC.*||.*CORP.*||.*DEPT.*||.*TRUSTEE.*||.*ESTATE.*||.*CLERK.*||.*INC.*||.*DEPARTMENT.*||.*CONSULTING.*||.*DBA.*||.*TREASURY.*||.*ASSOCIATES.*||.*AGENCY.*||.*SERVICE.*||.*TREASURY.*||.*LAW.*||.*REALTY.*||.*GROUP.*||.*COLLECTION.*||.*SOCIETY.*||.*STATE.*||.*DEPARTMENT.*||.*ASSOCIATION.*||.*COUNTY.*||.*CONSULATE.*||.*INSURANCE.*||.*DISTRICT.*||.*COUNCIL.*||.*PROGRAM.*||.*CLEANING.*||.*LTD.*||.*PROGR.*||.*TREASURER.*")

 

 

The Department is being flagged correctly but Departments is not.what does the period and asterisk accomplish in .*DEPARTMENT.*?

 

Thanks in Advance!

apathetichell
18 - Pollux

".*" means anything any number of times so it's basically saying wildcard LLC wildcard. Each entry repeats with a wildcard before and after. note - the wildcard doesn't need to be fulfilled - it's just there as a check.

 

I'd have to see the full data on the Departments entry and the match to explain but keep in mind that this might not be the ideal regex use - by definition PARTS (or something) could match DEPARTMENTS prior to DEPARTMENT...

Idyllic_Data_Geek
8 - Asteroid

then with .*DEPARTMENT.*...the code should recognize Departments as well?

apathetichell
18 - Pollux

It SHOULD. Issues are is it case insensitive and is departments matching anything else beforehand in the regex statement.

department True
departments True

 

is what I got from my test.

Idyllic_Data_Geek
8 - Asteroid

It's not case sensitive. Can you please test it with the Payee Name: WILHELMINA MARTIN. why is being flagged as 'ENT' and not 'IND'? Below is the full code that I have,

if REGEX_Match([Payee Name],".*LLC.*||.*CORP.*||.*DEPT.*||.*TRUSTEE.*||.*ESTATE.*||.*CLERK.*||.*INC.*||.*DEPARTMENT.*||.*CONSULTING.*||.*DBA.*||.*TREASURY.*||.*ASSOCIATES.*||.*AGENCY.*||.*SERVICE.*||.*TREASURY.*||.*LAW.*||.*REALTY.*||.*GROUP.*||.*COLLECTION.*||.*SOCIETY.*||.*STATE.*||.*DEPARTMENT.*||.*ASSOCIATION.*||.*COUNTY.*||.*CONSULATE.*||.*INSURANCE.*||.*DISTRICT.*||.*COUNCIL.*||.*PROGRAM.*||.*CLEANING.*||.*LTD.*||.*PROGR.*||.*TREASURER.*||.*RESORT.*||.*APARTMENTS.*||.*MANAGEMENT.*||.*CONDOMINIUM.*") and !ISNULL([Payee Name])
THEN 'ENT'
elseif REGEX_Match([Payee addr 1],".*LLC.*||.*CORP.*||.*DEPT.*||.*TRUSTEE.*||.*ESTATE.*||.*CLERK.*||.*INC.*||.*DEPARTMENT.*||.*CONSULTING.*||.*DBA.*||.*TREASURY.*||.*ASSOCIATES.*||.*AGENCY.*||.*SERVICE.*||.*TREASURY.*||.*LAW.*||.*REALTY.*||.*GROUP.*||.*COLLECTION.*||.*SOCIETY.*||.*STATE.*||.*DEPARTMENT.*||.*ASSOCIATION.*||.*COUNTY.*||.*CONSULATE.*||.*INSURANCE.*||.*DISTRICT.*||.*COUNCIL.*||.*PROGRAM.*||.*CLEANING.*||.*LTD.*||.*PROGR.*||.*TREASURER.*||.*APARTMENTS.*||.*RESORT.*||.*MANAGEMENT.*||.*CONDOMINIUM.*") and !ISNULL([Payee addr 1])
then 'ENT'
elseif !ISNULL([Payee addr 1]) and REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 then 'MULT'
elseif((REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 or ISNULL([Payee addr 1]))and REGEX_Match([Payee Name],".SR.")) then 'MULTI SR'
elseif((REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 or ISNULL([Payee addr 1])) and REGEX_Match([Payee addr 1],".SR.")) then 'MULTI SR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee Name],".*III.*")) then 'MULTI III'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') <= 0 and REGEX_Match([Payee addr 1],".*III.*")) then 'MULTI III'
elseif((REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 or ISNULL([Payee addr 1]))and (REGEX_Match([Payee Name],".*JR.*"))) then 'IND JR'
elseif( REGEX_CountMatches([Payee addr 1], '[0-9]') > 0 and REGEX_Match([Payee Name],".SR.")) then 'IND SR'
ELSE 'IND'
ENDIF

 

 

Labels