Alteryx Designer Desktop Discussions
SOLVED

Alteryx Regex not matching test cases

jdoyle
5 - Atom

I'm trying to flag addresses in a large pile of data that can be considered "PO box" addresses before trying to do some location work on them. I wrote up a regex pattern which should find several forms of how it's written in my data, and when I test it externally (e.g., in regexr), the matches all work. However, when using Alteryx, none of the cases which should match are flagged as matching, and I'm at a complete loss as to why. 

 

The pattern I'm using is (with case insensitivity):

 

(^| )P(\.| )?O(\.)? BOX

 

to find addresses that start with PO BOX, or that contain it after a street/name, etc. Sometimes there are periods, sometimes the P and O are separated by a space, hence the optional groups. 

 

As we can see in the test cases, it finds all the patterns correctly, doesn't give false positives on things like LARPO BOX Company, and handles the weird spacing just fine.

image.png

but the Alteryx parser fails to match any of them

image.png

Raj
16 - Nebula

@jdoyle 
one workaround
hope this helps

apathetichell
19 - Altair

| is used for entire terms - so:

^P\U{0,1}O\U{0,1}\s*BOX.*|.*\UP\U{0,1}O\U{0,1}\s*BOX.* works

DataNath
17 - Castor

Hey @jdoyle, the reason your workflow isn't working at the moment is because the RegEx matching doesn't check for the target string containing your expression, but evaluates whether or not the entire target string matches your expression. At the minute, you're asking Alteryx:

 

Is my string 'PO BOX' (with the variations, i.e. spaces/periods), and just that alone? There's no allowance for anything on either side of it, such as the names, streets and so on that you have in your field. Adding .* to each side fixes this, as you're now effectively asking: Is my string <zero or more of any characters>'PO BOX' (with the variations)<zero or more of any characters>? For these rows, it is. See here, all of the matches return as expected:

 

bdbd.png

 

.*(^| )P(\.| )?O(\.)? BOX.*
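
For anyone who wants to see the "contains" vs. "whole field" difference outside Alteryx, here is a minimal sketch in Python. It is only an analogy: Python's re module is a different engine from the one Alteryx uses, re.search stands in for a regexr-style contains test, and re.fullmatch stands in for the whole-field behaviour described above. The sample addresses and function names are made up for illustration.

```python
import re

# The pattern from the original post, applied case-insensitively.
PO_BOX = r"(^| )P(\.| )?O(\.)? BOX"

def contains_po_box(address: str) -> bool:
    """'Contains' test: the pattern may match anywhere in the string."""
    return re.search(PO_BOX, address, re.IGNORECASE) is not None

def is_only_po_box(address: str) -> bool:
    """Whole-string test: the pattern must consume the entire string."""
    return re.fullmatch(PO_BOX, address, re.IGNORECASE) is not None

def whole_field_has_po_box(address: str) -> bool:
    """Whole-string test with .* added on each side of the pattern."""
    return re.fullmatch(".*" + PO_BOX + ".*", address, re.IGNORECASE) is not None

# Hypothetical sample rows, not the data from the screenshots.
for row in ["PO BOX 123", "12 MAIN ST P.O. BOX 9", "LARPO BOX COMPANY"]:
    print(row, contains_po_box(row), is_only_po_box(row), whole_field_has_po_box(row))
```

The bare pattern only passes the whole-string test when the field is nothing but "PO BOX" (or a variation); once .* is added on both sides, the whole-string test gives the same answers as the contains test.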

 

OllieClarke
15 - Aurora

Hey @jdoyle 

REGEX_MATCH() or the match function in the RegEx tool is always testing against the full field, whereas in regexr it's more of a contains test than a perfect match.

What that means for your case is you need to include the stuff before and after PO Box in your RegEx, like:

(^|.*? )P(\.| )?O\.? BOX.*
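
As a rough check of that pattern under whole-field semantics, here is a small Python sketch that flags a list of addresses. It assumes Python's re.fullmatch behaves like the full-field match described above, which is an approximation since the regex engines differ. The function name and sample addresses are hypothetical.

```python
import re

# The pattern above: either start of field, or a lazy run of characters ending in a space,
# then the PO BOX variations, then anything after.
FULL_FIELD_PO_BOX = re.compile(r"(^|.*? )P(\.| )?O\.? BOX.*", re.IGNORECASE)

def flag_po_boxes(addresses: list[str]) -> list[bool]:
    """Return one flag per address: True if the whole field matches the pattern."""
    return [FULL_FIELD_PO_BOX.fullmatch(a) is not None for a in addresses]

# Hypothetical sample rows.
sample = [
    "PO BOX 123",
    "P O BOX 789",
    "123 MAIN ST PO BOX 12",
    "LARPO BOX COMPANY",
]
print(flag_po_boxes(sample))
```

The leading (^|.*? ) group still requires PO BOX to sit at the start of the field or directly after a space, so a row like LARPO BOX COMPANY stays unflagged.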

 

Hope that helps,

 

Ollie

jdoyle
5 - Atom

Perfect. Thanks everyone. I'm so glad it was something simple/silly and straightforward like this. Works great now with the "and also anything before/after" bits.
