Alteryx Designer Desktop Discussions

Christeen · ‎06-25-2021

I need to match all of the following text. However, the expression below does not match every line. Could someone help me with this, please?

Expression : (\d+)\s([A-Z]+)\s(\w+)\s*(\w*)

2316 E 5TH AVE
2753 S MILWAUKEE ST
16911 E HARVARD AVE
3721 S PITKIN CT
7762 SHOSHONE ST
18796 E BALTIC PL
5375 SOMBRERO
2100 16TH ST

DawnDuong · ‎06-25-2021

Hi @Christeen

You mention that you want to “match” this. Do you mean using the Regex tool under the Match setting to pick these up?

The current regex looks like you are trying to parse instead...

Anyway, if you only need to make sure these are identified as “matched”, one way is

\d+\s.*
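
A minimal sketch of that check outside Alteryx, in Python. It assumes the match has to cover the whole field, which `fullmatch` imitates here; the `addresses` list is simply the sample data from the first post, and the variable names are made up for illustration.

```python
import re

# Sample lines copied from the original question.
addresses = [
    "2316 E 5TH AVE",
    "2753 S MILWAUKEE ST",
    "16911 E HARVARD AVE",
    "3721 S PITKIN CT",
    "7762 SHOSHONE ST",
    "18796 E BALTIC PL",
    "5375 SOMBRERO",
    "2100 16TH ST",
]

# Digits, one whitespace character, then anything: a yes/no test only,
# with no capture groups to split the address into columns.
MATCH_ONLY = re.compile(r"\d+\s.*")

# One True/False flag per address (hypothetical name).
matched = [bool(MATCH_ONLY.fullmatch(a)) for a in addresses]
```

Every sample line is flagged as matched, but nothing is split into columns, so this only helps if a flag is all that is needed.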
dawn

Qiu · ‎06-25-2021

@Christeen
We need to see the whole string; then we can come up with a regex that matches the lines below while not matching the rest.

Christeen · ‎06-25-2021

Hi Dawn,

Actually, I need to Parse under the Regex tool.

DawnDuong · ‎06-26-2021

Hi @Christeen

Can you explain what Parse output you need from these inputs?

Depending on what is required, the regex will be different.

dawn

Christeen · ‎06-27-2021

Hi Dawn,

I need to separate the text (the entire address) into 4 different columns: 1) House No, 2) the single letter, 3) Street name and 4) Street Type (Ave, Park, etc.).

apathetichell · ‎06-27-2021

Hi - your expression assumes that there is always a value for the single letter. That often is not the case in your data, and the expression can fail when it is missing:

(\d+)\s([A-Z]+)\s(\w+)\s*(\w*)

matches exactly:

1) one or more digits (the house number)

2) 1 space

3) one or more uppercase letters - note this is for the single letter like "E", "N" etc...

4) a space

5) one or more word characters (letters, digits or underscore)

and then maybe a space and maybe a word...
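
To see where that breakdown bites, here is a rough Python sketch that runs the original expression over the posted sample. Python's `re` is only a stand-in for the Alteryx Regex tool, and `fullmatch` assumes the whole field must be consumed; `addresses` and `results` are illustrative names.

```python
import re

# Sample lines copied from the original question.
addresses = [
    "2316 E 5TH AVE",
    "2753 S MILWAUKEE ST",
    "16911 E HARVARD AVE",
    "3721 S PITKIN CT",
    "7762 SHOSHONE ST",
    "18796 E BALTIC PL",
    "5375 SOMBRERO",
    "2100 16TH ST",
]

# The expression from the question, unchanged.
ORIGINAL = re.compile(r"(\d+)\s([A-Z]+)\s(\w+)\s*(\w*)")

# Map each address to its captured groups, or None when there is no match.
results = {}
for address in addresses:
    m = ORIGINAL.fullmatch(address)
    results[address] = m.groups() if m else None

# Addresses the expression cannot match at all.
unmatched = [a for a, groups in results.items() if groups is None]
```

On this sample, "5375 SOMBRERO" and "2100 16TH ST" do not match, and "7762 SHOSHONE ST" matches but puts SHOSHONE in the single-letter group.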

Tested it and (\d+)\s(\w+)\s*(\w*)\s*(\w*) is the best fit for the data you posted. I'd expect you'll have some issues somewhere if you have enough addresses, because addresses don't really follow a regular pattern.

Your next step is finding the records where your second output isn't just a single character (test length([regex_out2])=1 or something similar) and then pushing the values down one column in those cases...
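
A rough Python sketch of that parse-then-shift idea, under a few assumptions: Python's `re` stands in for the Alteryx Regex tool, `parse_address` is a made-up helper name, and the handling of a three-word address with no single letter is my own guess rather than anything from the thread.

```python
import re

# The relaxed pattern suggested above; fullmatch() assumes the whole
# field must be consumed by the expression.
PATTERN = re.compile(r"(\d+)\s(\w+)\s*(\w*)\s*(\w*)")

def parse_address(text):
    """Return (house_no, letter, street_name, street_type), or None if no match."""
    m = PATTERN.fullmatch(text.strip())
    if m is None:
        return None
    house_no, out2, out3, out4 = m.groups()
    if len(out2) == 1 and out2.isalpha():
        # Second output really is the single letter: keep the columns as parsed.
        return (house_no, out2, out3, out4)
    if out4:
        # Assumption: no single letter but three words, so treat the first
        # two words as the street name instead of dropping the last one.
        return (house_no, "", f"{out2} {out3}", out4)
    # No single letter: push the remaining values down one column.
    return (house_no, "", out2, out3)

parsed = [parse_address(a) for a in
          ["2316 E 5TH AVE", "7762 SHOSHONE ST", "5375 SOMBRERO", "2100 16TH ST"]]
```

With the sample lines, "7762 SHOSHONE ST" ends up with an empty letter column, SHOSHONE as the street name and ST as the street type, while "5375 SOMBRERO" keeps an empty street type.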

DawnDuong · ‎06-28-2021

Hi @Christeen

@apathetichell has a nice solution for the standard case. Just from your sample data alone, there are lines that do not have all 4 components. Even a human reader can only guess, from experience, which component is probably missing.

To get a complete solution, you probably need to profile your data first. E.g. which records have 4 components (e.g. by counting the spaces)? If a record doesn't have 4 components, which rules apply, and so on.

Have your decision tree mapped out first and modify the baseline case accordingly.
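
As a small illustration of the profiling step, here is a Python sketch that counts the space-separated components per record. The names `addresses`, `component_count` and `profile` are hypothetical, and in Alteryx the same count would come from a formula or summarize step rather than this code.

```python
from collections import Counter

# Sample lines copied from the original question.
addresses = [
    "2316 E 5TH AVE",
    "2753 S MILWAUKEE ST",
    "16911 E HARVARD AVE",
    "3721 S PITKIN CT",
    "7762 SHOSHONE ST",
    "18796 E BALTIC PL",
    "5375 SOMBRERO",
    "2100 16TH ST",
]

def component_count(text):
    """Number of whitespace-separated pieces in one address."""
    return len(text.split())

# How many records have 2, 3 or 4 components.
profile = Counter(component_count(a) for a in addresses)

# Records that need a rule of their own because a component is missing.
incomplete = [a for a in addresses if component_count(a) != 4]
```

For the posted sample this gives five records with 4 components, two with 3 and one with 2, which is the starting point for deciding which rule applies to each group.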

good luck!

dawn
