Regex Tool

Christeen
6 - Meteoroid

I need to match all of the following text. However, the expression below does not match all of the lines. Could someone help me with this, please?

 

Expression : (\d+)\s([A-Z]+)\s(\w+)\s*(\w*)

 

2316 E 5TH AVE
2753 S MILWAUKEE ST
16911 E HARVARD AVE
3721 S PITKIN CT
7762 SHOSHONE ST
18796 E BALTIC PL
5375 SOMBRERO
2100 16TH ST

DawnDuong
13 - Pulsar

Hi @Christeen 

You mention that you want to “match” these. Do you mean using the Match setting of the Regex tool to pick them up?

Your current regex looks like it is trying to parse instead.

Anyway, if you only need to make sure these are identified as “matched”, one way is:

\d+\s.*
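
As a quick check outside Alteryx, here is a minimal Python sketch of that match test against the sample addresses. It assumes the whole field has to match the pattern, and Python's re module is only a stand-in for the engine behind the Regex tool, so treat the result as indicative. The names ADDRESSES, MATCH_PATTERN and is_match are made up for the illustration.

```python
import re

# Sample addresses copied from the question.
ADDRESSES = [
    "2316 E 5TH AVE",
    "2753 S MILWAUKEE ST",
    "16911 E HARVARD AVE",
    "3721 S PITKIN CT",
    "7762 SHOSHONE ST",
    "18796 E BALTIC PL",
    "5375 SOMBRERO",
    "2100 16TH ST",
]

# Digits, one whitespace character, then anything else.
MATCH_PATTERN = re.compile(r"\d+\s.*")


def is_match(address: str) -> bool:
    """Return True when the whole address fits the pattern (assumed full-field match)."""
    return MATCH_PATTERN.fullmatch(address) is not None


results = {address: is_match(address) for address in ADDRESSES}
for address, matched in results.items():
    print(f"{address!r}: {matched}")
```

Every sample line starts with digits followed by a space, so all eight come back as matched. The pattern only tells you that a record looks like an address; it does not split anything into columns.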
dawn

Qiu
21 - Polaris

@Christeen 
We need to see the whole string. Then we can come up with a regex that matches the lines below while not matching the rest.

Christeen
6 - Meteoroid

Hi Dawn,

Actually, I need to Parse under the Regex tool.

DawnDuong
13 - Pulsar

Hi @Christeen 

Can you explain what Parse output you need from these inputs?

Depending on what is required, the regex will be different.

dawn 

 

Christeen
6 - Meteoroid

Hi Dawn,

I need to separate the text (the entire address) into 4 different columns: 1) House No, 2) the single letter, 3) Street name, and 4) Street Type (Ave, Park, etc.).

apathetichell
19 - Altair

Hi - your scenario assumes that there is always a value for the single letter. That is often not the case in your data, and it can create an error when the letter is missing:

(\d+)\s([A-Z]+)\s(\w+)\s*(\w*)

matches exactly:

1) any number of digits (at least one)

2) 1 space

3) any number of letters (at least one) - note this is for the single letter like "E", "N", etc...

4) a space

5) word characters (one or more)

and then maybe a space and maybe a word...
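
To see what that breakdown means for the posted data, here is a small Python sketch that applies the original expression to each sample address. It assumes the whole field must match, and Python's re module stands in for the Alteryx engine, so the behaviour inside the Regex tool may differ in details. ADDRESSES and parse_original are names invented for this example.

```python
import re

# Sample addresses copied from the question.
ADDRESSES = [
    "2316 E 5TH AVE",
    "2753 S MILWAUKEE ST",
    "16911 E HARVARD AVE",
    "3721 S PITKIN CT",
    "7762 SHOSHONE ST",
    "18796 E BALTIC PL",
    "5375 SOMBRERO",
    "2100 16TH ST",
]

# The expression from the original question.
ORIGINAL = re.compile(r"(\d+)\s([A-Z]+)\s(\w+)\s*(\w*)")


def parse_original(address: str):
    """Return the four captured groups, or None when the address does not fit."""
    found = ORIGINAL.fullmatch(address)
    return found.groups() if found else None


unmatched = [a for a in ADDRESSES if parse_original(a) is None]

for address in ADDRESSES:
    print(f"{address!r} -> {parse_original(address)}")
print("No match:", unmatched)
```

Under these assumptions, "5375 SOMBRERO" and "2100 16TH ST" do not match at all, and "7762 SHOSHONE ST" matches but puts SHOSHONE into the single-letter group, because the second group is mandatory.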

 

 

Tested it, and (\d+)\s(\w+)\s*(\w*)\s*(\w*) is the best fit for the data you posted. I'd expect you'll have some issues somewhere if you have enough addresses, because addresses aren't really regular.
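
For anyone who wants to reproduce the test outside Alteryx, a rough Python sketch follows. It assumes a full-field match and uses Python's re module in place of the Alteryx engine; ADDRESSES and parse_relaxed are hypothetical names used only here.

```python
import re

# Sample addresses copied from the question.
ADDRESSES = [
    "2316 E 5TH AVE",
    "2753 S MILWAUKEE ST",
    "16911 E HARVARD AVE",
    "3721 S PITKIN CT",
    "7762 SHOSHONE ST",
    "18796 E BALTIC PL",
    "5375 SOMBRERO",
    "2100 16TH ST",
]

# The relaxed expression: only the number and the first word are mandatory.
RELAXED = re.compile(r"(\d+)\s(\w+)\s*(\w*)\s*(\w*)")


def parse_relaxed(address: str):
    """Return the four captured groups, or None when the address does not fit."""
    found = RELAXED.fullmatch(address)
    return found.groups() if found else None


parsed = {address: parse_relaxed(address) for address in ADDRESSES}
for address, groups in parsed.items():
    print(f"{address!r} -> {groups}")
```

All eight samples match this way, but the shorter addresses fill the groups from the left: "7762 SHOSHONE ST" ends up with SHOSHONE in the second group and an empty fourth group, so the columns still need realigning afterwards.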

 

Your next step is to look for scenarios where your second output isn't just a single character, using a test like length([regex_out2])=1 or something, and then push the values down one column where that's the case...
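
One possible way to sketch that push-down step in Python is shown below. The rule used here is an assumption for illustration, not a tested Alteryx formula: when the second group is longer than one character and the fourth group is empty, the values move one column to the right and the single-letter column is left blank. RELAXED, push_down and split_address are hypothetical names.

```python
import re

# The relaxed expression from the reply above.
RELAXED = re.compile(r"(\d+)\s(\w+)\s*(\w*)\s*(\w*)")


def push_down(house: str, second: str, third: str, fourth: str):
    """Realign the groups into (house no, single letter, street name, street type)."""
    if len(second) == 1:
        # The second group really is the single letter; nothing to move.
        return house, second, third, fourth
    if fourth == "":
        # No single letter present: shift the values one column to the right.
        return house, "", second, third
    # Four components without a single letter: ambiguous, so leave unchanged.
    return house, second, third, fourth


def split_address(address: str):
    """Parse an address and realign it, or return None when it does not match."""
    found = RELAXED.fullmatch(address)
    return push_down(*found.groups()) if found else None


for sample in ["2316 E 5TH AVE", "7762 SHOSHONE ST", "5375 SOMBRERO", "2100 16TH ST"]:
    print(f"{sample!r} -> {split_address(sample)}")
```

The last branch deliberately leaves ambiguous records alone so they can be reviewed separately rather than silently shifted.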

DawnDuong
13 - Pulsar

Hi @Christeen 

@apathetichell has a nice solution for the standard case. Just from your sample data alone, there are lines that do not have all 4 components. Even a human reader has to guess from experience which component is probably missing.

To get a complete solution, you probably need to profile your data first. E.g. which records have 4 components (e.g. by counting the spaces)? If a record doesn't have 4 components, which rules apply, and so on.

Have your decision tree mapped out first and modify the baseline case accordingly.
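
A simple way to sketch the profiling step in Python is to count the space-separated components per record. This is only an illustration on the sample data; in Alteryx the same idea would be built with its own tools. ADDRESSES, component_count and profile are names made up for the example.

```python
from collections import Counter

# Sample addresses copied from the question.
ADDRESSES = [
    "2316 E 5TH AVE",
    "2753 S MILWAUKEE ST",
    "16911 E HARVARD AVE",
    "3721 S PITKIN CT",
    "7762 SHOSHONE ST",
    "18796 E BALTIC PL",
    "5375 SOMBRERO",
    "2100 16TH ST",
]


def component_count(address: str) -> int:
    """Number of whitespace-separated components in the address."""
    return len(address.split())


# How many records have 4, 3 or 2 components.
profile = Counter(component_count(address) for address in ADDRESSES)

for count, records in sorted(profile.items(), reverse=True):
    print(f"{count} components: {records} record(s)")
```

On the sample data this gives five records with 4 components, two with 3 and one with 2, which tells you how many records need a rule beyond the baseline case.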

good luck!

dawn 
