Alteryx Designer Desktop Discussions

Ultralightbeam · ‎01-27-2021

GY12_Word_31Dec20_Word2_d1.10_G12_1234

I have a string with 7 to 9 underscore-delimited (_) parts, and I want to check whether each part is in the proper format or has an appropriate value.

I am using RegEx Tokenize, but for some reason it doesn't seem to work.

Current Value	How to check if it's correct
GY20	The first two letters should be GY, followed by a two-digit number from 18 up to the last two digits of the current year.
Word	Any word; must not contain special characters other than period and underscore.
31Dec20	\d{2}\w+\d{2,4}
Word2	Contains the word 'Licensing' or is any 3 characters long.
d1.10	The leftmost character must be a lowercase letter (v), followed by a number and then a period.
G12	G\d{1,2}
1234	\d{3,4}

Is it possible to write this as one regular expression, or should I split the string to columns and then apply a separate regex to each?
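To make the "split to columns, then check each" option concrete, here is a minimal sketch in Python rather than Alteryx. It covers only the standard 7-part case. The names `PART_PATTERNS` and `validate_standard` are made up for illustration, and several patterns are my own reading of the table above (noted in the comments), not confirmed rules.

```python
import re
from datetime import date

# One pattern per position in the standard 7-part format.
# Each part is checked against its pattern as a whole (re.fullmatch).
PART_PATTERNS = [
    r"GY\d{2}",            # GY + two digits; the 18..current-year range is checked below
    r"[A-Za-z0-9.]+",      # assumption: letters, digits and periods only
    r"\d{2}\w+\d{2,4}",    # date pattern taken from the table, e.g. 31Dec20
    r"Licensing|\w{3}",    # assumption: exactly "Licensing", or any 3 word characters
    r"[a-z]\d\.\d+",       # assumption based on the sample d1.10
    r"G\d{1,2}",           # from the table
    r"\d{3,4}",            # from the table
]

def validate_standard(value, today=None):
    """Return (part, ok) pairs for a 7-part string, or None if the part count is not 7."""
    parts = value.split("_")
    if len(parts) != len(PART_PATTERNS):
        return None
    results = [
        (part, re.fullmatch(pattern, part) is not None)
        for part, pattern in zip(parts, PART_PATTERNS)
    ]
    # The year range is easier to test numerically than inside a regex.
    current_yy = (today or date.today()).year % 100
    first, ok = results[0]
    if ok:
        results[0] = (first, 18 <= int(first[2:]) <= current_yy)
    return results
```

Checking part by part also tells you which part is wrong, which a single combined expression does not do on its own.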

Tyro_abc · ‎01-27-2021

attached workflow

Regards
Arundhuti

Ultralightbeam · ‎01-27-2021

@Tyro_abc I need to get the correct pattern for each since the pattern for each row is different.

Qiu · ‎01-27-2021

@Ultralightbeam

It seems you should use REGEX_Match instead.

However, you said there are 7 to 9 underscore-delimited parts, which makes things complicated.

Do we have a fixed pattern for each of the 7, 8, and 9 cases separately?

Ultralightbeam · ‎01-27-2021

@Qiu

Basically, the count varies from 7 to 9 parts because of Word1: sometimes Word1 itself contains up to two extra delimiters, and those pieces can be concatenated into one.

The original, standard format is GY12_Word_31Dec20_Word2_d1.10_G12_1234 (7 parts).

Sometimes it looks like GY12_Word_Word.1_Word.2_31Dec20_Word2_d1.10_G12_1234.

Therefore Word_Word.1_Word.2 must be treated as one part, under the rule that any word must not contain special characters other than period and underscore. (There is really no pattern for this.) I'm thinking of just using (\w+).
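One way to handle the 7 to 9 part cases, sketched in Python rather than Alteryx: the first part and the last five parts are fixed, so whatever sits between them can be joined back into a single word. The function name `split_variable` is made up for illustration. Note also that `\w` covers letters, digits and underscore but not the period, so a word like Word.1 would need something like `[\w.]+` instead of `\w+`.

```python
import re

def split_variable(value):
    """Split a 7-9 part string into exactly 7 fields by rejoining the middle word."""
    parts = value.split("_")
    if not 7 <= len(parts) <= 9:
        return None
    head = parts[0]                  # e.g. GY12
    tail = parts[-5:]                # date, Word2, version, G-number, number
    word = "_".join(parts[1:-5])     # one to three pieces joined back together
    return [head, word] + tail

# \w+ stops at a period; adding the period to a character class allows it.
WORD_ONLY = re.compile(r"\w+")
WORD_WITH_PERIOD = re.compile(r"[\w.]+")
```

With this, both the 7-part and the 9-part sample give seven fields, and each field can then be checked with its own pattern.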

Ultralightbeam · ‎01-27-2021

@Qiu I actually got it with (\w+). My next problem is Word2, which should either equal Licensing or be three characters long; both should be accepted.

Qiu · ‎01-27-2021

@Ultralightbeam

like this?

Licensing|\w{3}

Ultralightbeam · ‎01-27-2021

The word should contain "Licensing" or be 3 characters long (e.g. LCA).

Both should be captured.

I have the word Licensing,

and in some instances there is the word LCA.

Qiu · ‎01-27-2021

@Ultralightbeam

So the one I gave suits your requirement.

Licensing or \w{3}.

"|" means "either"
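One thing to watch with the alternation: whether it is applied to the whole value or merely searched for inside it. A small Python sketch of the difference (the function names are made up for illustration; how this maps onto a given Alteryx tool is not shown here):

```python
import re

PATTERN = r"Licensing|\w{3}"

def is_valid_word2(text):
    """True only if the whole text is 'Licensing' or exactly 3 word characters."""
    return re.fullmatch(PATTERN, text) is not None

def loosely_matches(text):
    """True if the pattern occurs anywhere in the text, which is much weaker."""
    return re.search(PATTERN, text) is not None
```

Searched anywhere, `\w{3}` is satisfied by the first three characters of any longer word, so a value like Word2 would slip through; matched against the whole value, only Licensing and three-character words such as LCA pass.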

Tyro_abc · ‎01-27-2021

Another try; it might need some more fine-tuning, but it works with my sample data.

(GY[1-2][\d])_(\w+)_(\d{2}[a-zA-Z]{3}\d{2,4})_(Licensing|[A-Z]{3})_(\l\d\.\d{2})_(G\d{1,2})_(\d{3,4})
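For anyone who wants to try the expression outside Alteryx, here is a rough Python port. One change is an assumption on my side: `\l` is swapped for `[a-z]`, because Python's `re` module does not accept that escape. The names `COMBINED` and `parse` are made up for illustration.

```python
import re

# Same seven groups as the expression above, split over two lines for readability.
COMBINED = re.compile(
    r"(GY[1-2]\d)_(\w+)_(\d{2}[a-zA-Z]{3}\d{2,4})_"
    r"(Licensing|[A-Z]{3})_([a-z]\d\.\d{2})_(G\d{1,2})_(\d{3,4})"
)

def parse(value):
    """Return the seven captured parts, or None if the whole string does not match."""
    match = COMBINED.fullmatch(value)
    return match.groups() if match else None
```

Two limits to keep in mind: `GY[1-2]\d` accepts GY10 through GY29 rather than 18 up to the current year, and `(\w+)` cannot cover a word containing a period such as Word.1, so strings of that form are rejected.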

Regards

Arundhuti

Help! Regex Tokenize multiple expression