Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

RegEx Tokenization Conundrum

SWAJeff
5 - Atom

Hello all,

 

I have spent the better part of an entire day trying to figure this out and any help would be much appreciated!  Ultimately, my quest is pretty simple in concept.  One of our databases has a "junk drawer" memo field where people have been entering waybill tracking information.  I need to tokenize the waybill info into separate rows.  There are two scenarios in which the data exists in the field:

 

1)  R21JUL17100123456789

 

and

 

2) R17JUL17100123456789          R18JUL17100123456789                           R19JUL17100123456789  R20JUL17100123456789

 

And, as you might have guessed, there are no delimiters in scenario 2.  Just the number of spaces the previous shipping agent wanted to put in that day.

 

Here's the expression that has been the most accurate for me so far:  (.+?)(?=R\d{2}[[:alpha:]]{3}\d{2})  Unfortunately, this doesn't catch everything.  The issue I'm running in to is scenario 1.  If there isn't an "RDDMMMYY" format to look forward at then it doesn't return any match.  In essence, i'm only catching scenario 2 rows.

 

I've tried to get really fancy and use a conditional lookup like this:

 

(.+?)(?(?=R\d{2}[[:alpha:]]{3}\d{2}) (?=R\d{2}[[:alpha:]]{3}\d{2}) | (?=$))

 

But I couldn't get it to function properly.

 

 

Workflow Sample.PNGI've attached a photo of a mock workflow, although it is very simple.  For the record, memo fields are the worst.  Thanks in advance for your help!

9 REPLIES 9
MarqueeCrew
20 - Arcturus
20 - Arcturus

I'll get you a solution.... I hope.

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
MarqueeCrew
20 - Arcturus
20 - Arcturus
(\b\w{20}\b)

That's what I used for my expression in the RegEx parse (Tokenize).

 

Capture.png

I hope that this speeds you along towards your weekend.

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
SWAJeff
5 - Atom

Thanks, although I should have indicated that the waybill numbers vary from carrier to carrier and are not always 20 chars in length.

MarqueeCrew
20 - Arcturus
20 - Arcturus

No problem!

(\b\w+\b)

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
SWAJeff
5 - Atom

My sincerest apologizes, but I fear I oversimplified the issue.  Attached is a sample set of data to work with.

MarqueeCrew
20 - Arcturus
20 - Arcturus

Before I post again....

 

Capture.png

Does this look right?

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
MarqueeCrew
20 - Arcturus
20 - Arcturus

Jeff,

 

After our brief phone call, here is the ANSWER!!!!

 

Capture.png

 

1.  We changed 5 or more spaces to being a '|' Pipe.

2.  We used a Text to Columns to split the values to ROWS on a '|' delimiter.

3.  We breathed a sigh of relief.

 

Cheers!!!

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
SWAJeff
5 - Atom

Many thanks!  This was a great and SIMPLE solution.

Bob_Blackey
11 - Bolide

@SWAJeff,

 

Glad @MarqueeCrew was able to help you with the solution.

 

Just wanted to add that with the Alteryx Community out there, the phrase "I have spent the better part of an entire day" should never be spoken again!  :-) 

 

Cheers,

Bob

 

 

Labels