Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Need help with RegEx Field Parsing

Bren312
8 - Asteroid

Hi all, I’m hoping someone can help me with some RegEx wizardry.  I’ve got an input file with multiple components I need to parse out into columns.  I’m pretty new to using RegEx so I haven’t had much success.  The attached workflow contains the entire Input data set, as well as a sample of the input and how it should be output.  Hopefully this is enough information to get a solution.  Also, I’m using Alteryx Designer 2019.2. 

 

Thanks everyone!

 

Bren

8 REPLIES 8
vizAlter
12 - Quasar

Hi @Bren312 — It's good to attach sample/dummy data to know more about your Input file and desired output file, both.

 

Your attached file does not have data:

vizAlter_0-1600887486591.png

 

Bren312
8 - Asteroid

Crud; sorry about that!  I just reposted the workflow with dummy info in a Text input so hopefully that fixes it.

David-Carnes
12 - Quasar

@Bren312 !
It took some cleansing to get it to a point where the RegEx wouldn't be stupidly complex.

 

If you have a need for more than two extra rows of unit numbers then I would wrap the Multi-Row Formula tool into an Iterative macro.  If you need, I can help with that, should the need arise.

 

Bren312.png

 

That was an interesting challenge.

 

Good luck!
David

 

 

 

Bren312
8 - Asteroid

@David-Carnes 

 

Hi David; first off you’re a freaking pro and I really appreciate you looking at this (and especially the container name you used).

 

This is super close to what I’m trying to achieve but there’s a few things that need to be tweaked if you’re up for some additional headaches.

 

First, as you said there will be cases where I’ll need more than two rows of unit numbers...an example is rows 17 - 33 from the Input.  The Buying # is the same for these records (“2865-06”) and ideally all of the units (from  “01” in Row 17 to “9006” in Row 33) would be represented in the final output.

 

Also (and this one is weirder) I noticed some odd stuff in the results for Input Row 898 (Record 209 in the final output) for 83784-00 RIGGIO DISTRIBUTION CO such as

  • The Store displayed is 29, which isn’t in this record (the Store should be 65)
  •  The Client; Status; and Unit # Columns are blank
  • Strangest of all, when I filter at the start of the workflow for “Records <900” the Client; Status; and Unit # Columns are populated but when I filter for “Records <901” these Columns are blank...doesn’t make a lick of sense to me.   

 

Like I said, if you feel like letting me steal some more of your time I’d really appreciate it, and if not this will give me a great base.  Thanks again David!

David-Carnes
12 - Quasar

Bren!
I can see one mistake I made; I assumed the Buyer # would be 3 or 4 digits before the dash but "83784-00 RIGGIO DISTRIBUTION CO"  proved me wrong

 

In the first RegEx tool, the expression I used is (\d{3,4}-\d{2})\s+([\w\s]*)\s+(\d{2})\s+([\d-]*)(.*$)

It really should start out with  (\d{3,5}-\d{2}) or better yet  (\d+-\d{2}) if the first set of digits might exceed 5.

 

Also, I did assume the second set after the dash would always have two digits.  If this is an incorrect assumption then change it accordingly.

 

Make the change I mentioned in the second paragraph and see if that fixes the errors for 83784-00.

 

I can address the iterative macro later tonight.

 

Ciao,

David

Bren312
8 - Asteroid

Aces; thanks David!  I was just barely savvy enough in RegEx to change the Buyer # before you mentioned it, but I appreciate you letting me know.  You are correct about the digits after the dash always being two digits.  Thanks again!

 

Bren 

David-Carnes
12 - Quasar

OK!  I ditched the idea of a macro.  All you really need is a concatenation of the following records in certain cases.  So I used the Multi-Row tool to build an identifier of the groups of records then the Summarize tool to concatenate the Unit #s.  I wasn't sure if you would need the original Record field, so I used a separate stream to bring them back into the data.

 

Bren312 part 2.png 

 

Happy hunting and good luck!

Best,

David

 

(edit: added attachment)

Bren312
8 - Asteroid

David, you're a pro!  This worked like a champ.  I really appreciate all of your work on this; you killed it.  Thanks again and much obliged!

 

tenor.gif

 

Bren

Labels