Hi all, I’m hoping someone can help me with some RegEx wizardry. I’ve got an input file with multiple components I need to parse out into columns. I’m pretty new to using RegEx so I haven’t had much success. The attached workflow contains the entire Input data set, as well as a sample of the input and how it should be output. Hopefully this is enough information to get a solution. Also, I’m using Alteryx Designer 2019.2.
Thanks everyone!
Bren
Solved! Go to Solution.
Hi @Bren312 — It's good to attach sample/dummy data to know more about your Input file and desired output file, both.
Your attached file does not have data:
Crud; sorry about that! I just reposted the workflow with dummy info in a Text input so hopefully that fixes it.
@Bren312 !
It took some cleansing to get it to a point where the RegEx wouldn't be stupidly complex.
If you have a need for more than two extra rows of unit numbers then I would wrap the Multi-Row Formula tool into an Iterative macro. If you need, I can help with that, should the need arise.
That was an interesting challenge.
Good luck!
David
Hi David; first off you’re a freaking pro and I really appreciate you looking at this (and especially the container name you used).
This is super close to what I’m trying to achieve but there’s a few things that need to be tweaked if you’re up for some additional headaches.
First, as you said there will be cases where I’ll need more than two rows of unit numbers...an example is rows 17 - 33 from the Input. The Buying # is the same for these records (“2865-06”) and ideally all of the units (from “01” in Row 17 to “9006” in Row 33) would be represented in the final output.
Also (and this one is weirder) I noticed some odd stuff in the results for Input Row 898 (Record 209 in the final output) for 83784-00 RIGGIO DISTRIBUTION CO such as
Like I said, if you feel like letting me steal some more of your time I’d really appreciate it, and if not this will give me a great base. Thanks again David!
Bren!
I can see one mistake I made; I assumed the Buyer # would be 3 or 4 digits before the dash but "83784-00 RIGGIO DISTRIBUTION CO" proved me wrong
In the first RegEx tool, the expression I used is (\d{3,4}-\d{2})\s+([\w\s]*)\s+(\d{2})\s+([\d-]*)(.*$)
It really should start out with (\d{3,5}-\d{2}) or better yet (\d+-\d{2}) if the first set of digits might exceed 5.
Also, I did assume the second set after the dash would always have two digits. If this is an incorrect assumption then change it accordingly.
Make the change I mentioned in the second paragraph and see if that fixes the errors for 83784-00.
I can address the iterative macro later tonight.
Ciao,
David
Aces; thanks David! I was just barely savvy enough in RegEx to change the Buyer # before you mentioned it, but I appreciate you letting me know. You are correct about the digits after the dash always being two digits. Thanks again!
Bren
OK! I ditched the idea of a macro. All you really need is a concatenation of the following records in certain cases. So I used the Multi-Row tool to build an identifier of the groups of records then the Summarize tool to concatenate the Unit #s. I wasn't sure if you would need the original Record field, so I used a separate stream to bring them back into the data.
Happy hunting and good luck!
Best,
David
(edit: added attachment)
David, you're a pro! This worked like a champ. I really appreciate all of your work on this; you killed it. Thanks again and much obliged!
Bren