Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Parsing One Long Line - Regex

Highlighted

Hi all,

 

I'm working on a part of a workflow where I want to chop up a single column of text separated by titles (ie "Name: XXXX Address: YYYY DoB: ZZZZZ....") into multiple columns for multiple rows.
 
The issue I'm running into is in circumstances where one record is missing fields that others have. I'm a little new to regex and have been trying to brute force it with an expression similar to:

NAME:(.*)ADDRESS(.*)DoB(.*)
 
But of course, if a record is missing an address, the expression will not populate any of the columns. Any ideas on how to correct this issue?
Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

@TaxCodesandRegex,

 

I don't know if I have over-engineered this challenge.  It is dynamic though.  I don't know how many fields are potentially present in your data, the only thing that I take as a given is that the field name is followed by a colon and then by a space.  It is a quick fix to remove the space requirement.  Other than that, I create two streams of data to parse Names and Field Values.  There is a little magic of parsing and trimming followed by a join and a cross tab.  The order of fields in the output is alphabetical.  If there is a controlled order that you want, you can govern that (let me know if there is a need) too.

 

Capture.png

Do investigate the attached workflow and hopefully this solves your challenge need.  Although simple in the scheme of things (8 tools, 6 unique tools) to get the output from your input, I didn't see a one-tool approach to getting your desired results.  I started this post before dinner, and conceived the solution over a chili cheese dog.

 

Happy Pythagorean Day (8^2 + 15^2 = 17^2)!

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Highlighted

Thanks Mark! (I'll have to wait another 3 years and change to get you back on the next pythagorean day..)

 

The workflow is great and definitely contained the right amount of engineering. The only issue that I'm running into is titles with more than one word in the field name ('Date of Birth' instead of DoB) which throws off the expressions. What I have in mind is just creating a long expression with each of the field names manually listed, but I'm not sure how to go about it (or if it's possible). Of course, I'm open to any and all solutions.

 

Edit: Found my own solution, just needed to think outside the regex box. Compiled a list of titles in excel and ran a find/replace to swap multi-word titles with one word counterparts. Thanks again Mark!!

Labels