I am having some peculiar behaviour with grouped regex parsing a concatenated cell with semicolon delimiters from a spreadsheet and into a new spreadsheet.
I have attached the file in case there is supposed to be different settings, regex definitions, etc.Here is the regex section:
And finally the output spreadsheet section:
I need the concatenated cell to be parsed out until it is empty regardless of how long the concatenated strings are.
I am attaching the alteryx file and the input and the output spreadsheets.
Thank you in advance for you assistance :)
UPDATE:
WOW the community here is amazing - - thank you @Bob_Blackey and @Philip - - the assistance is greatly appreciated.
I see I did not give enough detail in the original post, my apologies...
The data is concatenated without nulls (and without the delimiters that should mark nulls).
The data will always be in the format of the eight fields shown in the regex.
Other than the contact name, any field may (or may not) be missing at any time, so correcting the regex definition (below) should minimize the processing steps since it will produce one record at a time.
(\^.+$);(\^.*$);(\^d+\s);(^.*,\s\d{5}$);(\^\(\d{3}\)\s\d{3}\-\d{4}\(p\)$);(\^\(\d{3}\)\s\d{3}\-\d{4}\(m\)$);(\^\(\d{3}\)\s\d{3}\-\d{4}\(f\)$);(\^\w+\@\w+.\w+$)
I have semicolons between the named groups now and it is not working.
Is there something else I should be putting between the named groups in the regex line (to denote the semicolon delimiters) while the concatenated data is being parsed?
Also, is there a way to debug where the regex is failing within this multi-field parsing module?
Hope that clarification helps...
Solved! Go to Solution.
Hi,
nothing like a good parsing problem.
The challenging part is that it looks like you can having a varying number of contacts for each vendor.
One thing to keep in mind is that Alteryx is designed to deal better with rows than columns (there are more row focused tools)
Try using the text to column tool with the semicolon as the delimiter and use the Split to Rows options.
You can then use the multirow formula tool (or probably several) to id the pieces of each contact.
Cheers,
Bob
Hi,
Just have a little time but I'll here's my five cents.
Two things:
1) I would remove all of the $ and ^ as they refer to the start and the end of the line so you don't need them at all (quick glance). Also \^ refers to the actual ^ char and you don't have those.
2) Build it piece by piece:
start with
(.+?);(.+)
for example:
That will pull out the company name in one field and the rest in another.
Then add to that.
Looking at your data I see a mix of addresses and phone #s as the next items so I'm not sure if this data is correct. But if your data is always in the same format you should be able to build it out.
But again, keep adding one section to the parse and use (.+) at the end to capture the rest.
Hope that helps, got to run.
Bob
Thank you Philip, I was able to get the first record of each row to parse with your method.
The later records on the same row fail when there is another regex variation, but I am not going to give up...
Let us know if we can help in any way.
Ok I am getting a lot of "leftovers" in the trimmed fields when the regex does not match exactly, but I am making progress.
I am now attempting the @Bob_Blackey suggestion.
I have the text to column tool using the semicolon as the delimiter and have confirmed no missing data.
Since the field in each row can be anything at this point (same as before, but in rows) - - how do I apply the regex to the multicolumn tool?
Is there a way to consider a set of only eight rows at a time (to contain the possible eight data fields of a single record) yet have the row being matched "shift" down when there is not a regex match? I am looking through the various examples and am not seeing anything structured this way.
Thanks
@Bob_Blackeys suggestion is a good place to start - since you have varying numbers of columns to be output for each record (because you have multiple vendors to parse out for each row), you need to use a text to columns with 'parse to rows' so that you can dynamically Crosstab back together later in the workflow. Once in rows, I would use regex within a Formula tool as a conditional statement to create a field for the Headers so you can pivot the data downstream. Here's my regex:
if REGEX_Match([vendor_detail], "^\d+[-\s]+\w+.+") then 'Address' elseif REGEX_Match([vendor_detail], "\(*\d{3}[\s\)]+\d{3}-\d{4}[\sX\d]*(.p.+|.f.+|.m.+)*") then 'phone' elseif REGEX_Match([vendor_detail], "[\w\s]+,.+\d{5}") then 'City_State' elseif contains([vendor_detail],"@") then 'email' elseif REGEX_Match([vendor_detail], "(www\..+|.+\.net|.+\.com)") then 'Website' else Null() endif
Then the tricky part is grouping the appropriate records back to each other - I did this with a series of multirow formula tools:
Truly impressive GENIUS - - thank you Sophia - - I am very grateful...
How did you get the Select field types and definitions into the Select module? I exported yours, modified it, then reimported, but will need to know how do it on my own in the future. Are you typing it out by hand or is there a template somewhere?
Also, how did you get the Regex to match perfectly? Is there a specific utility you are using?
Thank you so very much Sophia, you are AMAZING!
Richard