Alteryx Designer Desktop Discussions

agrawaluk · ‎03-21-2025

I have a csv comma delimited file that I am trying to parse in Alteryx but running into a few edge cases. It has 8M+ rows and 34 columns.

Some of the columns have multiple values that are also separated by comma that I do not want to separate. I cannot put them in quotes, brackets or parenthesis. I have to work with this file.

Input:

Col1

ab,cd,7,6,45,21,gh

Desired Output:

Col1	Col2	Col3	Col4	Col5	Col6
ab	cd	7,6	45	21	gh

I am working with strings mostly here.
I know what column has multiple values (for example col3 here) if that helps.

Thanks in advance!

flying008 · ‎03-22-2025

Hi, @agrawaluk

What's the feature of your columns with don't split ? length or pattern ?

apathetichell · ‎03-22-2025

Hey -> as described this is impossible.

It's usually pretty doable - but the 7,6 is a big issue. unless this is in quotes -> there's now way to specify this is one term vs a two distinct terms without some logic - which you dictate.

are there always going to be numbers with commas in column 3,4 -> are they always one term?

the core thing you'll end up doing is a record id, text to columns split to rows. you'll use tile with unique value for recordid

you'll then use cross tab -> recordid as a key field -> and tile sequence number as the header, and your column as the value. set it first, or contact mode -> doesn't really matter. This will get you there - except the 3,4 column. They'll be in separate columns because Alteryx isn't a mind reader and can't figure out which commas are part of your number term and which commas are delimiters.

agrawaluk · ‎03-24-2025

Sorry, its hard to define pattern or length as the column is more of an 'Item Description' type field with multiple commas and no set field length. However to achieve our goal for even 70%, I can put a number to the string length. Let's say the total length is 9 characters including the comma in between. Example value - "9A-1,9A-2".

Can we do something with this?

agrawaluk · ‎03-24-2025

Agree with you.

The values are not always one term. They are I would say 20% times multiple values.

Thanks for your input. I appreciate it.

AncientPandaman · ‎03-24-2025

@agrawaluk
if fixed field number and only 1 field of desc. it may use split of cutting front and back.

eg:

id, acc, desc, a, b, c

1, 23xx, lambda a, b: a + b, 1, 2, 3

split front

id	acc	desc, a, b, c
1	23xx	lambda x, b: x + b, 1, 2, 3

then split end

id	acc	desc	a	b	c
1	23xx	lambda x, b: x + b	1	2	3

else may use any unique column as anchor in regex,

like column of date, Y/N, or any other fixed length columns

apathetichell · ‎03-24-2025

yeah - without an understanding of the semantic construction of the description field and an understanding of the universe -> I can't tell you if you will get to 70%. My assumption is that the value is extracting some type of information from the description field vs just splitting it. I also assume you can kind of do what you want using an LLM but -> costs. You can try to post some more examples -> you can try to follow my split by text to columns/row mode /tile suggestion above - > and see if that gets you to 70% - if it doesn't it may explain what it is that makes sense/doesn't make sense about your data structure.

Maybe pytorch? https://stackoverflow.com/questions/71552396/extracting-data-logically-from-unstructured-text

https://www.reddit.com/r/LangChain/comments/1d3a3ee/parsing_unstructured_text/

langchain? chatgpt?

Alteryx Designer Desktop Discussions

CSV parsing with multiple values in one column

Re: Is there any way the computer vision tools can...

Re: Batch Macro

Re: How to get cell reference address from excel

Re: Replacing Forecast columns with Actual Data

Re: Row creation