Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
RESUELTAS

Splitting data without delimiters, referring to other data in the same column

kwieto
Asteroide

I see today is is the day of asking for help for me :)

I have values in a column, which I want to split for further processing. Part of them is comma separated, but part of them not. Each value is a string, which can contain two words or more.
What I can assume is that for most cases where there are more than one value in a field, I can find other field in the same column where each of the values will be alone, or separated by a comma.

 

Initially I tried to split data into rows simply by comma, list all unique values for the column and filter out longest ones (base assumption: if there is no delimiter, then the field after the split should have the longest strings in it), and then use find-replace to add artificial delimiters to the data. But that doesn't work perfectly as sometimes one single value is longer than combination of two others.
As a workaround I do a split of the data as they are now, aggregate results, look what should be corrected and then put it manually on a source list for find-replace before the split and re-run the workflow.
But I look for something less manual, as the current way is not good for broader variety of data.


Below the sample how data are organized currently:

1Blue Carrot
2Orange Carrot
3Very Big and Delicious Green Potato, Orange Carrot
4Very Big and Delicious Green Potato
5Very Big and Delicious Green Potato Black Berry
6Black Berry, Black Potato
7Blue Carrot Very Big and Delicious Green Potato Black Berry
8Black Berry Very Big and Delicious Green Potato, Blue Carrot
9Black Berry Blue Berry

 

 

And the output should look like that:

1Blue Carrot
2Orange Carrot
3Very Big and Delicious Green Potato
3Orange Carrot
4Very Big and Delicious Green Potato
5Very Big and Delicious Green Potato
5Black Berry
6Black Berry
6Black Potato
7Blue Carrot
7Very Big and Delicious Green Potato
7Black Berry
8Black Berry
8Very Big and Delicious Green Potato
8Blue Carrot
9Black Berry
9Blue Berry
10 RESPUESTAS 10
PhilipMannering
16 - Nebula
16 - Nebula

Hi @kwieto 

 

This seems to work... at least for the provided data set.

PhilipMannering_0-1681744634629.png

 

kwieto
Asteroide

Actually I crafted something similar, but didn't add space to the "find" field - which did the trick (mostly).
The problem is that in the data I have it not always adds commas to all values. There is one or two which this workflow avoid and I don't know why (visually they are identical so I have i.e. "Blue Orange" and "Blue Orange Red Berry" and the comma isn't added with find-replace), I don't know what may cause the problem, as when I put both into the excel and use "search" function, it doesn't have any issues with finding the string.

PhilipMannering
16 - Nebula
16 - Nebula

Could you update your question with an example that doesn't work perhaps?

kwieto
Asteroide

Sure, I've just had to track down the issue first, to not send a list of hundred records 😅

It seems that the issue is when you have overlapping data in the source file for replacement.

It seems that if your original file was like that:

AAA, BBB, CCC, DDD

AAA
EEE, FFF
AAA, BBB CCC DDD EEE, FFF
BBB
CCC
FFF

 

And after the split by comma you have a list (which is your source for replacement) like that:

AAA
BBB
CCC
DDD
EEE
FFFF
BBB CCC DDD EEE

 

Then find/replace will not put commas in every block of "BBB CCC DDD EEE" which it will find
So for example "AAA BBB CCC DDD EEE FFF" will be replaced into "AAA,BBB CCC DDD EEE,FFF" instead of "AAA,BBB,CCC,DDD,EEE,FFF".

If you then split the data into rows and do next round of find/replace then it will clear the remaining part, but you never know how many additional rounds of such operation (split and then replace) you will need until your data will be finally cleared.

The file with the sample data showing the problem is below.

PhilipMannering
16 - Nebula
16 - Nebula

Hey @kwieto 

 

See attached an iterative macro solution.

 

PhilipMannering_0-1681813346692.png

 

kwieto
Asteroide

Seems that it is doing its job, can you guide me a little on how to create such macro?

PhilipMannering
16 - Nebula
16 - Nebula

Yeah, the problem with the original solution is that the Find Replace will always look for the longer string match first. An iterative solution wouldn't be necessary if we could just match the shortest items in the list.

 

So this iterative solution is doing the exact same thing as before, but doing it in a loop repeatedly looking for substrings (and adding that comma) until...

How does it know when to stop? If there is the same number of rows before and after the Text-to-Columns then no further splitting was done. So I make sure all the records go to the regular output (as opposed to iterative output) when [Rows Before] = [Rows After]. An iterative macro will stop iterating when no records are passed to the Iterative Output.

 

You can search the community for more general information on how Iterative Macros work and how to create them.

kwieto
Asteroide

Thanks, I did some research and created a macro which works for me. 
In the meantime I realized that there is also another nuance to cover, but I didn't found a way to do it.
The thing is that sometimes I have two values which are different, but have the same core, i.e. "Blue Berry" and "Blue Berry Flavoured" If they are in the longer string, during the fist iteration they will be separated into two separate strings (as you said, Find/Replace takes the longer string into consideration).

In the second iteration the "Blue Berry Flavoured" will be split into "Blue Berry" and "Flavoured" which I would like to avoid.

I was thinking of some kind of cross check with the first split for the values which don't match (the "Flavoured" will not appear on the list) and then combining them wit the proper beginning (I assume that would be value in the row before the one where not matching value was found), but not sure how to craft it.
Not mentioning, that even such additional clearance will leave some data for manual correction...
But for now, the macro is doing most of the necessary job.

PhilipMannering
16 - Nebula
16 - Nebula

Excellent. Can you mark as a solution?

Etiquetas