Alteryx Designer Desktop Discussions

kwieto · ‎04-27-2023

I need some insights on how to deal with following case:

I have data in comma delimited string, which I need to compare with the reference table. That part is easy, use "text to columns" tool to split data into rows, clean trailing spaces if they appear, then use join tool to find values that don't match.
The problem I have is that part of the values contain commas within the value. I've managed that by using find-replace tool which searches for such values before split and put them in quotes. Then the "Text to columns" tool has the option to skip the data in quotes checked, they are preserved.

So far so good, but there are some values which have errors or are outside of the reference catalogue, and if they contain comma, they are split and as a result I have only part of original value listed as incorrect.

See on the example:

Source:

Mushroom, Potato, potato, Carrot, baby, Carrot, Baby, Lettuce, Onion, Onionn, Lettuce, Baby, Tomato

Reference table:

Mushroom

Potato

Carrot, Baby

Lettuce

Tomato

Result of the workflow (not matching data):

potato

baby

Onionn

Baby

What I really want to get is the list like that:

potato

Carrot, baby

Onionn

Lettuce, Baby

I think this can be done with multirow formula, in two ways:

1. Creating manually handled list of values (fortunately for me, such cases are rather from limited catalogue) and if the content of certain row match that value, then combine it with the value of the previous row, and remove the previous row from the list, then compare with the main reference table by join tool.

2. Check the values with the reference table and if the value in certain row is out of the list, and the value in previous row is on the list, then combine value with previous row and remove that row from the list

Both ways need comparing the content of specific row with a list of values, and this probably can't be done by the join tool, as then I will lose the relation to the previous row.

Do you have any idea how to solve that? Or maybe there is a better way to handle such cases?

binu_acs · ‎04-28-2023

@kwieto One way of doing this

Yoshiro_Fujimori · ‎04-28-2023

Hi @kwieto ,

Could you explain how you know "Lettuce, Baby" should be one name, while it is not on the Reference table?

kwieto · ‎04-28-2023

@Yoshiro_Fujimori: The logic is that if the value is not in reference table AND predeceasing value is, then they most probably should have to be combined together.
But after the second thought I think this is the bad approach, as it might be the case that you have two wrong values (from my example: "carrot, baby" and they should be combined.

So creating manually managed second reference table (containing values like "baby") will be probably better way of handling that.

kwieto · ‎04-28-2023

@binu_acs : I tried doing it that way, but you still have part of values split in two rows (see record 2 and 3)
Another weakness is that you have to manually manage the formula (2), and you may encounter several values for replacement (from limited range, but they may be a few different values, depending on the source)

Yoshiro_Fujimori · ‎04-28-2023

Hi @kwieto ,

Thank you for the explanation.

If the number of keywords (like "Baby") is not so many, you may put it into the RegEx like this.

Input

Mushroom, Potato, potato, giant, Carrot, baby, Carrot, Baby, Lettuce, Onion, giant, Onionn, Lettuce, Baby, Tomato

Workflow

RegEx configuration

Output

Is it close to your needs?

kwieto · ‎04-28-2023

@Yoshiro_Fujimori yes, it is. The output of the operation should be then compared with the reference (after removing commas from the reference values) right?

Now the only part which I miss is a handy tool to update the keywords if needed (therefore I thought about the list from the text input), but since it will be a part of the macro, it should be manageable to do that via the text input tool.

One question: why did you put a formula changing the case to the title case?

kwieto · ‎04-28-2023

One more thought: it might be important to preserve original formatting. this is because the values listed as wrong are then sent to other people who are checking and correcting the data manually (each one has to be reviewed separately). It would be easier for them to have them exactly as they are in the file.

In your formula if I have "potato, giant" and "potato giant" they both will be listed as "Potato Giant".
Is it possible to modify Regex to just change comma to other sign (i.e. "|") so after the checking it can be easily converted back to comma?

so if I have "potato, giant" and "potato giant", I'll get "potato| giant" and "potato giant"; both will not match the reference and be listed as wrong, but then I can use simple formula to revert "potato| giant" back to "potato, giant".

Yoshiro_Fujimori · ‎04-28-2023

Hi @kwieto,

> The output of the operation should be then compared with the reference

Yes, I wanted to focus on the parsing (fun part 😀).

> handy tool to update the keywords

I tried it with Dynamic Replace tool as below.

> why did you put a formula changing the case to the title case?

It's just my preference. You can also keep the case if necessary.

> Is it possible to modify Regex to just change comma to other sign (i.e. "|") so after the checking it can be easily converted back to comma?

Instead of "pipe", how about directly replacing "space" with "comma + space" at the end?

Workflow

Output

kwieto · ‎04-29-2023

@Yoshiro_Fujimori wrote:
Yes, I wanted to focus on the parsing (fun part 😀).

😁

@Yoshiro_Fujimori wrote:
I tried it with Dynamic Replace tool as below.

Nice. Better than using the text input tool, as in that solution user would probably need to put words in proper syntax

@Yoshiro_Fujimori wrote:
> Is it possible to modify Regex to just change comma to other sign (i.e. "|") so after the checking it can be easily converted back to comma?
Instead of "pipe", how about directly replacing "space" with "comma + space" at the end?

I see you like to avoid replacing comma to an artificial separator 😀

But this won't work as your revert formula is changing spaces to commas also in places where there was no comma in original string (i.e. "potato, giant" while in the source string there is "potato giant". Also if it will happen that you have more than three words in one value, each will be separated by comma in the output.

Changing space to some other sing solves both issues. I tried with "$1| $2" in the regex formula and it worked. Another possibility may be to put the sequence in quotes, but I don't know how to adapt formula to do that ("'$1 $2'" or "'"+"$1 $2"+"'" didn't work).

Anyway, I think I have what I needed to finish the workflow, thanks!

Alteryx Designer Desktop Discussions

Finetuning unmatched values (from delimited string)

Re: Is there any way the computer vision tools can...

Re: Batch Macro

Re: How to get cell reference address from excel

Re: Replacing Forecast columns with Actual Data

Re: Row creation