Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Finetuning unmatched values (from delimited string)

kwieto
8 - Asteroid

I need some insights on how to deal with following case:

I have data in comma delimited string, which I need to compare with the reference table. That part is easy, use "text to columns" tool to split data into rows, clean trailing spaces if they appear, then use join tool to find values that don't match.
The problem I have is that part of the values contain commas within the value. I've managed that by using find-replace tool which searches for such values before split and put them in quotes. Then the "Text to columns" tool has the option to skip the data in quotes checked, they are preserved.

 

So far so good, but there are some values which have errors or are outside of the reference catalogue, and if they contain comma, they are split and as a result I have only part of original value listed as incorrect.

See on the example:

Source:

Mushroom, Potato, potato, Carrot, baby, Carrot, Baby,  Lettuce, Onion, Onionn, Lettuce, Baby, Tomato

 

Reference table:

Mushroom
Potato
Carrot, Baby
Lettuce
Tomato

 

Result of the workflow (not matching data):

potato
baby
Onionn
Baby

 

What I really want to get is the list like that:

potato
Carrot, baby
Onionn
Lettuce, Baby

 

I think this can be done with multirow formula, in two ways:

 

1. Creating manually handled list of values (fortunately for me, such cases are rather from limited catalogue) and if the content of certain row match that value, then combine it with the value of the previous row, and remove the previous row from the list, then compare with the main reference table by join tool.

 

2. Check the values with the reference table and if the value in certain row is out of the list, and the value in previous row is on the list, then combine value with previous row and remove that row from the list

 

Both ways need comparing the content of specific row with a list of values, and this probably can't be done by the join tool, as then I will lose the relation to the previous row. 

 

Do you have any idea how to solve that? Or maybe there is a better way to handle such cases?

11 REPLIES 11
binuacs
20 - Arcturus

@kwieto One way of doing this

binuacs_0-1682668354943.png

 

Yoshiro_Fujimori
15 - Aurora

Hi @kwieto ,

Could you explain how you know "Lettuce, Baby" should be one name, while it is not on the Reference table?

kwieto
8 - Asteroid

@Yoshiro_Fujimori: The logic is that if the value is not in reference table AND predeceasing value is, then they most probably should have to be combined together.
But after the second thought I think this is the bad approach, as it might be the case that you have two wrong values (from my example: "carrot, baby" and they should be combined.

So creating manually managed second reference table (containing values like "baby") will be probably better way of handling that.

kwieto
8 - Asteroid

@binuacs : I tried doing it that way, but you still have part of values split in two rows (see record 2 and 3) 
Another weakness is that you have to manually manage the formula (2), and you may encounter several values for replacement (from limited range, but they may be a few different values, depending on the source)

 

kwieto_0-1682669415949.png

 

Yoshiro_Fujimori
15 - Aurora

Hi @kwieto ,

Thank you for the explanation.

If the number of keywords (like "Baby") is not so many, you may put it into the RegEx like this.

 

Input

Mushroom, Potato, potato, giant, Carrot, baby, Carrot, Baby, Lettuce, Onion, giant, Onionn, Lettuce, Baby, Tomato

 

Workflow

Yoshiro_Fujimori_3-1682670800547.png

 

RegEx configuration

Yoshiro_Fujimori_1-1682670754470.png

Output

Yoshiro_Fujimori_2-1682670783962.png

 

Is it close to your needs?

kwieto
8 - Asteroid

@Yoshiro_Fujimori  yes, it is. The output of the operation should be then compared with the reference (after removing commas from the reference values) right?

 

Now the only part which I miss is a handy tool to update the keywords if needed (therefore I thought about the list from the text input), but since it will be a part of the macro, it should be manageable to do that via the text input tool.

 

One question: why did you put a formula changing the case to the title case? 

kwieto
8 - Asteroid

One more thought: it might be important to preserve original formatting. this is because the values listed as wrong are then sent to other people who are checking and correcting the data manually (each one has to be reviewed separately). It would be easier for them to have them exactly as they are in the file.

In your formula if I have "potato, giant" and "potato giant" they both will be listed as "Potato Giant". 
Is it possible to modify Regex to just change comma to other sign (i.e. "|") so after the checking it can be easily converted back to comma? 

so if I have "potato, giant" and "potato giant", I'll get "potato| giant" and "potato giant"; both will not match the reference and be listed as wrong, but then I can use simple formula to revert "potato| giant" back to "potato, giant".


Yoshiro_Fujimori
15 - Aurora

Hi @kwieto,

 

> The output of the operation should be then compared with the reference

Yes, I wanted to focus on the parsing (fun part 😀).

 

handy tool to update the keywords

I tried it with Dynamic Replace tool as below.

 

> why did you put a formula changing the case to the title case? 

It's just my preference. You can also keep the case if necessary.

 

> Is it possible to modify Regex to just change comma to other sign (i.e. "|") so after the checking it can be easily converted back to comma?

Instead of "pipe", how about directly replacing "space" with "comma + space" at the end?

 

Workflow

Yoshiro_Fujimori_3-1682687845068.png

 

 

Output

Yoshiro_Fujimori_2-1682687784924.png

 

kwieto
8 - Asteroid

 


@Yoshiro_Fujimori wrote:

Yes, I wanted to focus on the parsing (fun part 😀).

😁

 


@Yoshiro_Fujimori wrote:

I tried it with Dynamic Replace tool as below.

Nice. Better than using the text input tool, as in that solution user would probably need to put words in proper syntax

 


@Yoshiro_Fujimori wrote:

> Is it possible to modify Regex to just change comma to other sign (i.e. "|") so after the checking it can be easily converted back to comma?

Instead of "pipe", how about directly replacing "space" with "comma + space" at the end?

I see you like to avoid replacing comma to an artificial separator 😀

But this won't work as your revert formula is changing spaces to commas also in places where there was no comma in original string (i.e. "potato, giant" while in the source string there is "potato giant". Also if it will happen that you have more than three words in one value, each will be separated by comma in the output.

Changing space to some other sing solves both issues. I tried with "$1| $2" in the regex formula and it worked. Another possibility may be to put the sequence in quotes, but I don't know how to adapt formula to do that ("'$1 $2'" or "'"+"$1 $2"+"'" didn't work).

Anyway, I think I have what I needed to finish the workflow, thanks!

Labels