Bring your best ideas to the AI Use Case Contest! Enter to win 40 hours of expert engineering support and bring your vision to life using the powerful combination of Alteryx + AI. Learn more now, or go straight to the submission form.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Finetuning unmatched values (from delimited string)

kwieto
8 - Asteroid

I need some insights on how to deal with following case:

I have data in comma delimited string, which I need to compare with the reference table. That part is easy, use "text to columns" tool to split data into rows, clean trailing spaces if they appear, then use join tool to find values that don't match.
The problem I have is that part of the values contain commas within the value. I've managed that by using find-replace tool which searches for such values before split and put them in quotes. Then the "Text to columns" tool has the option to skip the data in quotes checked, they are preserved.

 

So far so good, but there are some values which have errors or are outside of the reference catalogue, and if they contain comma, they are split and as a result I have only part of original value listed as incorrect.

See on the example:

Source:

Mushroom, Potato, potato, Carrot, baby, Carrot, Baby,  Lettuce, Onion, Onionn, Lettuce, Baby, Tomato

 

Reference table:

Mushroom
Potato
Carrot, Baby
Lettuce
Tomato

 

Result of the workflow (not matching data):

potato
baby
Onionn
Baby

 

What I really want to get is the list like that:

potato
Carrot, baby
Onionn
Lettuce, Baby

 

I think this can be done with multirow formula, in two ways:

 

1. Creating manually handled list of values (fortunately for me, such cases are rather from limited catalogue) and if the content of certain row match that value, then combine it with the value of the previous row, and remove the previous row from the list, then compare with the main reference table by join tool.

 

2. Check the values with the reference table and if the value in certain row is out of the list, and the value in previous row is on the list, then combine value with previous row and remove that row from the list

 

Both ways need comparing the content of specific row with a list of values, and this probably can't be done by the join tool, as then I will lose the relation to the previous row. 

 

Do you have any idea how to solve that? Or maybe there is a better way to handle such cases?

11 REPLIES 11
kwieto
8 - Asteroid

@Yoshiro_Fujimori 
One more question, if you can guide me a little - in this expression from your formula: REGEX_Replace([Source], "(\w+), (baby|giant|sunny)", "$1 $2") , if I understand correctly, the logic is that the word before keyword and followed by comma ( "(\w+),") is marked as a "container" $1, and the keyword itself ( "(baby|giant|sunny)" ) is marked as a "container" $2. Then they are replaced with themselves, but without comma ( "$1 $2" ), right?

My question is how should I modify the (\w+), part if I want to get any string between the keyword and the comma preceding that string?

Currently if I have a sequence like ..., potato yellow, sunny,... I will get "yellow" in $1 and "sunny" in $2. How I should modify the script to have "potato yellow" in $1?

kwieto
8 - Asteroid

OK, solved, works with following modification: ([^,]*),  so the whole formula is: REGEX_Replace([Source], "([^,]*), (baby|giant|sunny)", "$1 $2")

Labels
Top Solution Authors