I need some insights on how to deal with following case:
I have data in comma delimited string, which I need to compare with the reference table. That part is easy, use "text to columns" tool to split data into rows, clean trailing spaces if they appear, then use join tool to find values that don't match.
The problem I have is that part of the values contain commas within the value. I've managed that by using find-replace tool which searches for such values before split and put them in quotes. Then the "Text to columns" tool has the option to skip the data in quotes checked, they are preserved.
So far so good, but there are some values which have errors or are outside of the reference catalogue, and if they contain comma, they are split and as a result I have only part of original value listed as incorrect.
See on the example:
Source:
Mushroom, Potato, potato, Carrot, baby, Carrot, Baby, Lettuce, Onion, Onionn, Lettuce, Baby, Tomato |
Reference table:
Mushroom |
Potato |
Carrot, Baby |
Lettuce |
Tomato |
Result of the workflow (not matching data):
potato |
baby |
Onionn |
Baby |
What I really want to get is the list like that:
potato |
Carrot, baby |
Onionn |
Lettuce, Baby |
I think this can be done with multirow formula, in two ways:
1. Creating manually handled list of values (fortunately for me, such cases are rather from limited catalogue) and if the content of certain row match that value, then combine it with the value of the previous row, and remove the previous row from the list, then compare with the main reference table by join tool.
2. Check the values with the reference table and if the value in certain row is out of the list, and the value in previous row is on the list, then combine value with previous row and remove that row from the list
Both ways need comparing the content of specific row with a list of values, and this probably can't be done by the join tool, as then I will lose the relation to the previous row.
Do you have any idea how to solve that? Or maybe there is a better way to handle such cases?
Solved! Go to Solution.
@Yoshiro_Fujimori
One more question, if you can guide me a little - in this expression from your formula: REGEX_Replace([Source], "(\w+), (baby|giant|sunny)", "$1 $2") , if I understand correctly, the logic is that the word before keyword and followed by comma ( "(\w+),") is marked as a "container" $1, and the keyword itself ( "(baby|giant|sunny)" ) is marked as a "container" $2. Then they are replaced with themselves, but without comma ( "$1 $2" ), right?
My question is how should I modify the (\w+), part if I want to get any string between the keyword and the comma preceding that string?
Currently if I have a sequence like ..., potato yellow, sunny,... I will get "yellow" in $1 and "sunny" in $2. How I should modify the script to have "potato yellow" in $1?
OK, solved, works with following modification: ([^,]*), so the whole formula is: REGEX_Replace([Source], "([^,]*), (baby|giant|sunny)", "$1 $2")
User | Count |
---|---|
106 | |
82 | |
70 | |
54 | |
40 |