Hi,
I've uploaded 2 files into the tool. Some lines appear in both files, and I would like to delete those duplicates.
However, I would like to keep duplicates that occur within a single file.
For example, below I would like to keep lines 1, 2 and 4 and remove line 3.
FILE 1 - XXX
FILE 2 - XXX
FILE 2 - XXX
FILE 1 - YYY
In the example below, I would like to keep lines 1, 2 and 5 and remove lines 3 and 4.
FILE 1 - XXX
FILE 1 - XXX
FILE 2 - XXX
FILE 2 - XXX
FILE 1 - YYY
I'm not completely sure I understand the difference between the two scenarios, but I attached a couple of possible solutions.
In the first one, I assign a unique RecordID to all the rows in File 1 and then use the Unique tool to keep all of File 1 but remove any duplicates from File 2.
In the second one, I use the Multi-Row tool to identify duplicates.
Hopefully one of these helps!
Hi, thank you for your reply.
Actually, I extracted the 2 files from a system, and the two extracts overlap.
Some lines that are included in file 1 are also included in file 2. I would like to remove the overlapping items.
If the same line appears 5 times in file 1 and 3 times in file 2, I would like to remove the 3 lines in file 2 and keep the 5 lines in file 1.
If I have 4 different lines in file 1 and those 4 lines are also included in file 2, I would like to remove the 4 lines in file 2.
I hope this is clearer.
If I understand your requirements correctly, you should be able to create a flag that you can filter on.
I started by sorting the records by file name and value so that identical values are grouped sequentially. From there, I used a Multi-Row Formula to build the flag: if the file is File 1, keep the row (flag = 1); otherwise, if the value is the same as the value in the row above, exclude it (flag = 0). You can then filter on flag = 1.
Hope this helps!
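For readers without the attached workflow, the flag logic described above can be sketched roughly in Python. This is only an illustration under assumptions: the rows are modelled as hypothetical (file, value) pairs, the function name is made up, and the sort order (value first, then file) is a guess at what the workflow does.

```python
# Hypothetical rows as (file, value) pairs; the names are illustrative only.
rows = [
    ("FILE 1", "XXX"),
    ("FILE 1", "XXX"),
    ("FILE 2", "XXX"),
    ("FILE 2", "XXX"),
    ("FILE 1", "YYY"),
]


def flag_rows(rows):
    """Return (file, value, flag) triples using the sort-then-compare rule."""
    # Assumed sort order: by value first, then by file, so that File 1 rows
    # sit directly above the File 2 rows that share their value.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    flagged = []
    previous_value = None
    for file_name, value in ordered:
        if file_name == "FILE 1":
            flag = 1  # keep every File 1 row
        elif value == previous_value:
            flag = 0  # same value as the row above: exclude
        else:
            flag = 1  # value differs from the row above: keep
        flagged.append((file_name, value, flag))
        previous_value = value
    return flagged


# Equivalent of filtering on flag = 1.
kept = [(f, v) for f, v, flag in flag_rows(rows) if flag == 1]
print(kept)
```

Note that under this rule a File 2 row is dropped whenever the row above it has the same value, regardless of how many times that value appears in File 1.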
Unfortunately, this is not working.
The workflow that I provided works for the examples you gave previously. Can you expand upon what isn't working specifically, and provide additional examples?
In the workflow you've made, the input is:
FILE 1 XXX
FILE 1 XXX
FILE 2 XXX
FILE 2 XXX
FILE 1 YYY
FILE 2 YYY
FILE 2 YYY
FILE 2 YYY
I would like to obtain an output with the red items removed (the two FILE 2 XXX lines and one of the FILE 2 YYY lines), and your workflow does not produce this.
I would like to keep all the lines in file 1, and append file 2 to file 1 without the items that are already included in file 1. In the example above, the 2 XXX lines in file 2 are already included in file 1, and the same goes for 1 YYY line, so I would like to remove those.
I'm not quite sure I understand your logic. Why are you keeping two of the File 2 YYY rows as well (bolded)? There is already a YYY value in File 1, and that is the same reason both File 2 XXX rows are excluded, if I understand your logic.
FILE 1 XXX
FILE 1 XXX
FILE 2 XXX
FILE 2 XXX
FILE 1 YYY
FILE 2 YYY
FILE 2 YYY
FILE 2 YYY
We should keep the bold items because file 1 contains only 1 YYY line, so we can remove YYY only once from file 2.
For the XXX lines in file 2, we can remove both, since there are 2 XXX lines in file 1.
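Put another way, the rule in this thread is a per-value count subtraction: each line in file 1 cancels at most one identical line in file 2, and file 1 itself is kept in full. Here is a minimal Python sketch of that rule, outside the tool, assuming each file is simply a list of line values (the function name `remove_overlap` is made up for illustration):

```python
from collections import Counter


def remove_overlap(file1, file2):
    """Keep all of file1; drop from file2 one row per matching file1 row."""
    remaining = Counter(file1)  # how many removals each value still allows
    kept_from_file2 = []
    for value in file2:
        if remaining[value] > 0:
            remaining[value] -= 1  # this row is covered by a file 1 row
        else:
            kept_from_file2.append(value)  # no file 1 row left to cancel it
    return list(file1) + kept_from_file2


file1 = ["XXX", "XXX", "YYY"]
file2 = ["XXX", "XXX", "YYY", "YYY", "YYY"]
print(remove_overlap(file1, file2))
# prints ['XXX', 'XXX', 'YYY', 'YYY', 'YYY']
```

The first three values in the result come from file 1 and the last two are the surviving file 2 YYY lines. The same result could probably be reached in a workflow by numbering each value's occurrences within each file and matching on value plus occurrence number, but that is only a suggestion, not something tested here.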