Alteryx Designer Desktop Discussions

GdeH · ‎02-16-2021

Hi,

I've uploaded 2 files in the tool. There are some duplicates between both files. I would like to delete those.

However, I would like to keep the duplicates that occur within a single file.

For example, in the list below I would like to keep lines 1, 3 and 4 and remove line 2.
FILE 1 - XXX
FILE 2 - XXX
FILE 2 - XXX
FILE 1 - YYY

In the list below, I would like to keep lines 1, 2 and 5 and remove lines 3 and 4.

FILE 1 - XXX
FILE 1 - XXX
FILE 2 - XXX
FILE 2 - XXX
FILE 1 - YYY

phottovy · ‎02-16-2021

I'm not completely sure I understand the difference between the two scenarios but I attached a couple possible solutions.

In the first one, I assign a unique RecordID to all the rows in File 1 and then use the Unique tool to keep all of File 1 but remove any duplicates from File 2.

In the second one, I use the Multi-Row tool to identify duplicates.

Hopefully one of these helps!

GdeH · ‎02-16-2021

Hi, thank you for your reply.

Actually, I've extracted the 2 files from a system and when extracting those, I have an overlap.

Meaning that some lines that are included in file 1 are also included in file 2. I would like to remove the overlap items.

If I have 5 similar lines in file 1 and the exact same line appears 3 times in file 2, I would like to remove the 3 lines in file 2 and keep the 5 similar lines in file 1.

If I have 4 different lines in file 1 and those 4 different lines are also included in file 2, I would like to remove the 4 lines in file 2.

I hope this is clearer.
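The rule described above can be sketched outside Designer as a counting problem. This is a minimal Python illustration, not an Alteryx workflow, and it assumes that each File 1 line cancels at most one identical File 2 line (the post does not say what should happen when File 2 holds more copies than File 1). The function name `remove_overlap` and the sample values are hypothetical.

```python
from collections import Counter

def remove_overlap(file1_rows, file2_rows):
    """Return the File 2 rows that are left once the overlap with File 1 is removed."""
    # Count how often each line occurs in File 1. Assumption: every File 1
    # occurrence can cancel at most one identical File 2 row.
    remaining = Counter(file1_rows)
    kept = []
    for row in file2_rows:
        if remaining[row] > 0:
            remaining[row] -= 1   # overlap with File 1: drop this File 2 row
        else:
            kept.append(row)      # no File 1 counterpart left: keep it
    return kept

# Five identical lines in File 1, the same line three times in File 2.
print(remove_overlap(["XXX"] * 5, ["XXX"] * 3))                     # []
# Four different lines in File 1, all four also present in File 2.
print(remove_overlap(["A", "B", "C", "D"], ["A", "B", "C", "D"]))   # []
```

File 1 is never modified, so the combined result is simply all File 1 rows followed by whatever `remove_overlap` returns.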

Emil_Kos · ‎02-16-2021

Hi @GdeH,

Can you test if this solution works for you?

echuong1 · ‎02-16-2021

If I understand your requirements correctly, you should be able to create a flag that you can filter on.

I started by sorting the records by file name and value so that identical records are grouped sequentially. From there, I used a multi-row formula with two checks: if file = 1, keep the row; otherwise, if the value is the same as the value in the row above, exclude it (flag value of 0). You can then filter on flag = 1.

Hope this helps!

GdeH · ‎02-16-2021

Unfortunately, this is not working.

echuong1 · ‎02-16-2021

The workflow that I provided works for the examples you gave previously. Can you expand upon what isn't working specifically, and provide additional examples?

GdeH · ‎02-16-2021

In the workflow you made, the input is:

FILE 1 XXX

FILE 2 XXX

FILE 1 YYY

FILE 2 YYY

I would like the output to exclude the red items, but your workflow does not produce this.

I would like to keep all the lines in file 1, and then add file 2 to file 1 without the items that are already included in file 1. In the example above, the 2 XXX lines in file 2 are already included in file 1, and so is 1 of the YYY lines, so I would like to remove those.

echuong1 · ‎02-16-2021

I'm not quite sure I understand your logic. Why are you keeping the File 2 YYY rows as well (bolded)? There is already a YYY value in File 1, which is why both File 2 XXX rows are excluded if I understand your logic.

FILE 1 XXX

FILE 2 XXX

FILE 1 YYY

FILE 2 YYY

FILE 2 YYY

GdeH · ‎02-16-2021

We should keep the bold items because file 1 contains only 1 YYY line, so only 1 YYY line can be removed from file 2.

Both XXX lines in file 2 can be removed, since file 1 contains 2 XXX lines.
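One way to picture this "remove once per File 1 line" logic is to number each repeated value within its own file (1st XXX, 2nd XXX, and so on) and keep only the File 2 rows whose value and number have no match in File 1. The Python sketch below illustrates that idea; it is not a workflow posted in this thread. The helper names are hypothetical, and the number of YYY lines in File 2 is made up, since the thread only states that File 1 holds 2 XXX lines and 1 YYY line and that File 2 holds 2 XXX lines.

```python
from collections import defaultdict

def number_occurrences(rows):
    """Pair each value with its running occurrence number within one file."""
    seen = defaultdict(int)
    numbered = []
    for value in rows:
        seen[value] += 1
        numbered.append((value, seen[value]))
    return numbered

def file2_without_overlap(file1_rows, file2_rows):
    # Keys present in File 1, e.g. ("XXX", 1), ("XXX", 2), ("YYY", 1).
    file1_keys = set(number_occurrences(file1_rows))
    # Keep a File 2 row only if its (value, occurrence) key has no File 1 match.
    return [value for value, n in number_occurrences(file2_rows)
            if (value, n) not in file1_keys]

file1 = ["XXX", "XXX", "YYY"]
file2 = ["XXX", "XXX", "YYY", "YYY", "YYY"]  # YYY count in File 2 is an assumption
kept_from_file2 = file2_without_overlap(file1, file2)
print(kept_from_file2)           # ['YYY', 'YYY']
print(file1 + kept_from_file2)   # ['XXX', 'XXX', 'YYY', 'YYY', 'YYY']
```

In Designer, the same idea might translate to a running count per value in each file, followed by a join on value plus count that keeps only the unjoined File 2 records. That mapping is a suggestion and has not been tested against the files discussed here.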
