Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Help Removing Files with Duplicate Information Under Different File Name

kkkim
6 - Meteoroid

Hi, Community:

 

I am inputting multiple files with the same schema into my workflow. I need a way to remove duplicate files where the file names differ but the data in the files is the same. Is there a way to do this in Alteryx?

 

So even though the file names are different, the content of the two files is exactly the same. I can't use a Unique tool because not all columns/rows are unique, meaning data can legitimately repeat within a file.

 

Any ideas? Thank you so much for your help!

 

Rgds,

Kate

4 REPLIES
Jean-Balteryx
14 - Magnetar

Hi @kkkim,

 

Do you have sample files or a sample workflow?

kkkim
6 - Meteoroid

Thanks for your response. I'm afraid I can't share the actual data, but basically each file contains numeric data that can repeat itself. The issue is that the source file itself is sometimes duplicated (so the same records arrive twice) but under a different file name.

 

Thank you for your help!

Jean-Balteryx
14 - Magnetar

So the two files would have the same number of rows, the same columns, and the same content in the cells?

Elias_Nordlinder
8 - Asteroid

Hello @kkkim 


I understand that you cannot use Union and Unique because the data repeats itself?
(Solution 2 in the attached workflow.)
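To illustrate the concern outside Alteryx, here is a minimal Python sketch. The rows are made up for illustration and are not from the attached workflow. It shows that a row-level unique over the stacked files also removes repeats that legitimately belong to a single file.

```python
# Hypothetical rows from one file; the repeated row is legitimate data.
file_a = [(1, 10.0), (1, 10.0), (2, 5.5)]
# The same content arriving again under a different file name.
file_b = list(file_a)


def unique_rows(rows):
    """Row-level unique that preserves first-seen order."""
    return list(dict.fromkeys(rows))


# Stack both files (like a Union), then apply a row-level unique.
stacked = file_a + file_b
deduped = unique_rows(stacked)

# The result has fewer rows than a single original file,
# because the legitimate repeat inside file_a was dropped too.
print(len(file_a), len(deduped))  # 3 2
```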

 


But what about using a Join between the two file inputs and keeping only the joined output?
(See Solution 1 below.)

 

If the two files have the same schema and content, the result should contain only the data from one of the input files.

I attached the workflow below as well.
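As an aside, if the duplicated files are byte-for-byte identical (an assumption, since no sample files were shared), the same idea can be sketched outside Alteryx by fingerprinting each file and keeping one file per fingerprint. This is a different technique from the Join in the attached workflow, and the function names below are hypothetical.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's raw bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_duplicate_files(paths):
    """Keep the first file (in sorted path order) for each distinct content."""
    seen = set()
    unique = []
    for path in sorted(paths):
        fingerprint = file_fingerprint(path)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(path)
    return unique
```

Because the comparison is done per file rather than per row, repeated rows inside a file are left untouched. Files that hold the same data but differ in formatting (row order, line endings, number formatting) would not be caught by this sketch.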

 

Example:

 

//Elias

 

[Attached image: Elias_Nordlinder_0-1626976180691.png]

 
