How can I match 2 datasets without having duplicates in the output

123indo
6 - Meteoroid

Hello!

 

I want to match two datasets on multiple fields, and several records in both datasets share the same matching criteria. However, I don't want records duplicated in the output (i.e., once a record has been matched with a record in the other dataset, it shouldn't be used again to match any other record in that dataset).

 

For ease of reference, I have attached a sample workflow with data.

 

I have tried to find a solution in the existing discussions but couldn't find one. I would really appreciate it if any of you could help me here! This issue has been appearing in several workflows that I am creating.

 

Thank you!

PhilipMannering
16 - Nebula

I think you just want to use a Unique tool or Summarize tool to remove duplicates before your join.

PhilipMannering_0-1647615970238.png

 

123indo
6 - Meteoroid

@PhilipMannering thanks for your reply! That would work if I didn't need record-level detail. However, I need to match the records side by side (not grouped). Do you have any other solutions?

PhilipMannering
16 - Nebula

@123indo Can you provide the output you expect for the dummy data you provided in your example?

SoccerTil
8 - Asteroid

If you want to single-join the datasets, make each one unique first. Then you can Union the duplicate and unjoined rows back in as you need them. I used my own test data and brought all rows back into the flow with the Union. Let me know if there are any additional requirements that need to be addressed.

 

SoccerTil_0-1647879179768.png

 

123indo
6 - Meteoroid

I have put my expected output on the right-hand side of my workflow. I hope that clarifies it.

123indo
6 - Meteoroid

Yes, that solves part of the issue, but the records in the Duplicates (D) output still need to be matched again.

 

For example, in my earlier workflow, I added a Unique tool before the Join tool, but the bb100 records in dataset 1 and dataset 2, which end up in the Duplicates (D) output, are not yet matched:

 

1.JPG

72b26d69-44c8-4126-aec3-257387319b8e.jpg

 

I know I can add another Join tool for this. But the problem arises when I don't know how many records in the Duplicates (D) output can still be matched. In the above example, only bb100 can still be matched, but with thousands of input rows I wouldn't know how many more records could still be matched, or how many Unique and Join tools I would have to add.

 

I wonder if there is another tool that can loop the matching, regardless of how many records can be matched, without duplicating either dataset's records.
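
One possible way to get this behaviour without looping is to number each record within its group of identical matching-field values (1st, 2nd, 3rd occurrence, and so on) in both datasets, and then join on the matching fields plus that occurrence number. In Designer this numbering could likely be produced with a Multi-Row Formula or Tile tool grouped by the matching fields, though that mapping is an assumption and not something confirmed in this thread. The Python sketch below only illustrates the logic; the field names (`id`, `code`, `amount`), the sample rows, and the helper functions are hypothetical and are not taken from the attached workflow.

```python
from collections import defaultdict


def add_occurrence(records, key_fields):
    """Pair each record with (key values..., n), where n is its 1-based
    position among the records that share the same key values."""
    counts = defaultdict(int)
    tagged = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        counts[key] += 1
        tagged.append((key + (counts[key],), rec))
    return tagged


def match_one_to_one(left, right, key_fields):
    """Join on the key fields plus the occurrence number, so each record
    is used in at most one match."""
    right_index = {k: rec for k, rec in add_occurrence(right, key_fields)}
    matched, left_only = [], []
    for k, rec in add_occurrence(left, key_fields):
        partner = right_index.pop(k, None)  # pop so it cannot be reused
        if partner is None:
            left_only.append(rec)
        else:
            matched.append((rec, partner))
    right_only = list(right_index.values())
    return matched, left_only, right_only


# Hypothetical sample data: three "bb"/100 rows on the left, two on the right.
dataset1 = [
    {"id": "L1", "code": "aa", "amount": 100},
    {"id": "L2", "code": "bb", "amount": 100},
    {"id": "L3", "code": "bb", "amount": 100},
    {"id": "L4", "code": "bb", "amount": 100},
]
dataset2 = [
    {"id": "R1", "code": "bb", "amount": 100},
    {"id": "R2", "code": "bb", "amount": 100},
    {"id": "R3", "code": "cc", "amount": 200},
]

matched, left_only, right_only = match_one_to_one(
    dataset1, dataset2, ["code", "amount"]
)
for l, r in matched:
    print(l["id"], "<->", r["id"])
print("unmatched in dataset 1:", [r["id"] for r in left_only])
print("unmatched in dataset 2:", [r["id"] for r in right_only])
```

With this approach the number of duplicates per key does not matter: the first "bb"/100 record pairs with the first, the second with the second, and any surplus records fall out as unmatched, which corresponds to the L and R outputs of a Join.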
