Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Compare two files and keep matches of only one file

T_Lina
7 - Meteor

Dear Alteryx community, 

 

I have two files that I want to compare: File A and File B. I want to compare them on one column.

As an output I would like to see all records from File B that match File A on this common column. 

 

I have already used the Join-Tool, but as an output I get the matching records from both File A and File B. 

However, I would only like to see what records matched in File B. 

 

Can you help me with that?

 

Thanks a lot in advance. 

 

BR

 

11 REPLIES 11
Emil_Kos
17 - Castor
17 - Castor

Hi @T_Lina,


You need to use join tool but you need to deselect the columns that you don't want to see:

 

Emil_Kos_0-1619079541546.png

 

marcusblackhill
12 - Quasar
12 - Quasar

Hi @T_Lina !

 

@Emil_Kos solution for sure will solve your question already, but if you want an alternative for join tool, you also can use Find Replace tool.

marcusmontenegro_2-1619093424971.png

 

Link your file B in F anchor and file A in R anchor. 

 

For these 2 you need to point the column of the 2 files you want to compare

marcusmontenegro_0-1619093046553.png

Select to Append fields

marcusmontenegro_1-1619093102814.png

And in the list below, point the same field you want to compare. That way, the output will be the entire file B plus a new column with data only for rows where have same data in both files, after that can include a filter and select that new column with the operator "is not null".

 

The benefit of that way is to be sure you will always have the file B rows as limit, because depending of the relation of your files (e.g. N:N), can possibly duplicate rows but with Find Replace that don't happens.

 

Hope that different way also helps you!

 

Emil_Kos
17 - Castor
17 - Castor

Hi @marcusblackhill ,

Good point about the possibility of getting the duplicates.

I would still go with the Join tool as the Join tool is way more awesome. I want to say that I am joking, but I am not 😛

Having embedded select tool + possibility to see what was joined and what wasn't is fantastic and give you a lot of flexibility, but as you said, your solution would have additional benefits 🙂

Tip for @T_Lina  What I like to do is group the data using summarize tool. If we would group the data by the column that we want to use, we will prevent creating duplicates using the join tool and thanks to that; we will ensure that we will not create duplicate records.

 

I hope that it makes sense. 

marcusblackhill
12 - Quasar
12 - Quasar

And you're totally right @Emil_Kos ! hehehe

 

Join tool do this job much easier, only need to be aware about this duplicate possibility but actually you also have some ways to fix that issue after the join. So the Find Replace is just an alternative way hehehe

Emil_Kos
17 - Castor
17 - Castor

Hi, @marcusblackhill

 

by the way your post about the find and replace tool is awesome 😀

T_Lina
7 - Meteor

Hi @Emil_Kos,

 

When using the join tool and deselect the columns from the file that I do not want to see all values of (file A), I still do not get file B as the base with the additional columns from file A that I am selecting.

What I need is to see the numbers from file B that match with the numbers from file A and then add additional columns from file A for the matching values. So basically an excel vlookup and then append columns... 

 

Something does not seem to be correct right now:

 

1) left join (file A) has 10,773 records

T_Lina_0-1619429462835.png

2) right join (file B) has 12,622 records:

T_Lina_1-1619429540519.png

3) As the join output, I get more than 90,000 records:

T_Lina_2-1619429576382.png

 

 

Do you have an idea what I'm doing wrong? 

 

Thank you!

 

BR
Tina

Emil_Kos
17 - Castor
17 - Castor

Hi @T_Lina,

 

Please check this article in order to understand why this is happening. I highly recommend reading this article as this is one of the most common mistakes that are happening when using the join tool. 

 

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Why-Your-Join-Is-Getting-More-Recor...

 

In short, it looks like in both data set you to have duplicate values. To mitigate this you can use the unique tool in order to show only unique values. Another approach is using the find and replaces tools as this tool is working exactly like vlookup and it will append the first value that you will see in the data set. 


Hope this helps. 

 

marcusblackhill
12 - Quasar
12 - Quasar

Hi @T_Lina !

 

That's the duplication issue I said can happen when you do that by Join, so you will need to do a workaround to have unique keys to join these tables, maybe you can create an artificial key combining fields. 

 

Please, read the article @Emil_Kos pointed, will clarify to you how Join tool works and why that can happen, but long story shorty, is something like that picture I did:

marcusblackhill_0-1619443527768.png

if you have any connection possible with more than 1 row between tables, you will have more results in your output.

 

Try to create an artificial key to do unique connections between all rows or try to use the find replace tool like I posted here before.

 

Hope that helps!

 

Emil_Kos
17 - Castor
17 - Castor

Hi @marcusblackhill,

 

Great post and great visualisation 🙂 

Labels