Compare two files and keep matches of only one file
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Dear Alteryx community,
I have two files that I want to compare: File A and File B. I want to compare them on one column.
As an output I would like to see all records from File B that match File A on this common column.
I have already used the Join-Tool, but as an output I get the matching records from both File A and File B.
However, I would only like to see what records matched in File B.
Can you help me with that?
Thanks a lot in advance.
BR
- Labels:
- Join
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @T_Lina,
You need to use join tool but you need to deselect the columns that you don't want to see:
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @T_Lina !
@Emil_Kos solution for sure will solve your question already, but if you want an alternative for join tool, you also can use Find Replace tool.
Link your file B in F anchor and file A in R anchor.
For these 2 you need to point the column of the 2 files you want to compare
Select to Append fields
And in the list below, point the same field you want to compare. That way, the output will be the entire file B plus a new column with data only for rows where have same data in both files, after that can include a filter and select that new column with the operator "is not null".
The benefit of that way is to be sure you will always have the file B rows as limit, because depending of the relation of your files (e.g. N:N), can possibly duplicate rows but with Find Replace that don't happens.
Hope that different way also helps you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @marcusblackhill ,
Good point about the possibility of getting the duplicates.
I would still go with the Join tool as the Join tool is way more awesome. I want to say that I am joking, but I am not 😛
Having embedded select tool + possibility to see what was joined and what wasn't is fantastic and give you a lot of flexibility, but as you said, your solution would have additional benefits 🙂
Tip for @T_Lina What I like to do is group the data using summarize tool. If we would group the data by the column that we want to use, we will prevent creating duplicates using the join tool and thanks to that; we will ensure that we will not create duplicate records.
I hope that it makes sense.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
And you're totally right @Emil_Kos ! hehehe
Join tool do this job much easier, only need to be aware about this duplicate possibility but actually you also have some ways to fix that issue after the join. So the Find Replace is just an alternative way hehehe
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @Emil_Kos,
When using the join tool and deselect the columns from the file that I do not want to see all values of (file A), I still do not get file B as the base with the additional columns from file A that I am selecting.
What I need is to see the numbers from file B that match with the numbers from file A and then add additional columns from file A for the matching values. So basically an excel vlookup and then append columns...
Something does not seem to be correct right now:
1) left join (file A) has 10,773 records
2) right join (file B) has 12,622 records:
3) As the join output, I get more than 90,000 records:
Do you have an idea what I'm doing wrong?
Thank you!
BR
Tina
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @T_Lina,
Please check this article in order to understand why this is happening. I highly recommend reading this article as this is one of the most common mistakes that are happening when using the join tool.
In short, it looks like in both data set you to have duplicate values. To mitigate this you can use the unique tool in order to show only unique values. Another approach is using the find and replaces tools as this tool is working exactly like vlookup and it will append the first value that you will see in the data set.
Hope this helps.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @T_Lina !
That's the duplication issue I said can happen when you do that by Join, so you will need to do a workaround to have unique keys to join these tables, maybe you can create an artificial key combining fields.
Please, read the article @Emil_Kos pointed, will clarify to you how Join tool works and why that can happen, but long story shorty, is something like that picture I did:
if you have any connection possible with more than 1 row between tables, you will have more results in your output.
Try to create an artificial key to do unique connections between all rows or try to use the find replace tool like I posted here before.
Hope that helps!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
