Join for a beginner
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi everybody,
new user on alteryx, I try to combine two data sets with a join tool. The problem is that after that, I have 2 ou 3 times records more.
I combine with one field because I have juste one field in common.
I post my workflow iff you have any suggestion.
Sorry my question is maybe very simple...
Thanks
Solved! Go to Solution.
- Labels:
- Data Investigation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @remibrun ,
I think, the problem is that in the file "PromoSaleItemsReport.csv" there are multiple rows with same CnkNr and ArtName (e.g. repeated by PromoDate). What happens is that each rows with CnkNr x from LDV2018clean.csv/LDV2019clean.csv is joined to all rows with the same CnkNr in file PromoSaleItemsReport. You could e.g. include additional fields in join (maybe DateSale and PromoDate??) or add a Unique tool to get only unique CnkNr/ArtName from PromoSaleItemsReport.csv. What do you think?
Best,
Roland
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @remibrun
I would recommend creating a unique identifier to join on. Concatenate 2 or 3 columns together using a formula tool to generate one perfectly unique column and use this to join the data together.
Repeating columns is a sign of data duplication and unique IDs are the best way to eliminate them.
In your formula tool, add a new column and then it's as simple as [Column 1] + [Column 2] + [Column 3]
Hope that helps,
Dane.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
First thanks for you answer.
yes that's the problem. I have one data set with all of my sales lines (for one store) et I have one data set with more information for each CNK (unique identification of one product).
I would like to join the two data sets to have one big data file (with all the sales lines and the information cnk for each line). Normally I would have the same numbers of lines and more fields?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @remibrun ,
If you CNK is your unique identifier, then this is the only thing that needs to be used in a join.
Adding in the other field, if it's not unique, can lead to issues with duplication.
Try your workflow joined just on CNK.
Hope it helps,
Dane.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @remibrun
Also, if your CNK and ArtName are unique together, you can remove duplicates with the Unique Tool as mentioned by @RolandSchubert .
I've attached your workflow with the unique tool added to the inner join.
Also, good practice is to try avoiding overlapping lines where possible so that you can follow the flow a little easier. Try moving tools around to help keep understanding clear.
Hope this helps,
Dane.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks, I'll test that tomorrow
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi
after trying, I realized that I have to use unique tool like you propose before the job,. I have to be sure that in my file (with the CNK and the promotion) I have just one time each cnk.
Is it possible with the tool unique?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @remibrun
Sure it is, you would just need to attach an unique tool to both of the streams that go into the join.
Place a unique tool after each of the selects and then have the unique output leading to the join.
Hope that helps,
Dane.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I need to have all CNK one time.
Do you think by using summarize tool (group by cnk) it will give the good data with unique CNK?
Thank you so much
![](/skins/images/18F3BF3EB54AD3C2739B5AA9B77A7F97/responsive_peak/images/icon_anonymous_message.png)