Why is performing a JOIN increasing my total DB value?!
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
(I'm going to find it hard to upload sample data - it's proprietary and large - but will try if no-one can make sense of the below.)
+ I have data - 779k records. Total value: £13m.
+ I summarise - 55k records. Total value: £13m.
+ I run a JOIN. Input:
LEFT: My 55k records
RIGHT: A reference sheet, in which I'm pulling in only BARCODE and PRODUCT TITLE, joining on BARCODE and pulling in the PRODUCT TITLE on a match.
+ The join output:
RIGHT: Irrelevant
LEFT: 290 records, SUMMARISE value £8k
JOIN: 68,618 records, SUMMARISE value £15.5m(!!)
I simply do not understand how this happened. All the JOIN should be doing is pulling in a text value if it matches barcodes. How is it increasing the number of records and the value in the db?
The Join is reading off an Excel - my current theory is that perhaps that reference excel lists the some barcodes twice but with a different text entry for each listing. Would the JOIN then duplicate the record it's matching to in the non-reference document, so it can put a match one with each of the found text entries? (I have checked and this is not the case)
Solved! Go to Solution.
- Labels:
- Join
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @pupmup
What's happening is that you have more than one matching records per concept in one (or both) datasources.
Remember that Alteryx returns Joined records through the J output, and Unjoined ones through the L & R outputs.
Maybe if you summarize(Group by) or Unique the BARCODE for the R input, and then try the join, maybe you'll get close to what you need.
Regards
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
simple example but hope it helps
suppose you have one row on the left value 100 and it has barcode "abc"
on the right you have 10 rows, 3 of which have barcode "abc"
the join on barcode "abc" will produce 3 rows, each having value 100
is there additional field(s) that can be added to the join to make it unique ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @ChillEDog,
For what I understand of your example, the structure of your data is something like this:
What I would do is to Summarize the INVENTORY table, grouping by [barcode] and SUM([units]). That'll give you a unique value for [barcode] with the total of values summed.
It's result is:
Then you can join against your other table.
If I misinterpreted anything of your use case, please let me know, and we'll see it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thank you. I was already summarising the Inventory by barcode at an earlier stage in the process - the duplication only went away when I added a Unique process to the retail price table, based off barcode.
