(I'm going to find it hard to upload sample data - it's proprietary and large - but will try if no-one can make sense of the below.)
+ I have data - 779k records. Total value: £13m.
+ I summarise - 55k records. Total value: £13m.
+ I run a JOIN. Input:
LEFT: My 55k records
RIGHT: A reference sheet, in which I'm pulling in only BARCODE and PRODUCT TITLE, joining on BARCODE and pulling in the PRODUCT TITLE on a match.
+ The join output:
RIGHT: Irrelevant
LEFT: 290 records, SUMMARISE value £8k
JOIN: 68,618 records, SUMMARISE value £15.5m(!!)
I simply do not understand how this happened. All the JOIN should be doing is pulling in a text value if it matches barcodes. How is it increasing the number of records and the value in the db?
The Join is reading off an Excel - my current theory is that perhaps that reference excel lists the some barcodes twice but with a different text entry for each listing. Would the JOIN then duplicate the record it's matching to in the non-reference document, so it can put a match one with each of the found text entries? (I have checked and this is not the case)