I'm currently doing a join based on a customer ID. There is 1 row of Customer #1 on the left input, and 1 row of Customer #1 on the right input. I am using customer ID to join. However, somehow on the JOIN output, it creates 8 records for Customer #1, and only 1 of those rows is accurate (the rest are filled with zeroes).
Why/how is this happening? And how do I fix??
is it possible that there are actually 8 instances of customer 1 in one of your joins - perhaps with nulls() or something in the rest of the field? It would seem quite odd otherwise. Can you maybe do a count of that value prior to the join? The most logical explanation is that one of your data sources as 8 entries, or one has 4 and the other has 2.
@atcodedog05 @Jean-Balteryx I can't share the data...but setup is as follows:
Summarize tool gives me Customer ID, Customer Name, Category, and 3 amount buckets. I then do a cross tab to create columns for the 2 possible categories within each $ bucket. I then do a multiple join to bring them back together. So, at this point I now have 1 record for Customer #1, with 6 columns with $ amounts (let's call them Customer Amount A, Customer Amount B, Customer Billing A, Customer Billing B, Customer Quoted A, Customer Quoted B) for the LEFT input.
The RIGHT input is Customer ID, Customer Name, and one amount bucket (not split out by category). There is ONE row for Customer #1 coming into the join.
I then join by Customer ID, and it spits out 8 rows.
@apathetichell I've already confirmed there is only 1 row for the customer ID on each side of the join.
Have you tried adding a unique tools after each join input to test if it produce a different result ?
something is going on with your multiple join and you are creating additional records. Sorry I know you think I am wrong and that this isn't the case but too many records coming out of join are a symptom of multiple records from one of the streams with whatever you are joining on. Without seeing your data I can't pinpoint it, but I'd suggest dropping the multiple join and using a common join and monitoring what is coming in in the output...
Hi @ewall
What you're describing here is very unusual. In standard cases Join cases, when you have only one record on either side you only get one one matched record coming out of the join. Since this is very unexpected behavour, the only way that we will be able to debug it is to be able to reproduce the error on our side. I understand that you may have confidentiality restrictions around posting your original data, but can you post a couple of dummy data table that produces the same result?
Dan
@apathetichell good idea. I'll try using multiple common joins to see if I can isolate anything.
In the meantime, leveraging an indexed column and the unique tool, I seem to have band-aided the problem for my immediate need but will work on this suggestion!