Hi all
Hope you are well!
I am having trouble finding the optimal way to join four data sets.
The dates are all misaligned (they may fall in the same week, but on different days within that week).
Each item appears 84 times (once per date) in the first data set, so it's long. There are over 2,000 SKUs, so roughly 2,000 x 84 rows.
The common field across the four data sets is the Item ID, but when I join them I get a lot of null values, and the result becomes too large (100 GB) and slightly misaligned. For example, rows appear with a null Item ID while some other fields are populated, even though Item ID is present in all four data sets.
How would you approach this?
Thanks so much!
Data Set #1 -
Item ID
Date
Unique Data "A"
Other fields...
Data Set #2 -
Item ID
Date
Unique Data "B"
Other fields...
Data Set #3 -
Item ID
Date
Unique Data "C"
Other fields...
Data Set #4 -
Item ID
Date
Unique Data "D"
Other fields...
@MESSED-UP-WORKFLOW
Maybe we want to use the Date as the Join Key as well.
Can you provide a dummy data sample showing the input and the desired output?
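To illustrate the idea of adding Date to the join key, here is a rough sketch in plain Python. Joining on Item ID alone pairs every row for an item in one data set with every row for that item in the others, which is one likely reason the output grows so much. Because the dates only line up at the week level, the sketch keys each row on (Item ID, Monday of that week) instead of the exact date. The field names (item_id, date, a, b), the sample rows, and the "last row wins" rule for several rows in the same week are all assumptions for illustration, not taken from the real data.

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Return the Monday of the week containing d."""
    return d - timedelta(days=d.weekday())

def key_by_item_week(rows, value_field):
    """Collapse rows to one value per (item_id, week start).

    Assumption: if several rows share a week, the last one wins.
    """
    keyed = {}
    for row in rows:
        keyed[(row["item_id"], week_start(row["date"]))] = row[value_field]
    return keyed

def join_on_item_week(named_sets):
    """Full outer join of the keyed data sets; missing values become None."""
    all_keys = sorted(set().union(*(s.keys() for s in named_sets.values())))
    return [
        {
            "item_id": item,
            "week": week,
            **{name: s.get((item, week)) for name, s in named_sets.items()},
        }
        for item, week in all_keys
    ]

# Made-up sample rows: same weeks, different days within each week.
set1 = [
    {"item_id": "SKU1", "date": date(2024, 1, 1), "a": 10},
    {"item_id": "SKU1", "date": date(2024, 1, 8), "a": 11},
]
set2 = [
    {"item_id": "SKU1", "date": date(2024, 1, 3), "b": 20},
    {"item_id": "SKU1", "date": date(2024, 1, 11), "b": 21},
]

joined = join_on_item_week({
    "a": key_by_item_week(set1, "a"),
    "b": key_by_item_week(set2, "b"),
})

for row in joined:
    print(row)
```

Each data set is reduced to one row per item and week before the join, so the result has at most one row per item and week rather than a product of all matching rows. The same pattern extends to four data sets by adding two more entries to the dictionary passed to join_on_item_week.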
Can you please provide sample data sets for the four files, along with your expected output?
Furthermore, if you have unwanted null values in the data sets, consider removing them before joining.
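As a rough sketch of that clean-up step, the plain Python below drops rows whose key fields are missing and then reports any keys that still occur more than once, since duplicated keys are what multiply rows in a join. The field names (item_id, date) and the sample rows are assumptions for illustration only.

```python
from collections import Counter

KEY_FIELDS = ("item_id", "date")  # assumed key columns

def drop_missing_keys(rows, key_fields=KEY_FIELDS):
    """Keep only rows where every key field is present and non-empty."""
    return [
        r for r in rows
        if all(r.get(f) not in (None, "") for f in key_fields)
    ]

def duplicate_keys(rows, key_fields=KEY_FIELDS):
    """Return each key that appears more than once, with its row count."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Made-up sample rows, including a null Item ID and a repeated key.
rows = [
    {"item_id": "SKU1", "date": "2024-01-01", "a": 10},
    {"item_id": None, "date": "2024-01-01", "a": 99},
    {"item_id": "SKU1", "date": "2024-01-01", "a": 12},
    {"item_id": "SKU2", "date": "2024-01-02", "a": 7},
]

clean = drop_missing_keys(rows)
dupes = duplicate_keys(clean)

print(len(clean), "rows kept")
print("duplicate keys:", dupes)
```

Running a check like this on each of the four inputs before joining shows whether the nulls and the row explosion come from the source data or from the join itself.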