Hi there. I'm a new user, working my way through the interactive lessons and attempting some workflows as I go. I'd appreciate help on the following:
I have 2 data sources:
The first contains a column called Multimedia IRN with an incomplete sequence, and including duplicate values. Each Multimedia IRN has data in the corresponding Concat_CollCollectionName column.
The second contains a full sequence of Multimedia IRNs, again, including duplicate values (which should match the duplicate values in the other spreadsheet).
The goal is to have the data combined so that I have the full sequence of Multimedia IRNs, with data in a Concat_CollCollectionName column, where it exists for a Multimedia IRN. Where there is no Multimedia IRN corresponding to a value in Concat_CollCollectionName column, the Concat_CollCollectionName column cell should be blank.
I've added the 2 input files to this workflow. I'm assuming it's either Join, Union or may Summarize that would solve this?
Thanks!
Steph
@stephroberts11 Can you upload the input files?
@stephroberts11 you can do the join and union the data to get your output
Thanks, this is very close and I will certainly use it for my learning.
There is an issue with the number of rows in the output however. There are 101599 rows in the full MM IRN list and the export has 100698 rows (therefore 862 numbers missing from the full sequence). I can see that this is probably something to do with the Join configuration but not sure what exactly. Any thoughts? I'm guessing it could be to do with the duplicate values?
@stephroberts11 Before join too i used the summarize tool to remove the duplicate entries from the Full MM list which was around 25K records, if you join with the duplicate records you will not get the expected result,.
Are you expecting any MM irn which was not in the output file?