Unique Tool not separating correctly


Hi All,


A little bit of background: I built a workflow that was working initially; however, when I added additional inputs, the output went awry. Some of these inputs have overlapping records. I unioned all the inputs and then put them through the Unique tool. However, this did not separate out the duplicate records at all.


Instead of having ~4k records per project code, I now have ~8k records. (Another weird thing is that it's no longer exactly 2x the records; it's ~2x plus an additional 12 records.)


Does anybody know a workaround for this? I tried outputting the file name per input, and that did not resolve the issue. I also ran a Unique tool after every join, and nothing seems to work.


If additional info is needed, please let me know and I will build an example.


All help is appreciated, as I've been stuck on this issue for quite some time and I'm all out of options.


Thanks everyone.




Can you give an example of two records in this setup that you believe are duplicates but that the unique tool is treating as unique? That may help pinpoint the problem.


Is it possible you have whitespace or case-sensitivity issues?
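
To illustrate why that matters, here is a minimal Python sketch (not Alteryx itself; the values are made up for illustration) of how stray whitespace or differing case makes keys that look identical compare as different, and how normalizing them first collapses them:

```python
def normalize(value: str) -> str:
    """Trim surrounding whitespace, collapse inner runs of spaces, and fold case."""
    return " ".join(value.split()).casefold()

# Hypothetical values that look the same to the eye.
raw = ["Project A", "project a ", " PROJECT  A"]

distinct_raw = set(raw)                        # compared exactly as stored
distinct_clean = {normalize(v) for v in raw}   # compared after normalizing

print(len(distinct_raw), len(distinct_clean))
```

If the second count is lower than the first, whitespace or case is what is keeping the records apart.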


@adamorse Please see attached and let me know if it works. I'm dealing with employee timesheets, basically, so some records might look like duplicates even though they are not. The issue is that some of these inputs overlap, which is duplicating the records.


Sorry for the very generic fields; let me know if you need clearer data.


As for the whitespace, @mompermj, I filtered those columns out and also ran each input through a cleanse before the union.


Thanks, everyone, for your help.


@AustinRiggs94 I'm not seeing an attachment?


Hi Austin,


If I select fields as in the screenshot, I get a single record out of the Unique tool's unique output. Based on visual inspection, the unselected fields are all distinct between the two rows of the file, so this is the behavior I'd expect from the Unique tool. Is this not the behavior you want from the tool? (Is it that one of these records is incorrect and should be ignored? If so, I think you'll need some other type of filter, because the Unique tool can only check whether fields are the same or not. But maybe I'm misunderstanding.)
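
As a rough model of that behavior, here is a small Python sketch (an illustration only, not how Alteryx is implemented; the `Hours` field and the row values are hypothetical). Only the selected fields form the comparison key, so two rows that differ in an unselected field still count as duplicates:

```python
def unique_on(rows, key_fields):
    """Split rows into (unique, duplicates): the first row seen for each key
    is unique, and every later row with the same key is a duplicate."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates

# Hypothetical rows: same Number and Hours, different source file.
rows = [
    {"Number": "2000223344", "Hours": 8, "FileName": "Input 1"},
    {"Number": "2000223344", "Hours": 8, "FileName": "Input 2"},
]

by_selected, _ = unique_on(rows, ["Number", "Hours"])
by_all, _ = unique_on(rows, ["Number", "Hours", "FileName"])
print(len(by_selected), len(by_all))
```

Including a field that differs between the inputs (here `FileName`) in the key is enough to make every row look unique.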



Thanks @adamorse, that makes sense. I found the issue: the posting/working/entry dates were sometimes 1-2 seconds off, which is why the Unique tool wasn't treating them as duplicates.
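
For anyone hitting the same thing, a minimal Python sketch of the idea (the timestamps are invented for illustration, and truncating to the minute is just one possible choice): two timestamps a couple of seconds apart are not equal, but they match once the seconds are dropped before comparing.

```python
from datetime import datetime

def truncate_to_minute(ts: datetime) -> datetime:
    """Drop seconds and microseconds so near-identical timestamps compare equal."""
    return ts.replace(second=0, microsecond=0)

# Hypothetical entry timestamps from two overlapping inputs, 2 seconds apart.
a = datetime(2020, 1, 6, 9, 30, 14)
b = datetime(2020, 1, 6, 9, 30, 16)

exact_match = a == b
truncated_match = truncate_to_minute(a) == truncate_to_minute(b)
print(exact_match, truncated_match)
```

One caveat: truncation still fails when the two values straddle a minute boundary (e.g. 09:30:59 and 09:31:01), so comparing on the date alone, or excluding these fields from the uniqueness check, may be safer if the time of day is not needed.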


However, I think I found an easier method, if it's possible.


Below, I did a count distinct based on the file name, and you can see that the workflow is pulling in multiple inputs per Eng. No. Is there a way to filter based on an input? I.e., if Number = XXXXXXXXXX and Input = 1, is there a way to pull only the first input? Let me know if you have any questions, thanks.



Number      CountDistinct  Number      First_FileName
2000223344  1              2000223344  Input 1
2000457188  1              2000457188  Input 1
2000662599  1              2000662599  Input 1
2000802118  2              2000802118  Input 1
2000817005  2              2000817005  Input 1
2000829519  1              2000829519  Input 2
2000837458  1              2000837458  Input 1
2000874618  1              2000874618  Input 2
2000888082  1              2000888082  Input 1
2000899299  3              2000899299  Input 2
2000917889  2              2000917889  Input 1
2000935274  1              2000935274  Input 3
2000936279  1              2000936279  Input 2
2000937736  2              2000937736  Input 2
2000967771  1              2000967771  Input 2
3000144455  1              3000144455  Input 2
3000166149  1              3000166149  Input 1
3000167435  2              3000167435  Input 1
3000174177  2              3000174177  Input 1
3000174277  2              3000174277  Input 2
3000175179  2              3000175179  Input 1
3000177264  2              3000177264  Input 1
3000180368  2              3000180368  Input 1
3000183870  1              3000183870  Input 2
3000184372  2              3000184372  Input 1
3000184450  3              3000184450  Input 2
3000191445  2              3000191445  Input 2
3000191922  2              3000191922  Input 2
3000192985  2              3000192985  Input 1
3000203310  2              3000203310  Input 2
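
To make the question concrete, here is a rough Python sketch of the filter I have in mind (not an Alteryx feature; `keep_first_input` is a hypothetical helper and the `Hours` values are invented): for each Number, remember the first file it appears in and keep only the rows that came from that file.

```python
def keep_first_input(rows, key="Number", source="FileName"):
    """Keep only the rows whose source file is the first one seen for their key."""
    first_source = {}
    kept = []
    for row in rows:
        # setdefault records the file the first time a key appears
        # and returns the recorded file on every later row.
        chosen = first_source.setdefault(row[key], row[source])
        if row[source] == chosen:
            kept.append(row)
    return kept

# Hypothetical timesheet rows after the union, in input order.
rows = [
    {"Number": "2000802118", "FileName": "Input 1", "Hours": 8},
    {"Number": "2000802118", "FileName": "Input 1", "Hours": 4},
    {"Number": "2000802118", "FileName": "Input 2", "Hours": 8},
    {"Number": "2000829519", "FileName": "Input 2", "Hours": 6},
]

kept = keep_first_input(rows)
print([(r["Number"], r["FileName"]) for r in kept])
```

This keeps every record from the first input for a Number (so legitimate repeat entries survive) and drops that Number's records from any later, overlapping input.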



Could you use a Sample tool, with number and file selected under "group by columns", to select just the first one? (See the screenshot, though I don't have the group-by column selected there because I don't have any inputs going in.)
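
A small Python sketch of what "first N rows per group" does may help when choosing the grouping columns (this only models the idea and is not the Sample tool itself; `first_n_per_group` and the rows are hypothetical):

```python
from collections import defaultdict

def first_n_per_group(rows, group_fields, n=1):
    """Return the first n rows, in input order, for each distinct group key."""
    counts = defaultdict(int)
    out = []
    for row in rows:
        key = tuple(row[f] for f in group_fields)
        if counts[key] < n:
            counts[key] += 1
            out.append(row)
    return out

# Hypothetical rows for one Number that appears in two inputs.
rows = [
    {"Number": "2000802118", "FileName": "Input 1"},
    {"Number": "2000802118", "FileName": "Input 1"},
    {"Number": "2000802118", "FileName": "Input 2"},
]

by_number = first_n_per_group(rows, ["Number"])
by_number_and_file = first_n_per_group(rows, ["Number", "FileName"])
print(len(by_number), len(by_number_and_file))
```

Note the difference: grouping by number alone yields one row per number, while grouping by number and file yields one row for each number/file combination, so a row from every input is still present. Which grouping is right depends on whether you want one row per number or one per input.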