A little bit of background: I built a workflow that was working initially; however, when I added additional inputs, the output went awry. Some of these inputs have overlapping records. I unioned all the inputs and then put them through the Unique tool. However, this did not separate out the duplicate records at all.
Instead of having ~4k records per project code, I now have ~8k records (another weird issue is that it's not exactly 2x the records anymore; it's ~2x plus an additional 12 records).
Does anybody know a workaround for this? I tried doing the output as file name per input and it was still an issue. I ran a Unique tool after every join as well, and nothing seems to work.
If additional info is needed, please let me know and I will build an example.
All help is appreciated, as I've been stuck on this issue for quite some time and I'm all out of options.
@adamorse Please see attached and let me know if it works. I'm basically dealing with employee timesheets, so some records might look like duplicates even though they are not. The issue is that some of these inputs overlap, which is duplicating the records.
Sorry for the very generic fields; let me know if you need clearer data.
As for the whitespace, @mompermj, I filtered those columns out and also ran each input through a cleanse before the union.
If I select fields as in the screenshot, I get a single record out of the unique output. The unselected fields are all distinct between the two rows of the file, just based on visual inspection, so this is the behavior I'd expect from the Unique tool. Is this not the behavior you want from the tool? (Is it that one of these records is not correct and should be ignored? If so, you'll need some other type of filter, I think, because the Unique tool can only check whether fields are the same or not. But maybe I'm misunderstanding.)
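To illustrate the point about selected fields outside of Alteryx, here is a minimal Python sketch of the same idea: rows are compared only on the chosen key fields, and everything else is ignored. The field names (`employee`, `project`, `hours`, `source_file`) and the helper `unique_on` are hypothetical, made up for this example; this is not how the Unique tool is implemented.

```python
def unique_on(records, key_fields):
    """Split records into (unique, duplicates), comparing only key_fields."""
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        # Build the comparison key from the selected fields only.
        key = tuple(rec[f] for f in key_fields)
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
            unique.append(rec)
    return unique, duplicates


# Hypothetical sample rows: same timesheet entry coming from two inputs.
rows = [
    {"employee": "E1", "project": "P100", "hours": 8, "source_file": "input1"},
    {"employee": "E1", "project": "P100", "hours": 8, "source_file": "input2"},
    {"employee": "E2", "project": "P100", "hours": 6, "source_file": "input1"},
]

# Comparing on every field finds no duplicates, because source_file differs.
all_unique, all_dupes = unique_on(rows, ["employee", "project", "hours", "source_file"])

# Comparing only on the business fields catches the overlapping record.
key_unique, key_dupes = unique_on(rows, ["employee", "project", "hours"])
```

If any field in the comparison differs between two rows, even one you don't care about, the rows will not be treated as duplicates.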
Thanks @adamorse, that makes sense. I found the issue: the posting/working/entry dates were sometimes 1-2 seconds off, which is why the Unique tool wasn't finding them as duplicates.
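For anyone hitting the same thing, here is a rough Python sketch of one way to handle timestamps that drift by a second or two: treat two rows as duplicates when their other key fields match and their timestamps fall within a small tolerance. The field names, the 2-second tolerance, and the helper `dedupe_with_tolerance` are assumptions for illustration only, not something taken from the workflow.

```python
from datetime import datetime, timedelta


def dedupe_with_tolerance(records, key_fields, ts_field,
                          tolerance=timedelta(seconds=2)):
    """Keep a record unless a record with the same key was already kept
    with a timestamp no more than `tolerance` earlier."""
    last_kept = {}  # key -> timestamp of the most recently kept record
    kept = []
    # Process in timestamp order so near-duplicates are adjacent in time.
    for rec in sorted(records, key=lambda r: r[ts_field]):
        key = tuple(rec[f] for f in key_fields)
        ts = rec[ts_field]
        prev = last_kept.get(key)
        if prev is not None and ts - prev <= tolerance:
            continue  # same key, timestamp within tolerance: treat as duplicate
        last_kept[key] = ts
        kept.append(rec)
    return kept


# Hypothetical sample rows: the first two differ only by one second.
rows = [
    {"employee": "E1", "project": "P100", "entry_date": datetime(2020, 1, 6, 10, 0, 0)},
    {"employee": "E1", "project": "P100", "entry_date": datetime(2020, 1, 6, 10, 0, 1)},
    {"employee": "E1", "project": "P100", "entry_date": datetime(2020, 1, 6, 10, 5, 0)},
]

deduped = dedupe_with_tolerance(rows, ["employee", "project"], "entry_date")
```

Comparing against a tolerance avoids the edge case you get from simply truncating to the minute, where 10:00:59 and 10:01:00 would land in different buckets despite being one second apart.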
However, I think I found an easier method, if it's possible.
Below I did a count distinct based on the filename, and you can see that the workflow is pulling in multiple inputs per Eng. No. Is there a way to filter based on an input? I.e., if Number = XXXXXXXXXX and Input = 1, is there a way to pull only the first input? Let me know if you have any questions, thanks.
Could you use a sample tool together with grouping by number and file in the "group by columns" to just select the first one (screenshot, though I don't have the group by column selected because I don't have any inputs going in)?
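As a rough illustration of the "only pull the first input per number" idea, here is a small Python sketch. It assumes "first" means the first file encountered for each number in row order, and the field names (`number`, `file`, `hours`) and the helper `keep_first_input` are hypothetical. It is not a description of how the Sample tool works internally.

```python
def keep_first_input(records, number_field, file_field):
    """For each number, keep only the rows that come from the first
    file seen for that number (in the order the rows arrive)."""
    first_file = {}
    for rec in records:
        # setdefault records the file only the first time a number appears.
        first_file.setdefault(rec[number_field], rec[file_field])
    return [r for r in records
            if r[file_field] == first_file[r[number_field]]]


# Hypothetical sample rows: number "A" appears in two input files.
rows = [
    {"number": "A", "file": "input1", "hours": 8},
    {"number": "A", "file": "input2", "hours": 8},
    {"number": "A", "file": "input1", "hours": 4},
    {"number": "B", "file": "input2", "hours": 6},
]

filtered = keep_first_input(rows, "number", "file")
```

Note that this keeps every row from the chosen file, not just one row per number, so legitimate repeated timesheet entries within a single input are preserved.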