Hey everyone,
I have a large dataset from which I generate samples using a random seed. For efficiency, I prefer to run the workflow with the AMP engine. However, I've run into an issue: with multi-threaded processing, the sample often changes from run to run because the engine can alter the order of the records.
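To make the failure mode concrete, here is a small Python sketch (the data and seed are made up for illustration) of why a fixed seed alone doesn't pin down the sample: the seed fixes which *positions* are drawn, so if the records arrive in a different order, different records land in those positions.

```python
import random

# Hypothetical records: the same five rows in two different arrival orders
records_a = ["r1", "r2", "r3", "r4", "r5"]
records_b = ["r4", "r1", "r5", "r3", "r2"]

random.seed(42)
sample_a = random.sample(records_a, k=2)
random.seed(42)
sample_b = random.sample(records_b, k=2)

# The same seed draws the same positions from both lists...
idx_a = [records_a.index(x) for x in sample_a]
idx_b = [records_b.index(x) for x in sample_b]
assert idx_a == idx_b

# ...but since the record order differs, the sampled records differ too.
assert sample_a != sample_b
```

This is essentially what happens when AMP's multi-threading reorders the stream before the sampling step.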
I considered adding a Record ID tool at the start of the workflow, but I suspect this wouldn't help if the input order already changes as the files are read in. Another option would be to split this into two workflows: the first would read the data and assign a Record ID with AMP disabled, then write the result out to be used by a second workflow that runs with AMP enabled.
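For what it's worth, the principle behind the Record ID idea can be sketched outside Alteryx: if each record carries a stable key, sample membership can be computed from a hash of the key plus a seed string, which makes the result completely independent of the order in which records arrive. This is a hedged illustration only (the function name, seed, and sampling rate are invented), not a claim about how Alteryx's Sample tool works internally.

```python
import hashlib

def in_sample(record_id: int, rate: float, seed: str = "s1") -> bool:
    """Decide membership from a stable key alone, ignoring record order."""
    digest = hashlib.sha256(f"{seed}:{record_id}".encode()).digest()
    # Interpret the first 8 bytes as a uniform value in [0, 1)
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < rate

records = list(range(1000))
shuffled = records[::-1]  # same records, opposite order

sample1 = sorted(r for r in records if in_sample(r, 0.10))
sample2 = sorted(r for r in shuffled if in_sample(r, 0.10))
assert sample1 == sample2  # identical sample regardless of input order
```

The key requirement is that the key is stable across runs, which is why assigning the Record ID before any nondeterministic processing matters.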
I wanted to get your thoughts on whether there might be a more efficient solution that would still ensure the sample remains consistent across runs.