I would like to permute my rows in a pseudorandom order, but according to a seed


I would like to rearrange the rows of my data in a reproducible way: if I re-run the workflow at a later time, the rows should come out in the same order. One way to do this would be to use the rand() function to assign each row a random value and sort by that, but I would need to be able to seed the random number generator. Without a seed, the file comes out in a new order every time the workflow is re-run, which is not the behavior I want.
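
To make the idea concrete outside of the workflow tool, here is a minimal Python sketch of "assign each row a seeded random value and sort by it". The function name shuffle_rows and the seed value are hypothetical, and Python's random.Random stands in for rand(); this is not how the workflow tool itself behaves.

```python
import random

def shuffle_rows(rows, seed):
    """Return rows in a pseudorandom order that depends only on the seed.

    Hypothetical helper: each row gets a random sort key drawn from a
    generator seeded with `seed`, so the same seed and the same input
    order always give the same output order.
    """
    rng = random.Random(seed)  # private generator, does not touch global state
    # Pair each row with a random key; the original position breaks ties.
    keyed = [(rng.random(), position, row) for position, row in enumerate(rows)]
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [row for _, _, row in keyed]

rows = ["alpha", "bravo", "charlie", "delta", "echo"]
first_run = shuffle_rows(rows, seed=2009)
second_run = shuffle_rows(rows, seed=2009)
print(first_run == second_run)  # same seed, same order
```

Note that the result is only reproducible if the rows arrive in the same order each time, since the random keys are handed out by position.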

Have you considered using the Record ID tool after the first sort? Save the resulting data set to a file that serves as your index for later runs, then use the Join tool to bring that index back into the workflow and sort by the stored record ID.
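
The index-file idea above can be sketched in Python as follows. This is only an illustration under assumptions: the helper names save_index and apply_index, the column name record_id, and the use of a CSV file are all hypothetical, and it assumes every row has a unique key column to join on.

```python
import csv
import random

def save_index(rows, key, path):
    """First run: shuffle once (unseeded) and store key -> record_id in a CSV."""
    order = list(rows)
    random.shuffle(order)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([key, "record_id"])
        for record_id, row in enumerate(order, start=1):
            writer.writerow([row[key], record_id])

def apply_index(rows, key, path):
    """Later runs: join the stored index back on the key and sort by record_id."""
    with open(path, newline="") as handle:
        index = {line[key]: int(line["record_id"]) for line in csv.DictReader(handle)}
    # CSV values are read back as strings, so compare on the string form of the key.
    return sorted(rows, key=lambda row: index[str(row[key])])
```

The order is random the first time but fixed afterwards, because every later run reads the saved index instead of generating new random values. Rows added after the index was saved would have no entry and would need to be handled separately.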


There's an old blog article about accomplishing something like this via a sample macro (from back in 2009!).


Essentially, the approach seems fairly straightforward: using a record ID (assigned to presorted data) and a hashing algorithm applied together with a seed, you generate a "random" selection that can be reused as needed.
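
As a rough illustration of the record ID plus hash approach, here is a short Python sketch. It is not the macro from the blog article: the function names hash_key and permute_by_hash are hypothetical, SHA-256 is just one possible hashing algorithm, and combining the seed and record ID as "seed:record_id" is an assumption made for the example.

```python
import hashlib

def hash_key(record_id, seed):
    """Hash the seed together with the record ID to get a repeatable sort key."""
    data = f"{seed}:{record_id}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def permute_by_hash(rows, seed):
    """Order rows by the hash of (seed, record ID).

    Assumes `rows` is already in a stable, presorted order, so that each
    row receives the same 1-based record ID on every run.
    """
    keyed = [(hash_key(record_id, seed), row)
             for record_id, row in enumerate(rows, start=1)]
    keyed.sort(key=lambda item: item[0])
    return [row for _, row in keyed]

rows = sorted(["delta", "alpha", "echo", "charlie", "bravo"])  # presort first
print(permute_by_hash(rows, seed="my-seed"))
```

Because the sort key depends only on the seed and the record ID, re-running with the same seed reproduces the order, and changing the seed gives a different hash for every row. Taking the first N rows of the result gives a reusable "random" sample.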