If you need a pseudo-random sample of data while working with the In-Database (InDB) tools, the following approach is one possible solution.
(This example uses Microsoft SQL Server 2008.)
The workflow uses a set of 10 records, and we want to return a ‘random’ 50% of them.
If we used the InDB Sample tool on its own and chose to sample 50% of the records, we would simply get the first five records, which is not very random.
Instead, we can use the InDB Formula tool to create a new field, called RandomID in this example, populated with the MS SQL expression NEWID(), which generates a GUID for each record. (The equivalent expression will likely differ depending on which database you are using.)
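To illustrate what the InDB Formula step adds, here is a minimal Python sketch that attaches a random GUID to each record, similar in spirit to NEWID(). It runs outside the database and is only an analogy; the `records` list and its BankerID values 1 to 10 are hypothetical stand-ins for the article's 10 rows.

```python
import uuid

# Hypothetical stand-in for the 10 source records; BankerID values are illustrative only.
records = [{"BankerID": banker_id} for banker_id in range(1, 11)]

# Add a RandomID field to every row. uuid.uuid4() returns a randomly generated GUID,
# playing the role that NEWID() plays in the MS SQL expression.
for row in records:
    row["RandomID"] = uuid.uuid4()

for row in records:
    print(row["BankerID"], row["RandomID"])
```

Each row now carries an unrelated random key, so any ordering by that key has no connection to the original row order.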
Next, feed the data into the InDB Sample tool. Configure it to sort by the RandomID field and select the number or percentage of rows to return. Because each GUID is effectively random, sorting on RandomID shuffles the records before the sample is taken.
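The whole idea (add a random key, sort on it, keep the first N rows) can be sketched with Python's built-in sqlite3 module. This is an analogue rather than the SQL the InDB tools generate: SQLite has no NEWID(), so its RANDOM() function is swapped in as the random key, and the `bankers` table, its 10 BankerID values, and rounding the 50% up are all assumptions made for the example.

```python
import math
import sqlite3

# In-memory database with a hypothetical table of 10 BankerIDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bankers (BankerID INTEGER)")
conn.executemany(
    "INSERT INTO bankers (BankerID) VALUES (?)",
    [(i,) for i in range(1, 11)],
)

total = conn.execute("SELECT COUNT(*) FROM bankers").fetchone()[0]
sample_size = math.ceil(total * 0.5)  # 50% of rows; rounding up is an assumption

# Inner query: add a RandomID per row (RANDOM() stands in for MS SQL's NEWID()).
# Outer query: sort by RandomID and keep only the first sample_size rows.
rows = conn.execute(
    """
    SELECT BankerID
    FROM (SELECT BankerID, RANDOM() AS RandomID FROM bankers) AS t
    ORDER BY RandomID
    LIMIT ?
    """,
    (sample_size,),
).fetchall()
conn.close()

sample = [r[0] for r in rows]
print(sample)
```

Running it several times should print different sets of five BankerIDs, whereas taking the first five rows without the random sort would always return the same five.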
When we run the workflow, the results look like this. Notice which BankerIDs are returned compared with the original data.
The workflow would look like this:
(A workflow is not included with this article as it requires a specific database connection.)