Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.

The Sample tool is one of my favorites, but it is missing one crucial feature that people periodically ask for: the ability to return a random set of records that is the same size every time. It seems very simple, but it is actually quite complex. To return a truly random sample, you have to assign a random number to each record and then sort on it; there is no way to do it in a single pass. Rather than make the Sample tool that much more complex, I built a macro.
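To see why a fixed-size sample is harder than it looks, here is a small Python sketch of the simplest alternative: flipping a weighted coin for each record as it streams past. Treating that as the naive approach is my assumption for illustration, and the function name `per_record_sample` is hypothetical. Each run keeps roughly 10% of the records, but the exact count changes from run to run.

```python
import random

def per_record_sample(records, p, rng):
    """Hypothetical naive approach: keep each record independently with probability p."""
    return [r for r in records if rng.random() < p]

records = list(range(1000))
rng = random.Random(1)  # seeded only so the demo is repeatable

# Ten runs at p = 0.1: the expected size is 100 every time,
# but the actual number of records returned varies between runs.
sizes = [len(per_record_sample(records, 0.1, rng)) for _ in range(10)]
print(sizes)
```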



It's very simple, really. You choose between a fixed number of records or a fixed percentage of records. The macro assigns a random number to each record, sorts on that number, and then takes the First N (or N%). It also adds a RecordID to each record at the beginning and sorts on that RecordID at the end, so the sampled records come out in their original order.
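The macro's steps can be sketched in Python as follows. This illustrates the same idea rather than reproducing the macro itself: the function name `random_sample` is hypothetical, and rounding a percentage down to a whole number of records is my assumption, since the post doesn't say how First N% rounds.

```python
import random

def random_sample(records, n=None, percent=None, rng=None):
    """Hypothetical sketch: return exactly n records (or percent% of them) in original order."""
    if (n is None) == (percent is None):
        raise ValueError("specify exactly one of n or percent")
    rng = rng or random.Random()

    # RecordID first, then a random key per record: (random_key, record_id, record).
    keyed = [(rng.random(), rid, rec) for rid, rec in enumerate(records)]
    # Sort on the random key; record_id breaks any tie, so records are never compared.
    keyed.sort(key=lambda t: (t[0], t[1]))

    # First N, or First N% (assumed here to round down).
    count = n if n is not None else int(len(records) * percent / 100)
    chosen = keyed[:count]

    # Sort on the RecordID to restore the original order.
    chosen.sort(key=lambda t: t[1])
    return [rec for _, _, rec in chosen]

print(random_sample(list("abcdefghij"), n=4))       # 4 letters, still in alphabetical order
print(random_sample(list(range(200)), percent=25))  # exactly 50 records
```

The two sorts are what make this more than a single pass: every record needs its random key before the first N can be chosen.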


The final feature of the macro is Deterministic Output. Some people have asked for the ability to get a random set of records, but to get the same set every time they run the workflow. If you pick this option, you will keep getting the same records out as long as the random seed and the number of input records stay the same. Since Alteryx doesn't have the ability to set a random seed, I used an MD5 hash of the record number and the seed to get a random sort order that is deterministic.
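Here is a minimal Python sketch of the deterministic variant. The post says the sort key is an MD5 hash of the record number and the seed; exactly how the two are combined is not stated, so joining them as the string `"<record_id>:<seed>"` is my assumption, and the function names are hypothetical. Because the key depends only on the record number and the seed, the same seed over the same number of input records always selects the same records.

```python
import hashlib

def hash_key(record_id, seed):
    """MD5 of the record number and seed; the "id:seed" format is an assumption."""
    return hashlib.md5(f"{record_id}:{seed}".encode("utf-8")).hexdigest()

def deterministic_sample(records, n, seed):
    """Hypothetical sketch: a repeatable 'random' sample of n records in original order."""
    # Sort record positions by their hash key instead of by a random number.
    order = sorted(range(len(records)), key=lambda rid: hash_key(rid, seed))
    # First N, then restore the original order by record number.
    chosen = sorted(order[:n])
    return [records[rid] for rid in chosen]

data = [f"row{i}" for i in range(100)]
first = deterministic_sample(data, 10, seed=42)
second = deterministic_sample(data, 10, seed=42)
print(first == second)  # True: same seed and same record count give the same records
```

Changing the seed changes the hash of every record, and adding or removing input records changes which record numbers exist, which is why both must stay fixed to reproduce a sample.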

This macro ships with the current betas and will be included in 5.0.