Hello, I am running two ML models in parallel, and I want them to train on two entirely different (non-overlapping) subsets of data drawn from the main pool of training data.
My first idea is to randomly generate one subset, do an outer join with the main pool of training data to isolate the records that were not picked, and then grab a random subset from those.

However, this solution has two problems:
- I am using a very large dataset, and I want to avoid this join to cut down on computation.
- The second subset is going to be smaller than the first one.
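
To make the two problems concrete, here is a minimal sketch of the first idea in plain Python. It is only an illustration: the pool of integer IDs, the 30% fraction, and the variable names are all assumptions, not my real data or workflow.

```python
import random

rng = random.Random(0)      # fixed seed so the sketch is repeatable
pool = list(range(1000))    # hypothetical record IDs standing in for the main pool
frac = 0.3                  # assumed sampling fraction, used for both draws

# Draw the first subset at random.
subset_a = set(rng.sample(pool, round(frac * len(pool))))

# The expensive step: scan the whole pool to find every record
# that is NOT in the first subset (the role the join plays).
remainder = [r for r in pool if r not in subset_a]

# Draw the second subset from the remainder with the same fraction.
subset_b = set(rng.sample(remainder, round(frac * len(remainder))))

print(len(subset_a), len(subset_b))
```

Because the second draw takes the same fraction of a smaller remainder, it ends up with fewer records than the first draw, and the remainder step has to touch every record in the pool.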
My second idea is to use a formula tool to generate a random number for each record and then filter on that number. However, I need to ensure that each group is equally represented in the sample (my actual data has many groups and subgroups).
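
One way the second idea might be sketched, in plain Python rather than a workflow tool: shuffle the records within each group (equivalent to sorting by a random number), then give the first slice to one model and the next slice to the other. The function name `split_disjoint_stratified`, the field names `group` and `subgroup`, and the 25% fraction are hypothetical, and the sketch assumes the data fits in memory and that each sample takes at most half of every group.

```python
import random
from collections import defaultdict

def split_disjoint_stratified(records, key, frac, seed=0):
    """Return two non-overlapping samples, each holding about `frac` of every group."""
    if not 0 < frac <= 0.5:
        raise ValueError("frac must be in (0, 0.5] so both samples fit in each group")
    rng = random.Random(seed)

    # Bucket the records by their group key.
    by_group = defaultdict(list)
    for rec in records:
        by_group[key(rec)].append(rec)

    subset_a, subset_b = [], []
    for members in by_group.values():
        rng.shuffle(members)             # same effect as sorting by a random number
        k = int(frac * len(members))     # per-group sample size, rounded down
        subset_a.extend(members[:k])     # first k shuffled records go to model A
        subset_b.extend(members[k:2 * k])  # next k records go to model B
    return subset_a, subset_b

# Hypothetical data: 2 groups x 2 subgroups, 100 records each.
records = []
for g in ("X", "Y"):
    for s in ("p", "q"):
        for _ in range(100):
            records.append({"id": len(records), "group": g, "subgroup": s})

a, b = split_disjoint_stratified(
    records, key=lambda r: (r["group"], r["subgroup"]), frac=0.25
)
print(len(a), len(b))
```

This needs no join: the two samples cannot overlap because they come from different positions of the same shuffled list, they have the same size, and every group and subgroup combination contributes the same share to each.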
Is there a simple way to accomplish this?