
Alteryx Designer Desktop Discussions


generating two random samples that don't share data

Matthew
11 - Bolide

Hello, I am running two ML models in parallel, and I want them to use two entirely different subsets of the main pool of training data.

My first idea is to randomly generate one subset, do an outer join against the main pool of training data, and then take a random subset of the result, like this:

1.png

But this solution has two problems:

  1. I am using a very large dataset, and I want to avoid the join to cut down on computation.
  2. The second subset is going to be smaller than the first one.

My second idea is to use a Formula tool to generate a random number for each row and then filter on that number. However, I need to ensure that each group is equally represented in the sample (my actual data has many groups and subgroups).

Is there a simple way to accomplish this?

CoG
14 - Magnetar

Based on the workflow alone, I would recommend the Formula tool as you suggested: generate a random integer for each row (for example with RandInt(3)) and filter on it, sending one value to each model. The outer join gave you no guarantee of equal group representation either, but with a large dataset the group proportions should even out on their own.
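
For anyone who wants to see the logic outside Alteryx, here is a minimal Python sketch of the tag-and-filter idea. It is not an Alteryx workflow, and the function names, the "group" field, and the sample data are all made up for illustration. The first function tags every row with a random integer from 0 to 3 and keeps tags 0 and 1, so the two subsets can never overlap but group balance is only approximate. The second is a variant I am assuming would suit the "equal representation" requirement: it shuffles rows within each group and deals them round-robin into buckets, so every group is split almost evenly.

```python
import random
from collections import Counter, defaultdict


def random_tag_split(rows, seed=0):
    """Tag each row with a random integer 0..3 and keep tags 0 and 1."""
    rng = random.Random(seed)
    subset_a, subset_b = [], []
    for row in rows:
        tag = rng.randint(0, 3)  # inclusive on both ends
        if tag == 0:
            subset_a.append(row)
        elif tag == 1:
            subset_b.append(row)
    return subset_a, subset_b


def stratified_split(rows, group_key, n_buckets=4, seed=0):
    """Shuffle within each group, deal rows round-robin into buckets,
    and return buckets 0 and 1 (hypothetical helper, not an Alteryx tool)."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    subset_a, subset_b = [], []
    for members in by_group.values():
        rng.shuffle(members)
        for i, row in enumerate(members):
            bucket = i % n_buckets
            if bucket == 0:
                subset_a.append(row)
            elif bucket == 1:
                subset_b.append(row)
    return subset_a, subset_b


# Made-up data: 1200 rows spread evenly over groups A, B, C.
rows = [{"id": i, "group": "ABC"[i % 3]} for i in range(1200)]

a, b = random_tag_split(rows)
print(len(a), len(b))  # sizes vary with the seed; the subsets never overlap

sa, sb = stratified_split(rows, "group")
print(Counter(r["group"] for r in sa))  # 100 rows per group for this data
```

Neither version needs a join: each row is visited once and lands in at most one subset. Changing n_buckets changes the share of the data each subset gets (two buckets would split the whole pool in half).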
