Do you use Alteryx in a language other than English? If so, we want to hear from you! Please help us improve the international experience of our products by participating in this 5 minute survey.

We are updating the requirements for Community registration. As of 7/21/21 all users will be required to register a phone number with their My Alteryx accounts. If you have already registered, you will be prompted on your next login to add your phone number.

Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Random Sample Problem

marlontalisvistar
6 - Meteoroid

Hi,

 

I have a group of data containing invoices and I would like to randomize it by 1%. The invoices are categorize by group for example: Appraisal and Bankruptcy. If in case for a while it does hit below 1%, it will show me 1 record. I'm using the Random % Sample tool however I can't find ways on how to show or to give me 1 record if it is less than 1%.

 

I do hope you can help solve this problem.

10 REPLIES 10
BrandonB
Alteryx
Alteryx

I think that this approach should work well. Basically it finds the count of records in the beginning and then if it is less than 100 records it routes to the bottom stream, randomly assigns an integer, sorts based on this, and then samples the first one which gives you a random single record. If the record count is greater than or equal to 100 and 1% would result in one or more records then it will take a random percent sample. Workflow is attached. 

 

BrandonB_0-1626799477056.png

 

marlontalisvistar
6 - Meteoroid

Hi BrandonB,

 

Thanks for the response and I really appreciate it. However, I have problem producing the data since what comes out is the record number. I am using wildcard * in Input file as source since I need those files to be included. The append tool does not show me the "source" or field names of my input. Can you please help me show how it is done. Instead of Append I used the record ID.

mbarone
15 - Aurora
15 - Aurora

Not sure I fully understand the use-case, but some effective tools for selecting random groups from a dataset that I've used is the Tile Tool in combination with a formula tool using the Rand() function.  For example, if I have X number or records, and want to equally distribute those records evenly amongst 8 groups, I could apply a random number to the records, then tile them into 8 tiles sorting on the random number column to get a random distribution.  Not sure if this helps or gives you any ideas that may help but thought it might.

BrandonB
Alteryx
Alteryx

@marlontalisvistar if you can give me a sample data set (just change any sensitive values) I can mock it up for you. The append is to basically get the number of records as part of the process because it does the regular random sample tool if the number of records is greater than 100 and it does another different process if the number of records is less than 100 which would result in a less than 1 record result with a 1% random sample. 

marlontalisvistar
6 - Meteoroid

Hi Guys, 

 

Thanks for the response. I have attached a sample data for your reference. Here's my conditions:

 

The goal is to get the 1% of the records using random sample tool. We also need to remove duplicate for invoice number if there is since my input data consist of multiple excels wherein I used wildcard to call them.

Sample Scenario:
If I have 200 records multiply by 1% is equal to 2. Then Alteryx will me sample size of 2 records to audit.
if for instance I have 27 records multiple by 1% which is equal to 0.27. It will be round up to 1 to provide me at least 1 record to audit. It there no data, then Alteryx will give me none.
If there is a decimal, it will be round up to become a whole number.

 

I do hope you can help me with this as I am also starting to learn from it.

 

Thanks in advance for the knowledge and assistance.

I have attached sample of my workflow however please be advised that it doesn't give me the results I wanted. I can't get the correct sample size of data. I need to get the sample size for each Invoice type (appraisal, property pres etc) that is why you may notice the filtering tool there. Each invoice types will be output to excel.

 

 

BrandonB
Alteryx
Alteryx

Hi @marlontalisvistar 

 

I updated it to work with your data set. Workflow is attached. 

marlontalisvistar
6 - Meteoroid

Hi Brandon/Guys,

 

Thanks for the solution. Appreciate the "Randomizer v2". While it is working good, can we add this scenario, that when we randomize and resulted to for example 1.2, 3.5 or 4.6, it will round up. For 206 records for example multiply by 1% is equal to 2.06. This will be rounded up to 3, thereby giving me 3 records to audit. I'm trying to use the round(x,mult) but not working.

 

Also, this report is being run, appended and updated everyday. Is it possible to create a report of all of the data I've audited? If I run the alteryx, I guess the randomized data will go to another container (I'm not sure to call it) and appended everyday so I will have a reference or report that I've audited for the month of July.

 

One thing is that why do we use  the value 100 in the count and random record? Will it still work even if I have a thousand of records?

 

Hoping for your response on this inquiry and thank you so much for patiently assisting me on this.

marlontalisvistar
6 - Meteoroid

Thanks for the solution. Appreciate the "Randomizer v2". While it is working good, can we add this scenario, that when we randomize and resulted to for example 1.2, 3.5 or 4.6, it will round up. For 206 records for example multiply by 1% is equal to 2.06. This will be rounded up to 3, thereby giving me 3 records to audit. I'm trying to use the round(x,mult) but not working.

 

Also, this report is being run, appended and updated everyday. Is it possible to create a report of all of the data I've audited? If I run the alteryx, I guess the randomized data will go to another container (I'm not sure to call it) and appended everyday so I will have a reference or report that I've audited for the month of July.

 

One thing is that why do we use  the value 100 in the count and random record? Will it still work even if I have a thousand of records?

 

Hoping for your response on this inquiry and thank you so much for patiently assisting me on this.

BrandonB
Alteryx
Alteryx

The solution that I provided just ensures that if you have less than 100 records passing through that one will pass through. Otherwise it would return zero records. If more than 100 records exist in the data set it goes through the traditional random sampling. From an output perspective, you can choose the append option in the Output Data tool if you want the results to append to your output every time the workflow is done. 

Labels