Building an analytical application using the Interface tools to select categorical samples

DeorajK
7 - Meteor

Hello,

I'm building an analytical application that uses the Interface tools to select samples based on categorical data, with either an equal distribution or a proportionate distribution across the categories.

E.g.,

In the attached dataset, there is a categorical field "Test_Name". This field has four categories: APP BOR ID, LASTUSEDDATE, PROFILE ID, and USER FULL NAME.

EQUAL DISTRIBUTION

When a user selects this field ("Test_Name") with a sample size of 25, the sample should be distributed across all four categories as:

BORID = 6

LASTDDATE = 6

PROFILE = 6

FULL NAME = 7 (this will be the plug, since the sample size of 25 does not divide evenly by 4)

If the sample size is 20, then the distribution across all four categories should be:

BORID = 5

LASTDDATE = 5

PROFILE = 5

FULL NAME = 5

 

Also, if a category does not have enough records for an equal distribution, then all of its records should be used and the shortfall added to the other categories, e.g.,

BORID = 2 (only have 2 records)

LASTDDATE = 5

PROFILE = 5

FULL NAME = 8 (remainder added to this category which has more records than the other categories)
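
The equal-distribution rule described above can be checked outside Alteryx with a small Python sketch. This is only an illustration of the arithmetic, not an Alteryx workflow: the function name `equal_allocation` and the record counts are hypothetical, and it assumes the leftover (the "plug") goes to whichever category has the most unused records, as in the examples.

```python
def equal_allocation(available, sample_size):
    """Split sample_size across categories as evenly as the record counts allow.

    available: dict mapping category name -> number of records it contains.
    """
    base = sample_size // len(available)
    # Start every category at the even share, capped by what it actually has.
    alloc = {cat: min(base, n) for cat, n in available.items()}
    leftover = sample_size - sum(alloc.values())
    # Assumption: the leftover ("plug") goes to the category with the most
    # unused records, spilling over to the next one if that category fills up.
    while leftover > 0:
        cat = max(alloc, key=lambda c: available[c] - alloc[c])
        spare = available[cat] - alloc[cat]
        if spare == 0:
            break  # the population is smaller than the requested sample
        take = min(spare, leftover)
        alloc[cat] += take
        leftover -= take
    return alloc


# Hypothetical record counts, chosen only to reproduce the examples above.
counts = {"BORID": 2, "LASTDDATE": 300, "PROFILE": 300, "FULL NAME": 500}
print(equal_allocation(counts, 20))
```

With these assumed counts, BORID contributes its 2 records, LASTDDATE and PROFILE get 5 each, and the remaining 8 go to FULL NAME.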

 

I would also like to apply the same principles for proportionate distribution using %.
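
One possible reading of the proportionate case is that each category receives a share of the sample equal to its share of the population. The Python sketch below follows that reading and uses largest-remainder rounding so the parts always add up to the requested sample size. The rounding rule, the function name `proportional_allocation`, and the record counts are assumptions for illustration; the post does not specify them.

```python
def proportional_allocation(available, sample_size):
    """Allocate sample_size in proportion to each category's record count."""
    total = sum(available.values())
    # Integer arithmetic: the whole part of each share, plus what is left over.
    alloc = {cat: sample_size * n // total for cat, n in available.items()}
    remainder = {cat: sample_size * n % total for cat, n in available.items()}
    leftover = sample_size - sum(alloc.values())
    # Assumption: the units lost to rounding down go to the categories with
    # the largest remainders (largest-remainder rounding).
    for cat in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[cat] += 1
    return alloc


# Hypothetical record counts (5%, 20%, 30%, and 45% of 1,000 records).
counts = {"BORID": 50, "LASTDDATE": 200, "PROFILE": 300, "FULL NAME": 450}
print(proportional_allocation(counts, 25))
```

Here the exact shares are 1.25, 5, 7.5, and 11.25, so the one unit lost to rounding goes to PROFILE, giving 1, 5, 8, and 11.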

 

Please let me know if you can help me with a sample workflow to achieve these requirements. I have to complete my project before Friday, Jan 19th, 2024, and I would greatly appreciate it if you could provide a solution before then.

I have attached a sample data file which you can use.

 

Please let me know if there is anything else you need me to provide.

 

Thanks,

Deoraj.

2 REPLIES
gawa
15 - Aurora

Hi @DeorajK,

As I understand it, you want to allocate the sample across the categorical values as equally as possible.

Could you run the attached workflow and check whether it meets your requirement?

I used the Tile tool to assign a tile sequence number within each categorical value, sorted the data by that sequence number in ascending order, and then sampled the first N records.

image.png
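
The steps described above (number the records within each category, sort by that number, keep the first N) can be mimicked in plain Python to see why they balance the sample. This is a rough analogue, not the attached workflow: the function name `sample_by_sequence`, the `row` field, and the record counts are hypothetical.

```python
from collections import Counter, defaultdict


def sample_by_sequence(records, key, sample_size):
    """Number records within each category, sort by that number, keep the first N."""
    seen = defaultdict(int)
    numbered = []
    for rec in records:
        seen[rec[key]] += 1  # 1, 2, 3, ... within this record's category
        numbered.append((seen[rec[key]], rec))
    # Stable sort: records with the same sequence number keep their input order.
    numbered.sort(key=lambda pair: pair[0])
    return [rec for _, rec in numbered[:sample_size]]


# Hypothetical data: BORID has only 2 records, the other categories have 10 each.
sizes = [("BORID", 2), ("LASTDDATE", 10), ("PROFILE", 10), ("FULL NAME", 10)]
records = [{"Test_Name": name, "row": i} for name, n in sizes for i in range(n)]

picked = sample_by_sequence(records, "Test_Name", 20)
print(Counter(rec["Test_Name"] for rec in picked))
```

In this sketch a short category simply runs out of sequence numbers, so its shortfall is spread across the remaining categories (2, 6, 6, 6 here) rather than assigned to a single plug category.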

 

With this approach, if you specify a sample size of 100, 25 records are allocated to each category.

image.png

On the other hand, if you specify a sample size of 2,000, 'BORID' and 'FULL NAME' will have 471 records rather than 500, and the remainder goes to the other categories.

image.png

DeorajK
7 - Meteor

Hello Gawa,

Thank you very much for your response. The workflow looks good, but I would like the sample to be drawn at random (e.g., randomly select 10% of the population), and I would like the option to select the categories.

For example, I could use all four categories or select only the ones I want, e.g., BORID and FULL NAME.

If you can make these enhancements, that would be perfect! 

 

Thanks again for your help!

Deo.
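
The two enhancements requested above (random selection and a choice of categories) can be illustrated with a short Python sketch. It is not an Alteryx workflow and makes several assumptions: the percentage is applied to the records in the selected categories only, the records are shuffled before being numbered within each category, and the names `random_balanced_sample`, `fraction`, and `seed` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_balanced_sample(records, key, categories, fraction, seed=None):
    """Randomly sample a fraction of the selected categories, balanced across them."""
    rng = random.Random(seed)
    # Keep only the categories the user picked.
    pool = [rec for rec in records if rec[key] in categories]
    # Shuffle first, so "the first N per category" is a random pick.
    rng.shuffle(pool)
    # Assumption: the percentage applies to the selected categories only.
    sample_size = round(len(pool) * fraction)
    seen = defaultdict(int)
    numbered = []
    for rec in pool:
        seen[rec[key]] += 1  # sequence number within the category
        numbered.append((seen[rec[key]], rec))
    numbered.sort(key=lambda pair: pair[0])
    return [rec for _, rec in numbered[:sample_size]]


# Hypothetical data: 400 records spread over the four categories.
sizes = [("BORID", 40), ("LASTDDATE", 100), ("PROFILE", 100), ("FULL NAME", 160)]
records = [{"Test_Name": name, "row": i} for name, n in sizes for i in range(n)]

picked = random_balanced_sample(
    records, "Test_Name", {"BORID", "FULL NAME"}, fraction=0.10, seed=1
)
print(Counter(rec["Test_Name"] for rec in picked))
```

With these assumed counts, the two selected categories hold 200 records, so a 10% sample is 20 records, split 10 and 10. Passing a fixed `seed` makes the random pick repeatable; leaving it as `None` gives a different sample on each run.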
