Alteryx Community,
I have a file with 1,000 records in it and I need to split it up into 4 sections - Package A, Package B, Package C and Package D.
At the same time i need to make sure that 2 fields in my data ([FILENAME] and [STATE]) get as close as possible to an even split between all 4 packages.
So [FILENAME] would be 1st Priority and [STATE] would be second priority.
I started with a sort tool - but can't seem to find the right tool to do the NTH'ing and update my [PACKAGE] field.
Thoughts - Recommendations?
Thanks again for all the help.
Javier
Solved! Go to Solution.
Hi @jdelaguila,
This is one of my favorite uses for the Tile tool. If you're ok with mostly even groups and group 1 having the most unique values, Tile is a quick and easy option for you! If your use case needs to be more exact, there are other solutions, too. Just thought I'd suggest the easy one first!
Since Tile has a group by function, your state and filenames will be more or less equally distributed across the five groups. Since it starts assigning values starting with group 1, group 1 will have the most unique values.
Hope this helps!
So 1st of all thank you for showing me the TILE tool. Looks like a pretty powerful tool and possibly the answer.
With that said its not quite doing what i need it do.
1st - If i have 1000 records - i need the field to have an even split between the 4 packages - so the result would be an 250,250,250,250.
2nd - The 1st priority field [FILENAME] would be evenly split between those packages. So for every [FILENAME] its split up evenly within those packages. So if Filename 1 has 100 records - it will split up Filename 1 into 25,25,25,25
3rd - The 2nd priority field [STATE] would be evenly split between those packages. So for every [STATE] its split up evenly within those packages. So if State (CA) has 100 records - it will split up CA into 25,25,25,25.
As an example Visual you can see how the data below was split into 4 packages - A,B,C,D.
FILENAME | STATE | PACKAGE |
0185 | CA | A |
0185 | DC | B |
0185 | MD | C |
0185 | NM | D |
0185 | VA | A |
0200 | CA | B |
0200 | DC | C |
0200 | MD | D |
0200 | NM | A |
0200 | VA | B |
0411 | CA | C |
0411 | DC | D |
0411 | MD | A |
0411 | NM | B |
0411 | VA | C |
0936 | CA | D |
0936 | DC | A |
0936 | MD | B |
0936 | NM | C |
0936 | VA | D |
If we analyze the data above our packages are evenly split:
PACKAGE | Count |
A | 5 |
B | 5 |
C | 5 |
D | 5 |
If we continue to analyze the data - FILENAME(our 1st Priority in the Nth) - is split up within those packages.
So FILENAME=0185 had 5 records - it split them up into each package with a small deviance.
FILENAME | PACKAGE | Count |
0185 | A | 2 |
0185 | B | 1 |
0185 | C | 1 |
0185 | D | 1 |
0200 | A | 1 |
0200 | B | 2 |
0200 | C | 1 |
0200 | D | 1 |
0411 | A | 1 |
0411 | B | 1 |
0411 | C | 2 |
0411 | D | 1 |
0936 | A | 1 |
0936 | B | 1 |
0936 | C | 1 |
0936 | D | 2 |
And finally if we dig deeper into [STATE] - our 2nd priority field - we can see how our data is split up into those packages.
So we had 4 CA records - each one got a Package.
STATE | PACKAGE | Count |
CA | A | 1 |
CA | B | 1 |
CA | C | 1 |
CA | D | 1 |
DC | A | 1 |
DC | B | 1 |
DC | C | 1 |
DC | D | 1 |
MD | A | 1 |
MD | B | 1 |
MD | C | 1 |
MD | D | 1 |
NM | A | 1 |
NM | B | 1 |
NM | C | 1 |
NM | D | 1 |
VA | A | 1 |
VA | B | 1 |
VA | C | 1 |
VA | D | 1 |
Sorry for all the writing and hope this makes sense. If not i can try and re-word it differently.
Javier
Hi Javier,
Thank you for being so clear in your desired outcome! The documentation you provided in your second workflow was very helpful as well. I believe several additional tools are needed due to the specificity of your request. I hope the results of this are exact enough to suit your use case! Happy to revise if necessary.
Yep worked! Thanks so much!
Javier