I have a dataset of hundreds of columns with thousands of rows and need to find the percentile of each value within each set. Normally to find the percentile of each value in a data set I would just sort the data, create a rank column, and divide the rank by the count of unique values. However, I do not want to create a workflow that does this for each of the hundreds of columns. Is there a way to loop the process or a tool that will do this?
Example Dataset Input:
ID | Col1 | Col2 | Col3 | Col4 |
11111 | 1 | 15 | 99 | 56 |
22222 | 6 | 33 | 55 | 105 |
33333 | 3 | 6 | 66 | 12 |
44444 | 8 | 99 | 4 | 156 |
55555 | 9 | 55 | 15 | 56 |
66666 | 7 | 3 | 88 | 8 |
77777 | 4 | 1 | 5 | 60 |
Desired Output:
ID | Col1 | Col2 | Col3 | Col4 |
11111 | 0% | 50% | 100% | 33% |
22222 | 50% | 67% | 50% | 83% |
33333 | 17% | 33% | 67% | 17% |
44444 | 83% | 100% | 0% | 100% |
55555 | 100% | 83% | 33% | 33% |
66666 | 67% | 17% | 83% | 0% |
77777 | 33% | 0% | 17% | 67% |
How can I do this for hundreds of columns? The above example is the inclusive percentile, but I am fine with finding the exclusive percentile as well.
Solved! Go to Solution.
Would something like this work? It's based on transposing your columns, doing the math, then putting the data back into columns.
I've never done this before, so there may be steps that you don't need, or can refine, but the output it generates is exactly what you asked for.
Edit: Also, I based my process on the information I found here:
https://community.alteryx.com/t5/Engine-Works/Alteryx-Percentile-Macro/ba-p/2347
The macro it references may be more quickly do what you need.
Good luck.
Thanks a ton Smoosh, I knew I was missing something. This makes sense and is exactly what I was looking for.