summarize percentile function not working as expected
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I am attempting to group customer sales by category using percentile to get a normal distribution. I cannot use a std dev because of the nature of the data the cluster toward the low end and the high variance would have -1 std dev from the median below 0. So I settled on running it through a percentile summarize and taking the 2.5, 13.5, 34, 34, 13.5, 2.5 as the thresholds for my groups to get an even distribution. But when i perform the percentile function it returns the raw value of sales, not a percentile ranking. has anyone ran into this. The percentile macro does not work for me because i believe it is std dev based and in some uncommon categories my lowest value will be 34 and highest will be 96. i attached a sample data set of what my expected/desired result is vs what the tool is spitting out.I just made up some data to keep it simple and for privacy reasons
Student | Class | rawscore | Percentile_rawscore | expected result | |
azmodius | history | 150 | 150 | 50 | 50th |
brian | history | 85 | 85 | 0 | 0th |
britney | history | 184 | 184 | 80 | 80th |
casey | history | 155 | 155 | 60 | 60th |
helga | history | 86 | 86 | 10 | 10th |
jaime | history | 96 | 96 | 20 | 20th |
james | history | 125 | 125 | 30 | 30th |
jenny | history | 130 | 130 | 40 | 40th |
kelly | history | 165 | 165 | 70 | 70th |
kristen | history | 188 | 188 | 90 | 90th |
azmodius | math | 421 | 421 | 90 | 90th |
brian | math | 158 | 158 | 10 | 10th |
britney | math | 255 | 255 | 60 | 60th |
casey | math | 285 | 285 | 70 | 70th |
helga | math | 248 | 248 | 40 | 40th |
jaime | math | 235 | 235 | 30 | 30th |
james | math | 254 | 254 | 50 | 50th |
jenny | math | 288 | 288 | 80 | 80th |
kelly | math | 145 | 145 | 0 | 0th |
kristen | math | 210 | 210 | 20 | 20th |
azmodius | pe | 145 | 145 | 0 | 0th |
brian | pe | 258 | 258 | 40 | 40th |
britney | pe | 231 | 231 | 10 | 10th |
casey | pe | 247 | 247 | 20 | 20th |
helga | pe | 955 | 955 | 90 | 90th |
jaime | pe | 752 | 752 | 80 | 80th |
james | pe | 251 | 251 | 30 | 30th |
jenny | pe | 265 | 265 | 50 | 50th |
kelly | pe | 596 | 596 | 70 | 70th |
kristen | pe | 456 | 456 | 60 | 60th |
Solved! Go to Solution.
- Labels:
- Preparation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@BFritter How about the attached.
I used the sort tool to order the records by score. Then the multi-row formula ranks them within each class. Then the summarize gets the max rank per class. The find replace then adds the Max rank back to each class. The formula tool gets the percentile and multiplies it by 100.
Hopefully that's the direction you were looking to go!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks, this gets me where i wanted to go. I don't get why the percentile function doesn't do this though.
