How do I get the top 10% of a dataset?
Would I use the Sample tool to get the top 10% of people? I have been trying to get the top 10% of a category, but I am not sure how.
Hi @xspunky
Yes, you can use the Sample tool, with the setting highlighted below, to get the top (first) 10% of the dataset.
If you believe your problem has been resolved, please mark helpful answers as a solution so that future users with the same problem can find them more easily.
Many thanks
Shanker V
Is there another way to get the top 10% of a set? I think my results aren't accurate enough.
Hi @xspunky
I am not sure why it was not accurate; I have used it many times and it works fine.
Could you please share some screenshots or other details so we can dig into the issue?
In the meantime, here is another approach.
Input was:
Many thanks
Shanker V
Hi @xspunky
To explain in detail, I used the Tile tool with the settings below.
I set the number of tiles to 10 because you need 10% of the data.
Then I filtered on Tile Num = 1, which returns the first 10 percent of the data.
Hope it helps!
Many thanks
Shanker V
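For readers who want to see the tile-and-filter logic outside Alteryx, here is a rough Python sketch. It is not the Tile tool's actual implementation: the function names `assign_tiles` and `first_tile` are hypothetical, and the way leftover rows are spread across the earliest tiles is an assumption made for illustration.

```python
def assign_tiles(rows, num_tiles=10):
    """Pair each row with a tile number (1..num_tiles), keeping input order.

    Assumption: when the row count does not divide evenly, the earliest
    tiles each receive one extra row.
    """
    base, extra = divmod(len(rows), num_tiles)
    tiled = []
    index = 0
    for tile_num in range(1, num_tiles + 1):
        size = base + (1 if tile_num <= extra else 0)
        for row in rows[index:index + size]:
            tiled.append((tile_num, row))
        index += size
    return tiled


def first_tile(rows, num_tiles=10):
    """Keep only the rows that landed in tile 1 (roughly the first 10%)."""
    return [row for tile_num, row in assign_tiles(rows, num_tiles) if tile_num == 1]


if __name__ == "__main__":
    values = list(range(100))
    print(first_tile(values))  # the first 10 values, in input order
```

Note that tile 1 is simply the first tenth of the rows in whatever order they arrive, so the data has to be ordered by the field of interest first if "top" is meant to refer to a value.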
Hey @xspunky, the sample approach is absolutely right here. However, as you're selecting the top 10% based on the value within the records, rather than purely just the first 10% of records, you'll first need to Sort your data based on the relevant field (in your case, it'll be the column representing restaurant spend, sorted descending i.e. highest to lowest). Now when you take the top 10%, it'll be in order of spend. Here's a quick example where I've generated 10k rows, assigned them all a random value up to 50,000 and then sorted/isolated the top 1k (10%):
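Separately from the Alteryx workflow, the same sort-then-sample idea can be sketched in Python. This is only an illustration under stated assumptions: the `spend` field name and the `top_percent` helper are hypothetical, and rounding the row count down is a choice made here, not a statement about how the Sample tool rounds.

```python
import random


def top_percent(records, key, percent=10):
    """Sort records from highest to lowest by `key`, then keep the first `percent`%.

    Assumption: the number of rows kept is rounded down.
    """
    ranked = sorted(records, key=key, reverse=True)
    count = (len(ranked) * percent) // 100
    return ranked[:count]


if __name__ == "__main__":
    random.seed(0)
    # 10,000 rows, each with a random value up to 50,000 (hypothetical "spend" field).
    records = [{"id": i, "spend": random.uniform(0, 50_000)} for i in range(10_000)]

    top = top_percent(records, key=lambda r: r["spend"])
    print(len(top))  # 1000 rows, i.e. 10% of 10,000
```

Without the sort, taking the first 10% would just return whichever rows happen to come first in the input, which is why unsorted results can look inaccurate.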
Thank you so much, this was exactly what I was looking for. I had done everything correctly except sorting the data before the 10% Sample tool.
