Creating a Density Column Based on Thresholds of Item Counts
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello,
I am trying to create a density column based on another column. Specifically, if the count of a current item exceeds a certain threshold, then the corresponding value in the density column will be designated as "high"; otherwise, it will be designated as "low".
For example,
given col1, col2 (Density) , threshold = 2
Col 1 | Col 2 (Density) |
A | High |
A | High |
A | High |
B | Low |
C | Low |
Can I use summarize in formula tool to get the count?
Thanks
Solved! Go to Solution.
- Labels:
- Parse
- Preparation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey @Hagar-Usama, you can use the Summarize to group by [Col 1] and perform a count. We then just write an expression to assign the 'High' label if this is above the threshold of 2. After that, we can just join the labels back to the correct Col 1 value:
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey @DataNath,
Thanks you, works fine!
I'm wondering if it's possible to utilize the total count (original input size) in order to set the threshold as a percentage, rather than a fixed number
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey @Hagar-Usama, yep that's also possible! Here we'd want to use a second Summarize tool, with no grouping and just take an overall Count. We would then append this to the data set (post-join), so that this overall count is present in every row. From there, we can again just write a simple Formula expression for if <Grouped Count>/<Total Count> is above the threshold then assign 'High'. I've currently set the percentage to 50% (0.5), but you can just adjust this as you see fit.
