Hello,
I am trying to create a density column based on another column. Specifically, if the count of a current item exceeds a certain threshold, then the corresponding value in the density column will be designated as "high"; otherwise, it will be designated as "low".
For example,
given col1, col2 (Density) , threshold = 2
Col 1 | Col 2 (Density) |
A | High |
A | High |
A | High |
B | Low |
C | Low |
Can I use summarize in formula tool to get the count?
Thanks
Solved! Go to Solution.
Hey @Hagar-Usama, you can use the Summarize to group by [Col 1] and perform a count. We then just write an expression to assign the 'High' label if this is above the threshold of 2. After that, we can just join the labels back to the correct Col 1 value:
Hey @DataNath,
Thanks you, works fine!
I'm wondering if it's possible to utilize the total count (original input size) in order to set the threshold as a percentage, rather than a fixed number
Hey @Hagar-Usama, yep that's also possible! Here we'd want to use a second Summarize tool, with no grouping and just take an overall Count. We would then append this to the data set (post-join), so that this overall count is present in every row. From there, we can again just write a simple Formula expression for if <Grouped Count>/<Total Count> is above the threshold then assign 'High'. I've currently set the percentage to 50% (0.5), but you can just adjust this as you see fit.