Hi guys, I have a dataset where I have to calculate, for each product id, the percentage of occurrence of Yes in each column.
Example input:
Product Id | Apple | Mango | Banana |
123 | Yes | | |
123 | Yes | Yes | |
245 | Yes | | |
245 | Yes | Yes | |
245 | Yes | Yes | |
456 | | | |
456 | Yes | Yes | |
456 | Yes | | |
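One way to get the per-product percentages from the input above is to count the Yes cells in each column and divide by the number of rows for that product id. This is only a sketch in plain Python: `rows`, `COLUMNS` and `yes_percentages` are names made up for illustration, and blank cells are assumed to mean "not Yes".

```python
from collections import defaultdict

COLUMNS = ["Apple", "Mango", "Banana"]

# The example input, one tuple per row: (product id, Apple, Mango, Banana).
# Empty strings stand for the blank cells.
rows = [
    (123, "Yes", "", ""),
    (123, "Yes", "Yes", ""),
    (245, "Yes", "", ""),
    (245, "Yes", "Yes", ""),
    (245, "Yes", "Yes", ""),
    (456, "", "", ""),
    (456, "Yes", "Yes", ""),
    (456, "Yes", "", ""),
]

def yes_percentages(rows):
    """Return {product_id: {column: percent of rows marked Yes}}."""
    totals = defaultdict(int)
    yes_counts = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for product_id, *cells in rows:
        totals[product_id] += 1
        for column, cell in zip(COLUMNS, cells):
            if cell.strip().lower() == "yes":
                yes_counts[product_id][column] += 1
    return {
        pid: {col: 100 * yes_counts[pid][col] / totals[pid] for col in COLUMNS}
        for pid in totals
    }

percentages = yes_percentages(rows)
```

The values are left as unrounded floats (for example 66.666... for Apple on 456); formatting them as "66.66%" is a separate display step.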
Rules:
1. If a column's percentage is greater than 80%, put that column's name in the comment. If the percentages are less than 80%, put the names of the two columns with the largest percentages in the comment (example: product id 456).
Examples:
1. Product id 123: Apple has Yes in both rows, which is 100%, and no other column is above 80%, so the comment is Apple.
2. Product id 456: Apple is 66.66% and Mango is 33.33%. Both are less than 80%, so put both of them in the comment.
Output:
Product Id | Apple | Mango | Banana | Comment |
123 | 100% | 50% | | Apple |
245 | 100% | 66.66% | | Mango |
456 | 66.66% | 33.33% | | Apple, Mango |
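The comment rule can be sketched as a small function over the percentages in the output table. This is one reading of the rules, not a confirmed spec: it assumes "greater than 80%" is strict, that every column above 80% gets listed, and that columns at 0% are never named. `build_comment`, `THRESHOLD` and `table` are illustrative names. Under this reading product id 245 gets Apple (100%), whereas the expected output above shows Mango, so that row may follow a rule that is not spelled out here.

```python
THRESHOLD = 80.0  # assumed strict cut-off taken from the rules

def build_comment(percentages, threshold=THRESHOLD, top_n=2):
    """Build the Comment value for one product id from {column: percent}."""
    # Rank columns by percentage, highest first; ties keep column order.
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    above = [name for name, pct in ranked if pct > threshold]
    if above:
        return ", ".join(above)
    # Nothing above the threshold: fall back to the largest non-zero columns.
    return ", ".join(name for name, pct in ranked[:top_n] if pct > 0)

# Percentages copied from the output table (Banana is blank there, taken as 0).
table = {
    123: {"Apple": 100.0, "Mango": 50.0, "Banana": 0.0},
    245: {"Apple": 100.0, "Mango": 66.66, "Banana": 0.0},
    456: {"Apple": 66.66, "Mango": 33.33, "Banana": 0.0},
}
comments = {pid: build_comment(pcts) for pid, pcts in table.items()}
# comments[123] is "Apple"; comments[456] is "Apple, Mango"
```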