I need suggestions on a data preparation task that I am working on.
Multiple records in the dataset share identical values in a few columns. I need to flag records in this dataset in 3 ways, depending on these columns.
1st way - Identify the records that have the same values in these columns, then check whether the value in a numerical column is less than a fixed value for any of these records. If one of the records satisfies this condition, all the other records with the same values in these columns should be marked with the same flag as the record that satisfies the condition.
2nd way - Same as the 1st way, plus 1 more condition. Check whether there are 3 records in this dataset that also satisfy an additional condition: the sum of the values in the numerical column for these 3 records should be less than a certain value. If this condition is satisfied, mark all the records with the same flag.
3rd way - Same as the 1st way, plus 1 more condition. Check whether there are 3 records in this dataset that also satisfy an additional condition: the sum of the values in the numerical column for these 3 records should be less than a certain value, and the sum of any 2 of these 3 records should be more than 15% of the sum of the values of all 3 records.
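To make the three cases concrete, here is a rough sketch in plain Python. It is only one reading of the rules, under several assumptions: the 3 records are taken from within the same group of matching records, "any 2 of the 3" is read as "every pair", and the names `flag_records`, `min_value`, `sum_limit`, `pair_share`, the column names and the sample rows are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def flag_records(records, key_cols, value_col, min_value, sum_limit, pair_share=0.15):
    """Return copies of the records with flag_1, flag_2 and flag_3 added."""
    # Collect the numeric values of each group of records sharing the key columns.
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[c] for c in key_cols)].append(rec[value_col])

    flags = {}
    for key, values in groups.items():
        # 1st way: at least one record in the group is below the fixed value.
        way1 = any(v < min_value for v in values)
        # 2nd way: additionally, some 3 records in the group sum to less than sum_limit.
        triples = [t for t in combinations(values, 3) if sum(t) < sum_limit]
        way2 = way1 and bool(triples)
        # 3rd way: additionally, in one such triple every pair sums to more than
        # pair_share of the triple's total (assumed reading of "any 2 of the 3").
        way3 = way1 and any(
            all(a + b > pair_share * sum(t) for a, b in combinations(t, 2))
            for t in triples
        )
        flags[key] = (way1, way2, way3)

    # Every record inherits the flags of its group; input order is preserved.
    flagged = []
    for rec in records:
        f1, f2, f3 = flags[tuple(rec[c] for c in key_cols)]
        flagged.append({**rec, "flag_1": f1, "flag_2": f2, "flag_3": f3})
    return flagged


# Hypothetical sample rows: four identifying columns plus one numeric column.
key_cols = ["c1", "c2", "c3", "c4"]
rows = [
    {"c1": "A", "c2": "X", "c3": 1, "c4": "p", "amount": 5},
    {"c1": "A", "c2": "X", "c3": 1, "c4": "p", "amount": 20},
    {"c1": "A", "c2": "X", "c3": 1, "c4": "p", "amount": 30},
    {"c1": "B", "c2": "X", "c3": 1, "c4": "p", "amount": 50},
    {"c1": "B", "c2": "X", "c3": 1, "c4": "p", "amount": 60},
    {"c1": "C", "c2": "X", "c3": 1, "c4": "p", "amount": 0},
    {"c1": "C", "c2": "X", "c3": 1, "c4": "p", "amount": 0},
    {"c1": "C", "c2": "X", "c3": 1, "c4": "p", "amount": 8},
]
result = flag_records(rows, key_cols, "amount", min_value=10, sum_limit=100)
```

Because the flags are computed once per group and then copied to every member, the number of records per group can differ freely. Checking every combination of 3 is fine for small groups but grows quickly with group size.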
It would be really helpful if anybody could take a look at these cases and offer a few suggestions.
I have used Sort to arrange the dataset in descending order of the value in the column that is required for the comparisons. However, the number of records that I want to check varies with each combination of the identifying columns.
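For what it is worth, the varying number of records per combination stops being a problem once the sorted data is split into groups. The sketch below shows that idea in plain Python; `groups_sorted_desc`, the column names and the sample rows are hypothetical and only illustrate the approach.

```python
from itertools import groupby


def groups_sorted_desc(records, key_cols, value_col):
    """Group records by key_cols, with each group sorted by value_col descending."""
    def key_of(rec):
        return tuple(rec[c] for c in key_cols)

    # Sort by value descending first, then by key. Python's sort is stable,
    # so the descending value order survives inside each key.
    ordered = sorted(records, key=lambda rec: rec[value_col], reverse=True)
    ordered.sort(key=key_of)
    # groupby yields one run per key, however many records that key has.
    return [(key, list(members)) for key, members in groupby(ordered, key=key_of)]


# Hypothetical sample rows with a single identifying column for brevity.
sample = [
    {"c1": "A", "amount": 5},
    {"c1": "B", "amount": 60},
    {"c1": "A", "amount": 30},
    {"c1": "B", "amount": 50},
    {"c1": "A", "amount": 20},
]
grouped = groups_sorted_desc(sample, ["c1"], "amount")
```

Each entry of `grouped` is a key together with all of its records, so any per-group check can look at the whole group regardless of its size.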
Pasted above is a sample dataset that I have created. The first four columns contain identical values across the records and the 5th column contains a numerical value which has to be used for comparisons.