Hi All,
I have data with three columns, a,b, and c, based on the columns a, and b I wanted populate data into the missing values in the column c, the method I am basically looking for is Fuzzy match, attached a sample of data however I have to replicate the same to the original data which I am working, It will be helpful If I get any leads to progress further.
a | b | c |
Pension ER A Sec | Pension ER A Sec 20% | 605 |
Period Base Sal BT | Period Base Sal BT | 704 |
myShare Plan | myShare Plan | 508 |
Transportation Allowance | Transportation Allowance | 227 |
Hypo Tax (Salary) | Hypo Tax (Salary) | 1001 |
Amount paid | Amount paid | 1025 |
Car assistance - cash allowance | Car assistance - cash allowance | 202 |
Hypo NI (Salary) | Hypo NI (Salary) | 908 |
Housing Allowance | Housing Allowance | 404 |
Pens Supp Allwnce | Pens Supp Allwnce | 203 |
Home Allowance | Home Allowance | |
Assignment Allowance GRS | Assignment Allowance GRS | |
Sal Supp Allowance GRS | Sal Supp Allowance GRS | |
Accident Ins | Accident Ins | |
Relocation Allowance GRS | Relocation Allowance GRS | |
Bonus | Bonus | |
Total | Total | |
Sueldo | Sueldo | |
RET. TAX INCOME | RET. TAX INCOME | |
Plus vacacional | Plus vacacional |
hi @vinayvnn Can I ask something?
1) I can imagine Fuzzy Match between column 'a' and 'b', but how column 'd' will be used?
2) How do you determine the value of column 'c'? Elaborate the logic, if any.
Thanks for responding,
Basically I have a data of 5,00,000 rows in which there are 6,800 rows of blank.
Column C already have data, where as it is left with 6800 blanks this needs to be filled based on the combination of .column a and b not with d (Apologies) for the confusion.
If any of the combinations of column a and b had populated data in column c
and same combination is having blank on the 6800 rows that should be filled based on the above criteria
@vinayvnn *Please try to mention so we can notice your response
If you want to find the same combination of column 'a' and 'b', and append to blank value of 'c' for that combination,
1) Split data by Filter tool; One is data having the value of 'c', another not having the value of 'c'.
2) JOIN by column 'a' and 'b' for the above two data (refer to configuration in the snapshot)
3) Combine data by Union tool
@gawa Thanks for the quick response, however the mentioned technique have not populated all the blank values still there are few of rows with blank values, approx (4000 rows) any alternative approach would you suggest here please ? it need not to be the same as column a and b and then populate blanks on the column c.
80% of match would be acceptable