I am working with two datasets, each with its own identifier column, and I combined them to check whether any identifiers are similar or begin with the same number. I tried to validate the identifiers with a regex but couldn't work out what the pattern should look like (the expected format is like A11.11). After this step, I need to look for similar IDs and give each pair a similarity score.
After that, I will do the same operation for another column.
Note: the two datasets are different sizes.
ICD10 and code_id are the identifier columns.
Hi @HishamIbrahim, if you join the two datasets on the first column you want to check, you will get the L, J, and R outputs: J holds the records that match, while L and R hold the unmatched records from the left and right inputs. You can then feed the unmatched data into a second join that uses the second column or additional criteria.
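To illustrate what the three outputs contain, here is a rough sketch in plain Python rather than a workflow. The codes are made up, and `join_outputs` is a hypothetical helper that only mimics an exact-match join on a single column; it is not how the Join tool is implemented.

```python
def join_outputs(left_ids, right_ids):
    """Split two ID lists into left-only, matched, and right-only values."""
    left_set = set(left_ids)
    right_set = set(right_ids)
    left_only = [x for x in left_ids if x not in right_set]   # like the L output
    joined = [x for x in left_ids if x in right_set]          # like the J output
    right_only = [x for x in right_ids if x not in left_set]  # like the R output
    return left_only, joined, right_only


# Made-up identifiers; the two lists are deliberately different sizes.
icd10 = ["A11.11", "B20.10", "C34.90"]
code_id = ["A11.11", "A11.12", "C34.90", "D50.00"]

L, J, R = join_outputs(icd10, code_id)
print("L (only in ICD10):", L)
print("J (in both):", J)
print("R (only in code_id):", R)
```

The L and R lists are what you would carry forward into the second join or into a similarity check.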
Unfortunately, without data it is difficult to show this in practice. If you could mock up some fake ICD-10 codes and explain the logic you would like to use, we could mock up a workflow.
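In the meantime, here is a minimal Python sketch of the validation and scoring steps on made-up codes. It assumes the format is exactly what the example A11.11 suggests (one uppercase letter, two digits, a dot, then one or two digits); real ICD-10 codes can be shorter or longer, so the pattern would need adjusting. The similarity score uses `difflib.SequenceMatcher` from the standard library as one possible measure, and `is_valid`, `same_prefix`, `similarity`, and `best_match` are hypothetical helper names.

```python
import re
from difflib import SequenceMatcher

# Assumed shape, based only on the example "A11.11".
CODE_PATTERN = re.compile(r"[A-Z]\d{2}\.\d{1,2}")


def is_valid(code):
    """Return True if the whole string matches the assumed code shape."""
    return CODE_PATTERN.fullmatch(code) is not None


def same_prefix(a, b, length=3):
    """Return True if both codes start with the same characters (e.g. 'A11')."""
    return a[:length] == b[:length]


def similarity(a, b):
    """Score between 0.0 (nothing in common) and 1.0 (identical strings)."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(code, candidates):
    """Return the (candidate, score) pair with the highest similarity.

    Raises ValueError if candidates is empty.
    """
    return max(((c, similarity(code, c)) for c in candidates), key=lambda p: p[1])


# Made-up unmatched values from each side of a join.
unmatched_icd10 = ["A11.12", "B20.10", "bad-code"]
unmatched_code_id = ["A11.11", "C34.90", "B20.1"]

for code in unmatched_icd10:
    if not is_valid(code):
        print(code, "-> does not match the assumed format")
        continue
    match, score = best_match(code, unmatched_code_id)
    print(code, "->", match, round(score, 2), "same prefix:", same_prefix(code, match))
```

Since the two datasets are different sizes, scoring each unmatched code against all candidates on the other side avoids having to line the rows up first.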