I am working with two datasets, each with its own unique identifier column. I combined them to check whether any identifiers are similar or begin with the same number. I first tried to validate the identifiers with a regex, but I couldn't work out what the pattern should look like for codes of the form A11.11. After this step, I need to find similar IDs and give each pair a similarity score.
After that, I will do the same operation for another column.
Note: the two datasets are different sizes.
ICD10 and code_id are the identifier columns.
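
To make the steps above concrete, here is a minimal sketch in Python. It assumes a valid code is one letter, two digits, and an optional dot followed by one or two digits (A11 or A11.11); real ICD-10 codes may take other forms, so the pattern would need adjusting. The function names, the sample lists, and the choice of `difflib.SequenceMatcher` as the similarity measure are all assumptions for illustration, not a definitive method.

```python
import re
from difflib import SequenceMatcher

# Assumed format: letter, two digits, optional ".d" or ".dd" (e.g. A11, A11.1, A11.11).
CODE_PATTERN = re.compile(r"[A-Z]\d{2}(?:\.\d{1,2})?")


def normalize(code):
    """Trim whitespace and upper-case so 'a11.11 ' and 'A11.11' compare equal."""
    return str(code).strip().upper()


def is_valid_code(code):
    """True if the whole normalized string matches the assumed code format."""
    return CODE_PATTERN.fullmatch(normalize(code)) is not None


def similarity(a, b):
    """Score between 0.0 and 1.0; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def score_pairs(left, right, prefix_len=3):
    """Pair every valid code in `left` with every valid code in `right`
    that shares the first `prefix_len` characters, and score each pair.
    The two lists may have different lengths."""
    left_codes = [normalize(c) for c in left if is_valid_code(c)]
    right_codes = [normalize(c) for c in right if is_valid_code(c)]
    results = []
    for a in left_codes:
        for b in right_codes:
            if a[:prefix_len] == b[:prefix_len]:
                results.append((a, b, round(similarity(a, b), 2)))
    return results


# Hypothetical sample values standing in for the ICD10 and code_id columns.
icd10 = ["A11.11", "B20", "xyz"]
code_id = ["A11.12", "A11", "B20", "C33.1"]

pairs = score_pairs(icd10, code_id)
for a, b, score in pairs:
    print(a, b, score)
```

With pandas DataFrames, the two lists could be taken from the columns (for example with `df["ICD10"].tolist()`), and the same functions could be reused for the other column. Comparing every pair is quadratic in the number of codes, so for large datasets it may help to group codes by prefix first.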