I'm currently working on a dataset that contains defect descriptions for repairs, and I'd like to assign defect codes to each row based on these descriptions. I have a separate table with defect codes and corresponding descriptions that need to match as closely as possible with my repair data. any help would be appreciated!
Suggestion: use a Join tool, join by each word, count number of word matches, take the match with the highest word match count
input file "Defect description.xlsx":
Add a record ID tool (create a new field Record ID 1)
Identify specific cases where you should keep a slash, like "G/BOXES" and "U/J". Convert those slashes to another character, like maybe a tilde ~ (a rarely used character)
Convert all other slashes to a space
Use the Text to Columns tool to split on a space, split to Rows
input file "Defect codes.xlsx":
Add a record ID tool (create a new field Record ID 2)
Identify specific cases where you should keep a slash. Convert those slashes to another character, like maybe a tilde ~ (a rarely used character)
Convert all other slashes to a space
Use the Text to Columns tool to split on a space, split to Rows
Join from the first file to the second file.
Use the Summarize tool: group by Record Id 1 and Record Id 2, count the number of matches
Sort by Record Id 1 Ascending, number of matches Descending
Use a Sample tool to keep only the first record for each Record Id 1
Chris