Hi Community,
I'm trying to identify phrases from a list within a free-text field. See below for an example dataset.
| ID | Response |
|----|----------|
| 1 | Acid tests to determine pH level |
| 2 | Advertising opportunities to capitalize on market |
| 3 | Capital Expenditures to promote growth |
| 4 | Projections for upcoming year |
| 5 | Marketing tasks |
Basically, I would like to take the "Response" field above and match it against a running list of common phrases/words (ideally stored in a separate file). For example, the running list would look like this:
| Common Phrases |
|----------------|
| Acid Test |
| Advertising |
| Capital Expenditure |
| Projection |
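A minimal Python sketch of this lookup, assuming the running list is saved as a CSV file with a single `Common Phrases` column (the file name `common_phrases.csv` and the helper names below are hypothetical). It reads the list and does a case-insensitive substring check against a response:

```python
import csv
import io

# Stand-in for the separate phrase file; in practice you might use
# open("common_phrases.csv", newline="") instead (hypothetical file name).
PHRASE_FILE = io.StringIO(
    "Common Phrases\n"
    "Acid Test\n"
    "Advertising\n"
    "Capital Expenditure\n"
    "Projection\n"
)


def load_phrases(file_obj):
    """Read the 'Common Phrases' column into a list, skipping blank rows."""
    reader = csv.DictReader(file_obj)
    phrases = []
    for row in reader:
        value = (row.get("Common Phrases") or "").strip()
        if value:
            phrases.append(value)
    return phrases


def find_phrases(response, phrases):
    """Return every phrase contained in the response, ignoring case."""
    text = response.lower()
    return [p for p in phrases if p.lower() in text]


phrases = load_phrases(PHRASE_FILE)
print(find_phrases("Capital Expenditures to promote growth", phrases))
```

Because a singular phrase is a prefix of its regular plural, plain substring matching already catches "Projections" for "Projection", but it can also over-match (for example, "Test" inside "Contest").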
Ideally, my final output would look like the following, so that I can summarize how many times each phrase occurs in the dataset:
| ID | Response | Identified Phrases/Words |
|----|----------|--------------------------|
| 1 | Acid tests to determine pH level | Acid Test |
| 2 | Advertising opportunities to capitalize on market | Advertising |
| 3 | Capital Expenditures to promote growth | Capital Expenditure |
| 4 | Projections for upcoming year | Projection |
| 5 | Marketing tasks | Null |
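A sketch of producing that output and the per-phrase tally in Python, with the sample rows hard-coded in place of the real data and a simple case-insensitive substring check as the matching rule (an assumption, not any particular tool's behavior). Rows with several matches get a comma-separated list, and counting is done per phrase rather than per output string:

```python
from collections import Counter

# Sample data from the post, hard-coded for illustration
responses = [
    (1, "Acid tests to determine pH level"),
    (2, "Advertising opportunities to capitalize on market"),
    (3, "Capital Expenditures to promote growth"),
    (4, "Projections for upcoming year"),
    (5, "Marketing tasks"),
]
phrases = ["Acid Test", "Advertising", "Capital Expenditure", "Projection"]


def identify(response):
    """Return the phrases found in a response (case-insensitive substring match)."""
    text = response.lower()
    return [p for p in phrases if p.lower() in text]


rows = []
counts = Counter()
for rid, resp in responses:
    hits = identify(resp)
    counts.update(hits)  # tally each identified phrase once per row
    # Comma-separate multiple hits; "Null" when nothing matched
    rows.append((rid, resp, ", ".join(hits) if hits else "Null"))

for row in rows:
    print(" | ".join(str(col) for col in row))

print(counts.most_common())
```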
I've tried several tools (e.g., Find & Replace), but most seem to work only on exact matches, and I need some level of non-exact matching to account for singular vs. plural forms and similar variations.
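One way to get that tolerance, sketched in Python: turn each phrase into a case-insensitive regular expression that accepts a simple plural (`s`, `es`, or `y` to `ies`) on the last word and requires word boundaries on both ends. The helper names are hypothetical, and irregular plurals are not handled; a stemmer (for example NLTK's `PorterStemmer`) or fuzzy string matching would be alternatives for messier data.

```python
import re


def phrase_pattern(phrase):
    """Compile a case-insensitive regex for a phrase that also accepts a
    simple plural of its last word ("Test" -> "Tests", "Policy" -> "Policies").
    Irregular plurals (e.g. "Analysis" -> "Analyses") are not covered."""
    raw_words = phrase.split()
    words = [re.escape(w) for w in raw_words]
    last = raw_words[-1]
    if last.lower().endswith("y"):
        words[-1] = re.escape(last[:-1]) + "(?:y|ies)"
    else:
        words[-1] = words[-1] + "(?:s|es)?"
    # \b on both ends stops "Test" from matching inside "Contest" or "Testing"
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


PHRASES = ["Acid Test", "Advertising", "Capital Expenditure", "Projection"]
PATTERNS = {p: phrase_pattern(p) for p in PHRASES}


def match_phrases(response):
    """Return every listed phrase whose pattern occurs in the response."""
    return [p for p, pat in PATTERNS.items() if pat.search(response)]


print(match_phrases("Acid tests to determine pH level"))
```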
Let me know if you can help!