Finding common phrases from a field of text

Question

Hi Community,

I'm trying to identify a list of phrases in a field that contains only free text. See the example data set below.

ID | Response
1 | Acid tests to determine pH level
2 | Advertising opportunities to capitalize on market
3 | Capital Expenditures to promote growth
4 | Projections for upcoming year
5 | Marketing tasks

Basically, I would like to take the "Response" field above and match it against a running list of common phrases/words (ideally stored in another file). For example, the running list would look like this:

Common Phrases
Acid Test
Advertising
Capital Expenditure
Projection

Ideally, I would like to have my final output be as follows so that I can summarize and determine how many of the same phrase occurred in the dataset:

ID | Response | Identified Phrases/Words
1 | Acid tests to determine pH level | Acid Test
2 | Advertising opportunities to capitalize on market | Advertising
3 | Capital Expenditures to promote growth | Capital Expenditure
4 | Projections for upcoming year | Projection
5 | Marketing Tasks | Null

I've tried several tools (e.g. Find & Replace), but most seem to work only on exact matches, and I need some level of non-exact matching to account for singular vs. plural words, etc.

Let me know if you can help!

WillTravelForData · Accepted Answer

This post may get you started: "join files by partial field similarities".
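
If a scripting step is an option, the singular/plural matching described in the question can be sketched in Python. This is only a rough illustration, not the approach from the linked post: the function names `normalize` and `find_phrases` are made up for this example, the plural handling is a crude trailing-"s" strip that I'm assuming is good enough for the sample data, and the data is typed inline rather than read from the two files.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text, split it into words, and crudely singularize each word.

    Assumption: stripping one trailing "s" (but not "ss") is enough to make
    singular and plural forms compare equal for this kind of data.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]


def find_phrases(response, phrases):
    """Return the phrases whose normalized words appear consecutively in the response."""
    tokens = normalize(response)
    hits = []
    for phrase in phrases:
        target = normalize(phrase)
        n = len(target)
        # Slide a window of n words over the response and compare word lists.
        if n and any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1)):
            hits.append(phrase)
    return hits


# Sample data from the question (in practice these would come from two files).
responses = {
    1: "Acid tests to determine pH level",
    2: "Advertising opportunities to capitalize on market",
    3: "Capital Expenditures to promote growth",
    4: "Projections for upcoming year",
    5: "Marketing tasks",
}
phrases = ["Acid Test", "Advertising", "Capital Expenditure", "Projection"]

# ID -> list of matched phrases (an empty list means no phrase was found).
matches = {rid: find_phrases(text, phrases) for rid, text in responses.items()}

# How many responses each phrase occurred in.
counts = Counter(p for hits in matches.values() for p in hits)

for rid, hits in matches.items():
    print(rid, responses[rid], hits or "Null", sep=" | ")
print(dict(counts))
```

With the sample data, `matches` holds an empty list for ID 5, which corresponds to the Null row in the desired output, and `counts` gives the per-phrase totals for the summary. Irregular plurals (e.g. "analyses") would not be caught by this rule and would need a proper stemmer or a fuzzy-matching step instead.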