How to do Partial Text matching from two different data sets
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi All,
I have two data sets and I would like to perform partial text matching.
E.g.
Data Set 1 Contains large number of records one of the columns like
- Congratulation
- Atul Deshpande
- Female
- Chocolate
Data Set 2 contains like - (which contains around 100 records)
- Atul
- Male
- Late
In output result I want to see as
- Atul - appear 2 times in data set 1
- Male - appear 1 time in data set 1
- Late - appear 1 time in data set 1
Kindly guide me how to do this
Regards
Atul Deshpande
- Labels:
- Text Mining
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@atuldeshpande1
One of the way will be to use a batch macro
You Data Set 1 will go into the Batch Macro as Macro Input and Data Set 2 will be the Control Parameter. Inside the Batch Macro you will have a Filter tool that is set to Contains and then set the [#1] to indicate the Control Parameter.
Then you can use Count tool and Summarize tool to count the number of entries. Add a Formula tool and set the needed text to get your output and then connect it to Macro Output Tool and you get the desired result.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@atuldeshpande1 I think a simple Find Replace will do the trick, since you can match to Any Part of Field. You can toggle on case insensitivity, and use the append option to get a column of the matches. Then you can use a Summarize to count the records by keyword :)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I'm not sure how relevant it may be to this use case, but the Find and Replace Tool (while definitely being the simplest approach) will only identify the first matching record. If a record has multiple search terms in it, the output will be incorrect.
If it is true that you will only ever have a single match for each search term, then I highly recommend @alexnajm 's solution. Otherwise, you will need to use the Append Fields Tool and Filter on Contains(), or use the Batch macro approach described above.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Based on the output above and the non-specification of multiple matches, that’s my assumption / hope @CoG :)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I loved your approach
