Hi everyone,
I hope all of you are having a good start of the year.
I was wondering that this might be actually simple but I don't know how to implement that. Here is the example dataset:
ADV |
ADV Smithson's |
Bravo |
Bravo Projects |
Gamma group |
Gamma Solutions s.r.l |
Now there are only 3 companies in this field (ADV, Bravo and Gamma). Idea is to merge this field into three rows only (3 companies).
- I know atleast for the first 4 rows. I could do something like string matching and if string of row matches row+1 (something like fuzzy match with a threshold or matching the first word without spaces), then the following row+1 can be replaced with the row string. Any idea how to do this?
- For the last two rows, I am not sure if that is even possible to handle such cases. I tried removing words such as "group" "s.r.l" and many more but this takes forever and lot of visual inspection of field which has more than 10,000 companies.
A help will be much appreciated.
Fuzzy Match would probably be my first pick but if you know of any other pattern like for example you parse out the first word and use that to do a group by, would that do it for you?