Hi everyone,
I hope all of you are having a good start of the year.
I was wondering that this might be actually simple but I don't know how to implement that. Here is the example dataset:
| ADV |
| ADV Smithson's |
| Bravo |
| Bravo Projects |
| Gamma group |
| Gamma Solutions s.r.l |
Now there are only 3 companies in this field (ADV, Bravo and Gamma). Idea is to merge this field into three rows only (3 companies).
- I know atleast for the first 4 rows. I could do something like string matching and if string of row matches row+1 (something like fuzzy match with a threshold or matching the first word without spaces), then the following row+1 can be replaced with the row string. Any idea how to do this?
- For the last two rows, I am not sure if that is even possible to handle such cases. I tried removing words such as "group" "s.r.l" and many more but this takes forever and lot of visual inspection of field which has more than 10,000 companies.
A help will be much appreciated.