Hi everyone,
I hope all of you are having a good start of the year.
I was wondering that this might be actually simple but I don't know how to implement that. Here is the example dataset:
| ADV |
| ADV Tombstone |
| Bravo |
| Bravo Tech |
| Gamma group |
| Gamma Industries Co |
In real data there are thousands of them but I am showing here only a tiny set.
In the example above there are only 3 companies in this field (ADV, Bravo and Gamma). Idea is to merge this field into three rows only (3 companies, removing duplicates).
- I know atleast for the first 4 rows. I could do something like string matching and if string of row matches row+1 (something like fuzzy match with a threshold or matching the first word without spaces), then the following row+1 can be replaced with the row string. Any idea how to do this?
- For the last two rows, I am not sure if that is even possible to handle such cases. I tried removing words such as "group" "co" and many more but this takes forever and lot of visual inspection of field which has more than 10,000 companies.
A help will be much appreciated.