Hi everyone,
I hope all of you are having a good start of the year.
I was wondering that this might be actually simple but I don't know how to implement that. Here is the example dataset:
ADV |
ADV Tombstone |
Bravo |
Bravo Tech |
Gamma group |
Gamma Industries Co |
In real data there are thousands of them but I am showing here only a tiny set.
In the example above there are only 3 companies in this field (ADV, Bravo and Gamma). Idea is to merge this field into three rows only (3 companies, removing duplicates).
- I know atleast for the first 4 rows. I could do something like string matching and if string of row matches row+1 (something like fuzzy match with a threshold or matching the first word without spaces), then the following row+1 can be replaced with the row string. Any idea how to do this?
- For the last two rows, I am not sure if that is even possible to handle such cases. I tried removing words such as "group" "co" and many more but this takes forever and lot of visual inspection of field which has more than 10,000 companies.
A help will be much appreciated.
Solved! Go to Solution.
Hi @danial_j,
I am not sure if I understand your requirement correctly. Do you want to have only 3 fields as an output?
If that is not the case can you show the desired output?
Hi @Emil_Kos Thank you for sharing but the problem is that I don't have a list of companies which could be a groudtruth. In my example I showed only glimpse of data. In reality there are thousands of companies so I can't search and replace because I don't know the names of all the companies that I could type in a text and use it as a reference.
My idea was something like row comparison with the following row and replace the following row with the previous one if the first word matches.
Hi @danial_j,
So this is interesting.
I used a combination of get word function and multirow formula to make it work.
I also made an assumption that if we have 1 word in the company name that is the correct company name that we should use.
This is my output:
Please let me know what do you think.
Oh this is what I was looking for. Thank you 🙂
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
5 |