Matching a row string with row+1 and replacing row+1 string with the row string

danial_j

Hi everyone,

I hope all of you are having a good start of the year.

I was wondering that this might be actually simple but I don't know how to implement that. Here is the example dataset:

ADV

ADV Tombstone

Bravo

Bravo Tech

Gamma group

Gamma Industries Co

In real data there are thousands of them but I am showing here only a tiny set.

In the example above there are only 3 companies in this field (ADV, Bravo and Gamma). Idea is to merge this field into three rows only (3 companies, removing duplicates).

- I know atleast for the first 4 rows. I could do something like string matching and if string of row matches row+1 (something like fuzzy match with a threshold or matching the first word without spaces), then the following row+1 can be replaced with the row string. Any idea how to do this?

- For the last two rows, I am not sure if that is even possible to handle such cases. I tried removing words such as "group" "co" and many more but this takes forever and lot of visual inspection of field which has more than 10,000 companies.

A help will be much appreciated.

Data Investigation

Accepted answers

Emil_Kos

Hi @danial_j,

So this is interesting.

I used a combination of get word function and multirow formula to make it work.

I also made an assumption that if we have 1 word in the company name that is the correct company name that we should use.

This is my output:

Please let me know what do you think.

Append 3.yxmd

All comments

Emil_Kos

Hi @danial_j,

I am not sure if I understand your requirement correctly. Do you want to have only 3 fields as an output?

If that is not the case can you show the desired output?

Append 2.yxmd

danial_j

Hi @Emil_Kos Thank you for sharing but the problem is that I don't have a list of companies which could be a groudtruth. In my example I showed only glimpse of data. In reality there are thousands of companies so I can't search and replace because I don't know the names of all the companies that I could type in a text and use it as a reference.

My idea was something like row comparison with the following row and replace the following row with the previous one if the first word matches.

Quick Links

This months top contributors

atcodedog05 19458

Qiu 15866

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703