Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Matching a row string with row+1 and replacing row+1 string with the row string

danial_j
8 - Asteroid

Hi everyone,

 

I hope all of you are having a good start of the year.

 

I was wondering that this might be actually simple but I don't know how to implement that. Here is the example dataset:

 

ADV
ADV Tombstone
Bravo
Bravo Tech
Gamma group
Gamma Industries Co


In real data there are thousands of them but I am showing here only a tiny set.

 

In the example above there are only 3 companies in this field (ADV, Bravo and Gamma). Idea is to merge this field into three rows only (3 companies, removing duplicates).

 

- I know atleast for the first 4 rows. I could do something like string matching and if string of row matches row+1 (something like fuzzy match with a threshold or matching the first word without spaces), then the following row+1 can be replaced with the row string. Any idea how to do this?

 

- For the last two rows, I am not sure if that is even possible to handle such cases. I tried removing words such as "group" "co" and many more but this takes forever and lot of visual inspection of field which has more than 10,000 companies.

 

A help will be much appreciated.

4 REPLIES 4
Emil_Kos
17 - Castor
17 - Castor

Hi @danial_j,


I am not sure if I understand your requirement correctly. Do you want to have only 3 fields as an output?

 

Emil_Kos_1-1610013067634.png

 

If that is not the case can you show the desired output?

 

danial_j
8 - Asteroid

Hi @Emil_Kos Thank you for sharing but the problem is that I don't have a list of companies which could be a groudtruth. In my example I showed only glimpse of data. In reality there are thousands of companies so I can't search and replace because I don't know the names of all the companies that I could type in a text and use it as a reference. 

 

My idea was something like row comparison with the following row and replace the following row with the previous one if the first word matches.

Emil_Kos
17 - Castor
17 - Castor

Hi @danial_j,

 

So this is interesting. 

 

I used a combination of get word function and multirow formula to make it work.

 

I also made an assumption that if we have 1 word in the company name that is the correct company name that we should use. 

 

 

This is my output:

 

Emil_Kos_0-1610014412152.png

 

Please let me know what do you think. 

danial_j
8 - Asteroid

Oh this is what I was looking for. Thank you 🙂

Labels
Top Solution Authors