Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Community is experiencing an influx of spam. As we work toward a solution, please use the 'Notify Moderator' option on the ellipsis menu to flag inappropriate posts.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Matching a row string with row+1 and replacing row+1 string with the row string

danial_j
8 - Asteroid

Hi everyone,

 

I hope all of you are having a good start of the year.

 

I was wondering that this might be actually simple but I don't know how to implement that. Here is the example dataset:

 

ADV
ADV Tombstone
Bravo
Bravo Tech
Gamma group
Gamma Industries Co


In real data there are thousands of them but I am showing here only a tiny set.

 

In the example above there are only 3 companies in this field (ADV, Bravo and Gamma). Idea is to merge this field into three rows only (3 companies, removing duplicates).

 

- I know atleast for the first 4 rows. I could do something like string matching and if string of row matches row+1 (something like fuzzy match with a threshold or matching the first word without spaces), then the following row+1 can be replaced with the row string. Any idea how to do this?

 

- For the last two rows, I am not sure if that is even possible to handle such cases. I tried removing words such as "group" "co" and many more but this takes forever and lot of visual inspection of field which has more than 10,000 companies.

 

A help will be much appreciated.

4 REPLIES 4
Emil_Kos
17 - Castor
17 - Castor

Hi @danial_j,


I am not sure if I understand your requirement correctly. Do you want to have only 3 fields as an output?

 

Emil_Kos_1-1610013067634.png

 

If that is not the case can you show the desired output?

 

danial_j
8 - Asteroid

Hi @Emil_Kos Thank you for sharing but the problem is that I don't have a list of companies which could be a groudtruth. In my example I showed only glimpse of data. In reality there are thousands of companies so I can't search and replace because I don't know the names of all the companies that I could type in a text and use it as a reference. 

 

My idea was something like row comparison with the following row and replace the following row with the previous one if the first word matches.

Emil_Kos
17 - Castor
17 - Castor

Hi @danial_j,

 

So this is interesting. 

 

I used a combination of get word function and multirow formula to make it work.

 

I also made an assumption that if we have 1 word in the company name that is the correct company name that we should use. 

 

 

This is my output:

 

Emil_Kos_0-1610014412152.png

 

Please let me know what do you think. 

danial_j
8 - Asteroid

Oh this is what I was looking for. Thank you 🙂

Labels
Top Solution Authors