Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Compare columns with partially overlapping concatenated strings

morecoffeenow
5 - Atom

I have 2 data sets that I am trying to compare 1 to 1, but this is split across 3 columns and the way the fields are concatenated is difference.

 

Data set 1 has 1 column with a concatenation of String 1, String 2, and String 3, while Dataset 2 has one column containing a concatenation of String 1 and String 3 and another column with String 2.

 

Data set 1

Column 1: String 1 + String 2 + String 3

 

Data set 2

Column 2: String 1 + String 3

Column 3: String 2

 

There is no pattern of what type of data could be in each string - it is not limited to words, addresses, IDs, etc and none of the fields have a predictable or set length. I am wondering if it is possible to first compare the left and right of Column 1 to Column 2, and then take the area in the middle that stopped matching from Column 1 to Column 2 to then compare to Column 3. We are looking to not use fuzzy match if at all possible.

 

Example:

 

Data set 1

Column 1: UX4759d452regcustdelivery 

 

Data set 2

Column 2: UX4759regcustdelivery

Column 3: d452

 

Thank you very much in advance for your consideration.

 

5 REPLIES 5
gabrielvilella
14 - Magnetar

Let's say you take the left and right of both columns and want to compare them. How are you going to determine the amount of character from left and right if each record can have a different amount? 

Another option would be if you already have a table with all the IDs, then you could use find and replace to check against that table. 

danilang
19 - Altair
19 - Altair

Hi @morecoffeenow 

 

Here's one way that you can do it

 

danilang_0-1646571818147.png

The key formula is this

if findstring([Column1],[Column3])>=0 then
	Left([Column2],findstring([Column1],[Column3]))+[Column3]+Substring([Column2],findstring([Column1],[Column3]))
else
	""
endif
	

 If column 3 is contained in column 1, then take the left part of column 1 to where column 3 starts, add column 3 and then add the rest of column 1

 

danilang_2-1646572842968.png

 

This works for the base case where column 2 is the one that has the piece missing.  You may have to modify it to work in edge cases.

 

Dan

 

 

morecoffeenow
5 - Atom

Hi Dan,

 

Thanks so much for this. This took care of most of the data, but there is a bit remaining that falls under a slightly different pattern where the middle string of Column 1 is not a direct match to Column 3. However, if there was a way to isolate the beginning and end of Column 1 (UX4759 and regcustdelivery) and see if the remaining d452 from Column 1 is contained within Column 3, then that would suffice as a match for my purposes.

 

I’m thinking the logic might be quite different since you can’t use findstring to search for Column 3 in Column 1 directly, but any ideas?

 

Data set 1

Column 1: UX4759d452regcustdelivery 

 

Data set 2

Column 2: UX4759regcustdelivery

Column 3: 123d452xyz

     If we can see that d452 from Column 1 is present within Column 3 (using a contains?) then that would be enough.

 

Thanks again very much for your help.

danilang
19 - Altair
19 - Altair

Hi @morecoffeenow 

 

Here's a way that you can accomplish the second part

 

danilang_0-1646766294220.png

 

For each of column 1 and 2, split the string to letters and join on record id and position.  Use a Multi-row to find where the strings stop matching and the number of matching letters is the length of the left hand match.  Do the same after reversing the strings to find the length of right hand match.  Join these back up to you original input and use 

Substring([Column 1],[LeftSide],length([Column 1])-([LeftSide]+[RightSide]))

to pull out the middle piece.  Look for this middle in column 3

 

Dan

 

morecoffeenow
5 - Atom

Hi Dan,

 

Thank you so much for this - this is perfect. I marked your first answer as the solution since that was the fit for the original question, but this is also unbelievably helpful. I really appreciate you taking the time to make these. This is also my first time seeing Regex Tokenize, something I clearly need to work on. Thank you again!

Labels
Top Solution Authors