Alteryx Designer Desktop Discussions

morecoffeenow · ‎03-03-2022

I have 2 data sets that I am trying to compare 1 to 1, but this is split across 3 columns and the way the fields are concatenated is difference.

Data set 1 has 1 column with a concatenation of String 1, String 2, and String 3, while Dataset 2 has one column containing a concatenation of String 1 and String 3 and another column with String 2.

Data set 1

Column 1: String 1 + String 2 + String 3

Data set 2

Column 2: String 1 + String 3

Column 3: String 2

There is no pattern of what type of data could be in each string - it is not limited to words, addresses, IDs, etc and none of the fields have a predictable or set length. I am wondering if it is possible to first compare the left and right of Column 1 to Column 2, and then take the area in the middle that stopped matching from Column 1 to Column 2 to then compare to Column 3. We are looking to not use fuzzy match if at all possible.

Example:

Data set 1

Column 1: UX4759d452regcustdelivery

Data set 2

Column 2: UX4759regcustdelivery

Column 3: d452

Thank you very much in advance for your consideration.

gabrielvilella · ‎03-03-2022

Let's say you take the left and right of both columns and want to compare them. How are you going to determine the amount of character from left and right if each record can have a different amount?

Another option would be if you already have a table with all the IDs, then you could use find and replace to check against that table.

danilang · ‎03-06-2022

Hi @morecoffeenow

Here's one way that you can do it

The key formula is this

if findstring([Column1],[Column3])>=0 then
	Left([Column2],findstring([Column1],[Column3]))+[Column3]+Substring([Column2],findstring([Column1],[Column3]))
else
	""
endif

If column 3 is contained in column 1, then take the left part of column 1 to where column 3 starts, add column 3 and then add the rest of column 1

This works for the base case where column 2 is the one that has the piece missing. You may have to modify it to work in edge cases.

Dan

morecoffeenow · ‎03-08-2022

Hi Dan,

Thanks so much for this. This took care of most of the data, but there is a bit remaining that falls under a slightly different pattern where the middle string of Column 1 is not a direct match to Column 3. However, if there was a way to isolate the beginning and end of Column 1 (UX4759 and regcustdelivery) and see if the remaining d452 from Column 1 is contained within Column 3, then that would suffice as a match for my purposes.

I’m thinking the logic might be quite different since you can’t use findstring to search for Column 3 in Column 1 directly, but any ideas?

Data set 1

Column 1: UX4759d452regcustdelivery

Data set 2

Column 2: UX4759regcustdelivery

Column 3: 123d452xyz

If we can see that d452 from Column 1 is present within Column 3 (using a contains?) then that would be enough.

Thanks again very much for your help.

danilang · ‎03-08-2022

Hi @morecoffeenow

Here's a way that you can accomplish the second part

For each of column 1 and 2, split the string to letters and join on record id and position. Use a Multi-row to find where the strings stop matching and the number of matching letters is the length of the left hand match. Do the same after reversing the strings to find the length of right hand match. Join these back up to you original input and use

Substring([Column 1],[LeftSide],length([Column 1])-([LeftSide]+[RightSide]))

to pull out the middle piece. Look for this middle in column 3

Dan

morecoffeenow · ‎03-10-2022

Hi Dan,

Thank you so much for this - this is perfect. I marked your first answer as the solution since that was the fit for the original question, but this is also unbelievably helpful. I really appreciate you taking the time to make these. This is also my first time seeing Regex Tokenize, something I clearly need to work on. Thank you again!

Alteryx Designer Desktop Discussions

Compare columns with partially overlapping concatenated strings

Re: Is there any way the computer vision tools can...

Re: Batch Macro

Re: How to get cell reference address from excel

Re: Replacing Forecast columns with Actual Data

Re: Row creation