Let's say I have this data set (and the padding of the second record is intentional):
Field_1 |
The quick fox |
          fox jumps over the lazy brown dog |
What I would like to do is output the following:
Field_1 | Merged Field | Overlap |
The quick fox | The quick fox jumps over the lazy brown dog |           fox |
          fox jumps over the lazy brown dog | | |
Basically, I need to merge the strings together.
The use case could potentially expand to something like this:
Field_1 | Merged Field | Overlap |
The quick fox | The quick fox jumps over the lazy brown dog |           fox jumps over the lazy |
fox jumps over | | |
over the lazy | | |
the lazy brown dog | | |
In other words, there is no way of knowing how many characters the strings may overlap by.
Assume there is some way in the data set to identify the set of records that need to be merged together. Also assume that I don't really care which record the [Merged Field] or [Overlap] gets printed to, as long as it's a record within the set of strings that need to be merged together. The padding of the string in the [Overlap] column needs to be there - the number of spaces in the padding represents the position at which the overlap starts in the merged string.
In C# one might use the string zip functionality to achieve this; I'm at a loss as to how to accomplish this in Alteryx. This is not a simple concatenation - it's a true merge.
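To pin down what I mean by a "true merge", here is a minimal Python sketch of the logic (not an Alteryx solution). It assumes the records are already in merge order and that each string overlaps the end of the text merged so far. The names `overlap_len` and `merge_chain` are made up for illustration, and reading [Overlap] as the span from the start of the first overlap to the end of the last one is my interpretation of the second example.

```python
def overlap_len(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_chain(parts: list[str]) -> tuple[str, str]:
    """Merge strings left to right; return (merged, padded_overlap)."""
    merged = parts[0]
    first_start = None  # position where the first overlap begins
    last_end = 0        # position just past the last overlap
    for part in parts[1:]:
        size = overlap_len(merged, part)
        if size:
            if first_start is None:
                first_start = len(merged) - size
            last_end = len(merged)
        merged += part[size:]  # append only the non-overlapping tail
    if first_start is None:
        return merged, ""
    # Pad with spaces so the overlap sits at its position in the merged text.
    return merged, " " * first_start + merged[first_start:last_end]


merged, overlap = merge_chain(["The quick fox", "fox jumps over the lazy brown dog"])
print(merged)
print(overlap)
```

One caveat with this sketch: it accepts an overlap of even a single character, so strings that merely share a boundary letter would be merged. A minimum overlap length would likely be needed on real data.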
Hi @smille17,
One approach is to tokenize the strings into individual characters and then do some summaries and so forth; there may be a way to do this with fewer tools than I've used in my example, but it does at least work. Hope that helps.
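For anyone who wants to see the idea outside of a workflow, here is a rough Python sketch of a character-level approach. It is only one reading of "tokenize and summarize", not a transcription of the attached example: it assumes the leading padding of each record gives its offset in the merged string, and the function name `merge_padded` is hypothetical.

```python
from collections import defaultdict


def merge_padded(rows: list[str]) -> tuple[str, str]:
    """Merge space-padded rows by character position; return (merged, overlap)."""
    chars = {}               # position -> character seen at that position
    hits = defaultdict(int)  # position -> number of records covering it
    for row in rows:
        start = len(row) - len(row.lstrip(" "))  # leading padding = offset
        end = len(row.rstrip(" "))               # ignore trailing spaces
        for pos in range(start, end):
            chars[pos] = row[pos]
            hits[pos] += 1
    if not chars:
        return "", ""
    width = max(chars) + 1
    merged = "".join(chars.get(p, " ") for p in range(width))
    # Keep only characters covered by more than one record; blank out the rest.
    overlap = "".join(chars[p] if hits[p] > 1 else " " for p in range(width))
    return merged, overlap.rstrip()


rows = ["The quick fox", " " * 10 + "fox jumps over the lazy brown dog"]
merged, overlap = merge_padded(rows)
print(merged)
print(overlap)
```

Note that this marks only the positions covered by more than one record, so with several records the overlap can come out with gaps where a single record contributes the text. It also does not check that overlapping records agree on a character; the last record read wins.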
Thanks for this - interesting approach. I wonder how it scales with a large number of records. This definitely helps me move forward, so much appreciated!