Question on how Find Replace tool works when both "R" and "F" streams have some duplicates
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I have two datastreams. First looks like this:
Apple
Apple
Peach
Watermelon
Banana
Apple
Apple
The second looks like
Apple
Peach
Apple
Banana
Watermelon
My point being.. The first dataset is smaller, the second is bigger. I want to append ALL of the items from the first set, to the second. But some of the items in the first set are identical. Will find replace only take the first record of apple from set 1, to append next to both records of apples in set 2? Or once a record has been matched it looks for the next record to match, even if the first record could theoretically match too? The behavior I need is the latter.
Basically even though there are "duplicates" in set 1 and set 2, I want the find/replace to treat each repeated record as unique. I know a simple solution would be to append a unique identifier to each record that is repeated (in each stream) before the "find replace", but I'm not sure how. Any ideas? Do I simply use the unique tool and then add an incremental number to "D" output of the tool and then append them again? So it would look like Apple Apple1 Apple2 Apple3?
Basically the join/replace tool I'm looking for should make the result should look like this:
Apple Apple (1st apple from Set 1)
Peach Peach (Peach from Set 1)
Apple Apple (2nd apple from Set 1)
Banana Banana (Banana from Set 1)
Watermelon Watermelon (Watermelon from Set 1)
The last 2 apples from Set 1 should remain unmatched.
Solved! Go to Solution.
- Labels:
- Workflow
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This one..
Will find replace only take the first record of apple from set 1, to append next to both records of apples in set 2
Imagine F&R as a vlookup
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@LordNeilLord Thanks. So the solution is to in fact add a unique identifier to each repeating record before going into Find Replace.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
If I understand the result you are trying to get correctly, then a join may be what you are after.
For each Apple in your second dataset you would end up with 4 records when you join on Apple. If you then need to replace a different value in the dataset, you can either do that in the join tool through the select part or use a formula afterwards.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@Lbunce The result needs to look like this:
Apple Apple (1st apple from Set 1)
Peach Peach (Peach from Set 1)
Apple Apple (2nd apple from Set 1)
Banana Banana (Banana from Set 1)
Watermelon Watermelon (Watermelon from Set 1)
The last 2 apples from Set 1 should remain unmatched.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
In that case you need to use identifiers as you mentioned before. I don't think just adding ID's is enough as I assume you would want this to scale when you add more data.
I've attached a workflow which I believe will solve the problem, it sorts both datasets into the order in which the item like Apple appears and then adds a number to say which occurrence of that word it is and joins based on that, after the join you can remove fields that aren't needed and change things but it gives you the result you need. This should also work when you add more rows into either dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
hi @AkisM
You might want to try something like this
Inputs 1 and 2 contain the data from your first post. For both of these, add a record ID(InputxID) and then sort by item name and InputxID. The Multi-row tool adds an ItemOrder based on record position within each group, i.e. Apple 1, Apple 2, Apple x, Banana 1, Banana 2, etc. Join on Item name and ItemOrder, so Apple 1(R) is joined with Apple 1(L). Sort by Input 2, giving you
The 1st 3 columns are only the records from Input 1 that match the corresponding records in input 2 by position.
Edit: @Lbunce Hey look. Twins
Dan
