Hello,
I am wondering about the logic of the Find Replace Tool. I use the tool to append a column to my dataset --> as a vlookup
In my left input anchor I have many Worktask IDs.
In my right input anchor I have Worktask IDs with equipment numbers. One worktask ID has several equipment numbers.
I only want to append the lowest Equipment Number to my Worktask.
I sorted my Equipment Number Ascending. What I found out is that the Find Replace Tool takes the highest Equipment Number.
When I sort descending, the Find Replace Tool takes the lowest Equipment Number.
I checked it for some cases but not for my entire dataset (more than 5 million records).
Does the tool always select the last found record? Is this consistent?
I want to avoid that a random Equipment Number will be appended.
Solutions with other tools are welcome but I like to receive an answer specifically to the logic of the Find Replace Tool. I could not find anything online.
Thank you a lot!
Sophie
Hi @socathe, seems like Find Replace does always choose the final record, I can't tell you why and I also can't find it mentioned online. You can probably just sort descending and use the find and replace like you suggest, or if you want to be extra sure you can sort descending and then use the Sample tool. Set it to the first 1 row and group by your Worktask.
Hope this helps!
Thank you @FinnCharlton for your quick response :) I will probably be extra sure and use the sample tool to select the lowest number
Hi, @socathe
Yes, the Find and Replace tool will always return the first match.
For your use case, if I understand it correctly, you might consider using a Sample Tool after the Sort Tool. The Sample Tool will allow you to select the First or Last N=1 record by your desired group (Equipment ID?)
This way, Find and Replace will always source from only your sample.
Kindly consider attaching a sample with your desired output; the Community will help you. 👍
As info - you mentioned millions of rows. Find and Replace is likely optimal for ~100K rows. Beyond that, we might offer you an approach that uses joins instead. We might also use an Auto Field tool in the flow to reduce optimize the field sizes as it seemed the flow is Text heavy, i.e., it burns a lot of memory 😎.
I hope you find this helpful - Cheers!
Hi @RobertOdera, thank you for your answer. Do you mean that the Find and Replace tool will always return the last instead of the first match?
I changed to the sample and join tool now. I sort ascending and select the First N =1 grouped by worktask ID.
Why is the Find and Replace tool only optimal until ~ 100K rows?
Thanks for your help :)
Hi, @socathe
I believe that the Find and Replace will always return the first match.
Hack = Sort Tool + Sample Tool before the Find and Replace Tool allows you to determine whether the first matched instance (First, N=1) or the last matched instance (Last, N=1) will be passed along to the Find and Replace tool.
Regarding how optimal the Find and Replace Tool is...that is a hold-over thought from memory from a use case I had in the past.
However, check out the thoughts below...as the right answer might be it "depends" (AND I also may have been wrong in my thinking!). Maybe turn on performance profiling and see how long each setup takes?
Below is a point of view per @MarqueeCrew Solved: Re: Find and Replace is more performant than Join ... - Alteryx Community