Hi @MukulA,
Due to the way AMP processes data in chunks, my guess is that AMP is removing duplicates within each chunk but not across the data as a whole. You can test this theory by running a smaller portion of your dataset and seeing whether that changes the Unique tool's behavior under AMP (if my theory holds, the Unique tool will still be ineffective on anything larger than a single chunk).
As a workaround, you could try replacing your Unique tool with a Summarize tool. Summarize collapses the data into the fewest possible groups regardless of how AMP chunks it.
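To make the suspected behavior concrete, here is a minimal Python sketch (an illustration of the theory only, not Alteryx code): deduplicating within each chunk independently can leave duplicates that span chunk boundaries, while a global dedup removes them all.

```python
# Hypothetical sketch of the suspected AMP behavior: duplicates are
# removed inside each chunk (packet) independently, so duplicates that
# span chunk boundaries survive. Illustration only, not Alteryx code.

def dedup_within_chunks(rows, chunk_size):
    """Remove duplicates inside each chunk only (suspected AMP behavior)."""
    out = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        seen = set()
        for r in chunk:
            if r not in seen:
                seen.add(r)
                out.append(r)
    return out

def dedup_global(rows):
    """Remove duplicates across the whole dataset (original Engine behavior)."""
    seen = set()
    return [r for r in rows if not (r in seen or seen.add(r))]

rows = ["a", "b", "a", "c", "b", "c"]
print(dedup_within_chunks(rows, chunk_size=2))  # ['a', 'b', 'a', 'c', 'b', 'c'] - cross-chunk dupes survive
print(dedup_global(rows))                       # ['a', 'b', 'c']
```

If the chunked version leaves duplicates where the global version does not, that matches the symptom described above.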
I have already used a Summarize tool before the Unique tool to reduce the size of the data, but this is not working.
Is there any other way to eliminate duplicate rows in the workflow, or do I have to switch the workflow back to the original engine, which takes much longer to complete than the AMP engine?
Thanks for your help!
Hi @MukulA
@clmc9601 is correct in that the AMP engine can cause different results with the unique tool. See the help page for details.
To get around this, try the following.
It's based on the Sample tool, which is supposed to be completely AMP-compatible. The top branch takes only the first record of every unique value, and the bottom branch skips the first one, returning only the remaining duplicate records.
This method works on a large dataset (10M rows) that I created, but it runs at roughly half the speed. Please try it out on yours and post the results here.
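The Sort-plus-Sample logic can be sketched in Python (a hypothetical illustration of the approach, not Alteryx code): sort by the key, then keep the first record of each group on one branch and everything after the first on the other.

```python
from itertools import groupby

# Hypothetical sketch of the Sort + Sample workaround: after sorting
# by the key, the "First 1 record per group" branch yields the unique
# rows and the "Skip First 1 per group" branch yields the duplicates.
# Illustration only, not Alteryx code.

def sort_sample_split(rows, key):
    rows_sorted = sorted(rows, key=key)
    uniques, dupes = [], []
    for _, group in groupby(rows_sorted, key=key):
        group = list(group)
        uniques.append(group[0])   # Sample tool: First N = 1 per group
        dupes.extend(group[1:])    # Sample tool: Skip First N = 1 per group
    return uniques, dupes

rows = [{"id": 2}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 1}]
u, d = sort_sample_split(rows, key=lambda r: r["id"])
print(u)  # [{'id': 1}, {'id': 2}, {'id': 3}]
print(d)  # [{'id': 1}, {'id': 2}]
```

The sort is what makes the grouping correct: duplicates must be adjacent before the first/skip split is applied.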
Dan
Thanks @danilang for your reply.
I have tried your solution of using the Sort and Sample tools to eliminate duplicate rows with the AMP engine enabled, but this is again not giving the correct result. It produces the same output as the Unique tool under the AMP engine.
Do you have any other solution, please?
Hello @MukulA
First, thank you for using the new AMP Engine.
I encourage you to continue to report any use case issues that you find with running workflows with AMP Engine enabled. We worked hard to identify differences from the original Engine as well as provide guidance on how to better optimize workflows to run with AMP.
Have you tried adding a Sort before your Unique tool so that the data is sorted in order by the values you are trying to de-dupe?
There are differences with AMP in the ordering of data out of certain tools. This, as well as the packeting of data mentioned previously, could be causing what you are seeing with large data sets.
Here is some information related to the new AMP Engine:
https://help.alteryx.com/current/designer/alteryx-amp-engine
https://help.alteryx.com/current/designer/Alteryx-Engine-and-AMP-Main-Differences
https://help.alteryx.com/current/designer/AMP-Memory-Use
https://help.alteryx.com/current/designer/tool-use-amp
https://community.alteryx.com/t5/Engine-Works/AMPlify-your-Workflows/ba-p/617590
AlterEverything Podcast: https://community.alteryx.com/t5/Alter-Everything-Podcast/66-The-Alteryx-AMP-Engine-Explained/ba-p/5...
Thanks @TonyaS for your reply!
I have already tried multiple options, as listed below, but none of them work with the AMP engine:
1. A Sort tool before the Unique tool
2. Sort and Sample tools (with the Take First 1 Rows option, grouping by all columns)
Please see the attached workflow screenshot for reference.
Let me know if you need any further details.
Hi @TonyaS
I wanted to revive this thread because the AMP engine is enabled by default in new Alteryx versions.
Here's a small dataset that is able to generate the issue:
Input to the tool:
Output of the tool:
The problem in my case occurs when empty strings and nulls are interleaved. The tool eliminates consecutive empty strings and consecutive nulls, but when an empty string and a null follow one another it does not treat them as duplicates and fails to remove them. Disabling the AMP engine fixes the issue.
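A small Python sketch of this symptom (a hypothetical model of the behavior, not the actual AMP implementation): a streaming dedup that only compares each row to the previous one collapses consecutive repeats, but when None and "" alternate, every adjacent comparison sees two different values and nothing is removed.

```python
# Hypothetical model of the observed symptom: an adjacent-only dedup
# removes consecutive runs of the same value, but interleaved None and
# "" all survive because None != "" on every adjacent comparison.
# Illustration only, not the actual AMP implementation.

def dedup_adjacent(values):
    out = []
    for v in values:
        if not out or out[-1] != v:
            out.append(v)
    return out

print(dedup_adjacent([None, None, "", ""]))  # [None, ''] - consecutive runs collapse
print(dedup_adjacent([None, "", None, ""]))  # [None, '', None, ''] - nothing removed
```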
@Struck
Thank you for the reminder and for including a workflow. I do see the different number of records with AMP even using Engine Compatibility Mode (which should fix any record order differences with AMP). I will work with my colleagues to get a defect created for this issue.
@Struck I'm mostly impressed that you've got both nulls and empty cells in the same column.
Nevertheless, since the Unique tool is failing you, i.e.:
Do you have any objection to using the Summarize tool, grouping on every field, like so:
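Grouping on every field in Summarize amounts to exact full-row deduplication; here is a minimal Python sketch of that idea (an illustration only, not Alteryx code):

```python
# Hypothetical sketch of the Summarize workaround: grouping on every
# field collapses each distinct row into a single group, which is
# exact full-row deduplication. Illustration only, not Alteryx code.

def summarize_group_by_all(rows):
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))  # group key = every field of the row
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [{"a": 1, "b": ""}, {"a": 1, "b": None}, {"a": 1, "b": ""}]
print(summarize_group_by_all(rows))  # [{'a': 1, 'b': ''}, {'a': 1, 'b': None}]
```

Note that this keeps one copy of each distinct row; whether None and "" should count as the same value is a separate decision the grouping does not make for you.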
All the best,
BS