Unique tool not giving correct data while using AMP Engine in large volume

MukulA
5 - Atom
I am using the Unique tool with the AMP Engine enabled, and it gives an incorrect result (it does not remove all duplicate rows). When I run the same workflow without the AMP Engine, it gives the correct result. Can anyone tell me whether the AMP Engine fails only for the Unique tool, or whether the large data size (approx. 20 million rows) is causing the issue?
clmc9601
13 - Pulsar

Hi @MukulA,

Due to the way AMP processes data in chunks, my guess is that AMP is removing duplicates within each chunk but not across all the data together. You can test this theory by running a smaller portion of your dataset and seeing whether that affects the Unique tool's effectiveness within AMP (if my theory holds, the Unique tool will still be ineffective).
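To make the guess above concrete, here is a minimal Python sketch of the difference between de-duplicating each chunk on its own and de-duplicating the whole dataset. It only illustrates the hypothesis; it is not a description of how AMP actually works, and the function names and the chunk size are made up for the example.

```python
from typing import Hashable, Iterable, List, Sequence


def dedupe_global(rows: Iterable[Hashable]) -> List[Hashable]:
    """Keep the first occurrence of each row across the whole dataset."""
    return list(dict.fromkeys(rows))


def dedupe_per_chunk(rows: Sequence[Hashable], chunk_size: int) -> List[Hashable]:
    """Hypothetical behaviour: de-duplicate each chunk independently,
    then concatenate the chunks without comparing across them."""
    out: List[Hashable] = []
    for start in range(0, len(rows), chunk_size):
        out.extend(dict.fromkeys(rows[start:start + chunk_size]))
    return out


rows = ["a", "b", "a", "c", "b", "a"]
print(dedupe_global(rows))        # ['a', 'b', 'c']
print(dedupe_per_chunk(rows, 3))  # ['a', 'b', 'c', 'b', 'a']
```

If something like this were happening, duplicates that fall into different chunks would survive, which would match "not removing all duplicate lines".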

 

As a solution to this, you could try replacing your Unique tool with a Summarize tool. This will put the data into the fewest possible groups regardless of how AMP chunks the data.

MukulA
5 - Atom

I have already used a Summarize tool before the Unique tool to reduce the size of the data, but this is not working.

Is there any other way to eliminate duplicate rows in the workflow, or do I have to switch the workflow back to the original engine, which takes much longer to complete than the AMP Engine?

Thanks for your help!

danilang
19 - Altair

Hi @MukulA 

 

@clmc9601 is correct in that the AMP engine can cause different results with the unique tool.  See the help page for details.   

 

To get around this, try the following 

 

danilang_0-1612017029643.png

 

It's based on using the Sample tool, which is supposed to be completely AMP compatible. The top branch takes only the first record for every unique value, and the bottom branch skips the first one, returning only the remaining duplicate records.
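For readers who cannot see the screenshot, the logic of the two branches can be sketched in Python as follows. This is an illustration of the sort, then first-per-group idea and not the workflow itself; the function name and the sample rows are invented, and the grouping here is on the whole row.

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[str, int]


def split_first_and_rest(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Sort the rows, then for each group of identical rows keep the first
    (top branch) and collect the remaining copies (bottom branch)."""
    firsts: List[Row] = []
    rest: List[Row] = []
    for _, group in groupby(sorted(rows)):
        members = list(group)
        firsts.append(members[0])
        rest.extend(members[1:])
    return firsts, rest


rows = [("x", 1), ("y", 2), ("x", 1), ("x", 1), ("z", 3)]
unique_rows, duplicate_rows = split_first_and_rest(rows)
print(unique_rows)     # [('x', 1), ('y', 2), ('z', 3)]
print(duplicate_rows)  # [('x', 1), ('x', 1)]
```

The sort matters: grouping adjacent rows only finds every duplicate if identical rows have been brought next to each other first.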

 

This method works on a large dataset (10M rows) that I created, but it runs only approximately 50% as fast. Please try it out on yours and post the results here.

 

Dan 

 

 

MukulA
5 - Atom

Thanks @danilang  for your reply.

 

I have tried your solution of using the Sort and Sample tools to eliminate duplicate rows with the AMP Engine enabled, but this again does not give the correct result. It gives the same result as the Unique tool under the AMP Engine.

Do you have any other solution, please?

TonyaS
Alteryx

Hello @MukulA 

 

First, thank you for using the new AMP Engine. 

I encourage you to continue to report any use case issues that you find with running workflows with AMP Engine enabled. We worked hard to identify differences from the original Engine as well as provide guidance on how to better optimize workflows to run with AMP. 

 

Have you tried adding a Sort before your Unique tool so that the data is sorted in order by the values you are trying to de-dupe? 

 

There are differences with AMP in the ordering of data coming out of certain tools. This, as well as the packeting of data that was previously mentioned, could be causing what you are seeing with large data sets.

 

Here is some information related to the new AMP Engine: 

 

https://help.alteryx.com/current/designer/alteryx-amp-engine

https://help.alteryx.com/current/designer/Alteryx-Engine-and-AMP-Main-Differences

https://help.alteryx.com/current/designer/AMP-Memory-Use

https://help.alteryx.com/current/designer/tool-use-amp

 

https://community.alteryx.com/t5/Analytics/Accelerate-Your-Analytic-Processes-with-the-New-AMP-Engin...

https://community.alteryx.com/t5/Engine-Works/AMPlify-your-Workflows/ba-p/617590

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Best-practice-with-the-new-Multi-threa...

AlterEverything Podcast: https://community.alteryx.com/t5/Alter-Everything-Podcast/66-The-Alteryx-AMP-Engine-Explained/ba-p/5...

https://community.alteryx.com/t5/Engine-Works/AMP-Engine-Technical-Deep-Dive-Part-1-Why-AMP/ba-p/570...  

https://community.alteryx.com/t5/Engine-Works/AMP-Engine-Technical-Deep-Dive-Part-2-Key-concepts-of-... 

Tonya Smith
Sr. Technical Product Manager, cloud App Builder
MukulA
5 - Atom

Thanks @TonyaS  for your reply!

 

I have already tried the options listed below, but none of them works with the AMP Engine:

1. Sort tool before the Unique tool

2. Sort tool and Sample tool (using the option to take only 1 row, grouped by all columns in my database)

Please refer to the workflow screenshot.

Let me know in case you need any further details.

Struck
6 - Meteoroid

Hi @TonyaS
I wanted to revive this thread because the AMP Engine is enabled by default in new Alteryx versions.

Here's a small dataset that reproduces the issue:

 

Input to the tool: 

Struck_1-1680190395331.png


Output of the tool:

Struck_2-1680190426884.png

 

 

The problem in my case occurs when empty strings and nulls are interleaved. The tool eliminates consecutive empty strings and consecutive nulls, but when an empty string and a null alternate, it does not mark the repeats as duplicates or remove them. Disabling the AMP Engine fixes the issue.
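Since the screenshots are not reproduced here, the following Python sketch models the symptom with made-up data, using None for a null and "" for an empty string. The `dedupe_adjacent` function is only a hypothetical model that happens to produce the same pattern (consecutive repeats removed, alternating ones kept); it is not a claim about how the Unique tool is implemented.

```python
from typing import List, Optional

Value = Optional[str]


def dedupe_adjacent(values: List[Value]) -> List[Value]:
    """Drop a value only when it equals the value immediately before it."""
    out: List[Value] = []
    for v in values:
        if not out or out[-1] != v:
            out.append(v)
    return out


def dedupe_all(values: List[Value]) -> List[Value]:
    """Keep the first occurrence of each value, wherever the repeats are."""
    return list(dict.fromkeys(values))


consecutive = [None, None, "", ""]
interleaved = [None, "", None, ""]
print(dedupe_adjacent(consecutive))  # [None, '']
print(dedupe_adjacent(interleaved))  # [None, '', None, '']
print(dedupe_all(interleaved))       # [None, '']
```

The last line shows the result one would expect from a full de-duplication: one null and one empty string, regardless of row order.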

 

TonyaS
Alteryx

@Struck 
Thank you for the reminder and for including a workflow. I do see the different number of records with AMP even using Engine Compatibility Mode (which should fix any record order differences with AMP). I will work with my colleagues to get a defect created for this issue. 

BS_THE_ANALYST
14 - Magnetar

@Struck I'm mostly impressed that you've got both nulls and empty cells in the same column.

 

Nevertheless, since Unique is failing you, i.e.:

BS_THE_ANALYST_0-1680201303384.png

Do you have any objection to using the Summarize tool and grouping by every field, like so:

BS_THE_ANALYST_1-1680201324784.png
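As a rough analogy for grouping by every field, here is a small Python sketch in which the whole row is the group key. The sample rows are invented, and treating None (null) and "" (empty string) as separate groups is an assumption of this sketch, not a statement about how the Summarize tool handles them.

```python
from collections import Counter
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], ...]


def group_by_all_fields(rows: List[Row]) -> Counter:
    """Use the whole row as the group key and count the rows in each group."""
    return Counter(rows)


rows: List[Row] = [("a", None), ("a", ""), ("a", None), ("b", "x"), ("a", "")]
groups = group_by_all_fields(rows)
print(list(groups))         # [('a', None), ('a', ''), ('b', 'x')]
print(groups[("a", None)])  # 2
```

The keys of the result are the de-duplicated rows, and the counts show how many copies of each row were in the input.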


All the best,

BS 
