Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

In-DB Unique with Summarize Group By not working

sheikhazim
6 - Meteoroid

Hi all,

 

I am trying to remove duplicates in my in-DB data by grouping all the fields with a Summarize In-DB tool. This works if I stream the data out and stream it back in before the Summarize In-DB tool but not if I don't. As in if the pair of tools in the image below are used. 

sheikhazim_0-1607345332363.png

Also, if I stream all the data out and use Alteryx's Unique tool then this also successfully removes duplicates. I was hoping someone might know why this is happening and how I can get the Summarize In-DB tool to remove duplicates without having to stream data out at any point.

 

My theory is that streaming out changes the data in some way which cleanses it but I haven't been able to confirm this is the case or understand any logical reason for why that would happen.

 

Any help, much appreciated.

 

Thanks

4 REPLIES 4
echuong1
Alteryx Alumni (Retired)

Are you able to elaborate on your entire process more?

 

The in-DB tools essentially outsource the processing to the actual database the data is being pulled from, meaning if you are connecting to a SQL DB, you are actually taking advantage of that processing power as well. The data is processed on that database as long as your are using in-DB tools, otherwise the data is downloaded into local temp space and then used from there.

 

The screenshot you have isn't really processing the data efficiently. You are taking data that is held in the DB and downloading locally with the Data Stream Out, and then re-importing into a temp table on the DB with the data stream in.

 

There is not an in-DB unique tool. To use most of the tools in Designer, you need to process locally and thus will need to use a data stream out to bring the data from the DB to local temp space. Alternatively, you can process in-DB with the in-DB summarize.

 

Your workflow should look like one of the below, actually configured of course. Hope this helps!

 

echuong1_0-1607349118854.png

 

sheikhazim
6 - Meteoroid

Hi echuong1,

 

Thanks for getting back to me on this.

 

My process is just to compare two tables within a database by unioning them and then remove the duplicates.

sample.PNG

I am able to do this successfully using branches 2 (stream out and use Alteryx's unique tool) and 3 (streaming out and streaming back in and using In-DB Summarize). However, I need to do this completely In-DB so I need to use branch 1 which should work but for some reason does not.

 

The basis of my question is simply, why does branch or method 2 and 3 work but 1 doesn't work? The only difference I can see is that 2 and 3 involve streaming the data out so maybe it is related to this?

 

When I say a branch "works" I mean the output of 2 and 3 is: each table has 20k records, union to give 40k and then deduplicate to give 20k records which is correct. However, output of 1 is that each table has 20k records, union to give 40k and then deduplicate to give 40k records which is incorrect.

 

Any help is really appreciated 🙂

 

 
 

 

 

echuong1
Alteryx Alumni (Retired)

Methods 1 and 3 should give the same result - is the summarize tool configured the same in both?

sheikhazim
6 - Meteoroid

Yes, the summarize tool in 3 is a copy and paste of 1 so has the exact same configuration. Hence, why I am left baffled by why the output is different.

 

Thanks

Labels