

Calculate statistics after write in DB

Statistics are used by many databases (Hive, Vertica, etc.) to improve query speed. It would be interesting to have an option on the Write Data In-DB or Data Stream In tools to calculate the statistics (something like a check box).

 

Example on Hive: ANALYZE TABLE {table} COMPUTE STATISTICS; ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS;
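
For reference, a minimal sketch of the equivalent statements on the databases mentioned above (my_schema.my_table is just a placeholder table name, and exact syntax can vary by version):

-- Hive: collect table-level and column-level statistics after the write
ANALYZE TABLE my_schema.my_table COMPUTE STATISTICS;
ANALYZE TABLE my_schema.my_table COMPUTE STATISTICS FOR COLUMNS;

-- Vertica: refresh optimizer statistics for the table
SELECT ANALYZE_STATISTICS('my_schema.my_table');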

16 Comments
SeanAdams
17 - Castor

Hey @saubert

 

I think you may be able to do this via a post-execution SQL in normal input/output tools, no?

 

https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql?view=sql-server...

... but then you don't have a post SQL option on an IN-DB write tool.
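
As a sketch of what such a post-SQL statement could look like on SQL Server (dbo.MyTable is just an example table name), entered in the Output Data tool's post-SQL option:

-- Refresh all statistics on the target table once the load has finished
UPDATE STATISTICS dbo.MyTable;

-- Or force a full scan if sampled statistics are not accurate enough
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;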

 

As an aside, I would be cautious about doing synchronous statistics updates: on large tables this can take a long time, and if you make it a serialised part of your output job it can lead to large delays in completing your canvas. What may be better is to investigate how to schedule these kinds of jobs asynchronously on your server (on SQL Server this is done through the SQL Agent; not sure about Hive), so you still get the benefit of updated stats without the overhead on a write job in a canvas.
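
For what it's worth, SQL Server also has a database-level option that takes automatic statistics updates off the query's critical path; a minimal sketch, assuming MyDatabase is your target database and your DBA is happy with the setting:

-- Queries compile with the existing stats while the update runs in the background
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON;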

saubert
9 - Comet

Hello @SeanAdams,

 

I have developed a macro to do that: it takes the same classic alias that I use for the in-DB connection and then uses an Output Data tool with a post- (or pre-?) SQL statement.

 

I did the same thing to create views...
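
For anyone trying to reproduce this, a rough sketch of the kind of statements such a macro could push through the post-SQL field (table and view names are placeholders, not the macro's actual output, and CREATE OR REPLACE VIEW assumes a Hive version that supports it):

-- Statistics refresh after an in-DB write to Hive
ANALYZE TABLE my_schema.my_table COMPUTE STATISTICS;

-- Same pattern used to (re)create a view on top of the freshly written table
CREATE OR REPLACE VIEW my_schema.my_view AS SELECT * FROM my_schema.my_table;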

Tanya
5 - Atom

 Hello @saubert,

 

Could you please provide some insights on your macro? I need to add 

compute stats <table-name>;

after in-db writing to Hadoop table <table-name>.

 

Thank you.

stevenwaddon
6 - Meteoroid

saubert
9 - Comet

@stevenwaddon as I mentioned previously, I wrote a macro for that. But it is suboptimal and I cannot retrieve the metadata in Alteryx after the macro.

saubert
9 - Comet

@ARich: Thanks. It would change a lot for us here.

saubert
9 - Comet

@stevenwaddon @Tanya

Here are the screenshots of my macros.
The first generates the analyze-stats query, the second sends the query.

[Screenshot: in-DB macro]

[Screenshot: macro that sends any query to a database]

Tanya
5 - Atom

For my needs I was able to run a simple workflow: 

 

[Screenshot: Compute Stats.jpg]

saubert
9 - Comet

Hello @Tanya.

I'm afraid you don't get it. The idea is about an in-DB workflow, meaning in-database, and that's not the case with your workflow. Of course, it's easy for a standard workflow since you have the pre/post SQL option; as you may have understood from my previous post, that's what I use in my macro. But you can't process billions of rows in minutes with a standard workflow.

Tanya
5 - Atom

I totally got it!

 

Your solution is a sophisticated one and I will look into adapting it. Mine is a simple workaround that worked in my case (Hadoop HDFS, 6M rows, run time 40 seconds). The stats are computed after updating the data in-db (previous workflow), and before in-db data consumption (next workflow). It's not an elegant solution, but it takes 5 seconds to create and it does the job.

 

I hope Alteryx will create a standard solution that can be used without going into macros.