community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

Automatic Pushdown Optimization

I find it a little strange that we need to utilize a different set of tools (the In-DB tools) in order to achieve push-down optimization. This is not the case, for instance (as far as I know), in SPSS, perhaps others.

 

My request is to automatically push logic to the database layer if and when it's possible to do so.

 

In order to maintain operation similar to existing, it could be implemented as an overall workflow setting (on or off) such that leaving it turned off would result in tools continuing to behave as they do today.  Turning it on simply enacts a process of combining as many tools as possible into SQL that is pushed to the database server, and assumes whatever permissions are necessary to make that happen; (e.g. ability to write temp tables or etc...)

 

Thanks!

 

4 Comments
Nebula
Nebula

Hey John - so what you're suggesting is that you pre-optimize the SQL based on the workflow?

 

In other words - if I have a Select * from table; followed by a select tool that throws away all of the fields; followed by a filter that throws away 90% of the rows - the question is "Can we parse the workflow to spot this, and then query the data more intelligently by pushing these filters and column selects down into the query".

 

I think that this is a great suggestion, and would allow people to get far better performance out of their flows!

 

I've suggested something similar (some related links below) to build the parser which reverse compiles the Alteryx flow to find which fields are never used and suggests removing them from the flow (see first link below) - with your suggestion then pushing this down to the server this would be very powerful.

 

Related ideas that may be worth supporting too:

 

 

ACE Emeritus
ACE Emeritus

Hi Sean,

I guess I hadn't thought much about the implementation details, but yes: regardless of the initial query in the input tool; all preceeding tools simply becomes subqeries for the current tool; thus a simple workflow with INPUT TOOL > SELECT > FILTER > SUMMARY > SORT > TOP 100 might look like this behind the scenes:

SELECT top 100 * FROM (
    SELECT * FROM (
        SELECT sum(d.col1) as sumd1, d.col2, d.col3 FROM (
            SELECT * FROM (
                SELECT b.col1, b.col2, b.col3 FROM (
                    SELECT * FROM (
                        "INPUT TOOL QUERY"
                    ) as a
                ) as b
            ) as c
            WHERE "FILTER TOOL OPTIONS"
        ) as d
        GROUP BY d.col2, d.col3
    ) as e
    ORDER BY e.sumd1, e.col2, e.col3
) as f

It wouldn't be trivial to implement given that every database will have different syntax... but it can be done with existing In-DB tools, so it would be nice if it were simply "built-in".

Nebula
Nebula

it's interesting @JohnJPS - if you unpick how the inDB connector creates queries - it does exactly this.   It doesn't nest it, but creates temporary aliases to do the same thing (same outcome, it just reads more linearly, and probably is easier for the SQL optimizer to work through).

 

I'm thinking something more along the lines of a code-scanner.   We built one of these for VB6 once - where it would check for all the variables declared, and then scan backwards from the end of any given code-scope to see if that variable was used.    you could do the same thing here  if a variable is untouched between read, and a filter which is 20 steps down the flow - you could then back-propagate the filter backwards.

 

@Ned - I presume that you and the team have looked at this kind of back-propagation as part of your ongoing improvement work - does this idea make functional sense?   If it's worth talking this through in more detail, would be happy to spend the time.

 

Cheers

Sean

ACE Emeritus
ACE Emeritus

@SeanAdams,

Cool stuff. My example is based on what I see Hibernate/NHibernate doing in terms of auto-generated query strings.

 

Your backprop optimizer is interesting beyond push-down: reducing unnecessary data would help performance, no doubt... I'm tempted to think Alteryx already does this: I seem to recall observing filters reducing row counts earlier than expected... not sure though (and posting from my phone so can't recreate here bow to validate my memory).

 

Fun stuff.  Cheers,

John