I have a pretty simple workflow here: I join an input data file to one database query; anything that doesn't match gets joined to a second database query instead; then I bring the two streams back together and write the result to a table.
This all occurs in-DB - nothing streams back down to my computer that would make it slow.
I'm also restricting the first query to just the top 1,000 records, so only 1,000 records pass through the whole workflow.
With only 1,000 records, the workflow runs in 4 seconds... until I add a Write Data In-DB tool at the end. It's literally just putting those 1,000 records into a table, and not even streaming them from my computer since it's all in-DB. That one extra tool takes the run time from 4 seconds to 2-3 minutes. 2-3 minutes to write 1,000 rows using records that are all still in-DB? My full starting dataset has 1.5 million records - even scaling linearly, that's 1,500 batches at 2-3 minutes each, so days to run the whole thing at that pace.
Hmm, I think I've partially figured it out. Without the Write Data In-DB tool there, I don't think it actually executes anything - it sees there's no "end goal" and just goes through the motions.
If I use a Browse In-DB tool instead of a Write Data In-DB, it still takes 2 minutes, so I assume the query itself is finally executing and that's where the extra time goes.
What's the proper way to use in-DB tools to look up a list of data against a database? The extra Connect In-DB tools I'm joining to aren't individual tables, they're queries. I want to take my starting records and look up a bunch of things, so I put a query into each of those Connect In-DB tools - each one joins 4 or 5 tables - and then used a Join In-DB to connect my table to it.
Am I bypassing the indexes by doing it this way? Does it try to execute the whole query in the Connect In-DB tool first, before doing the join? Do I have to put each table I want into a separate Connect In-DB tool and do all the joins with Join In-DB tools, rather than putting a single query into one Connect In-DB?
If I do this entire exercise in SSMS it only takes a few seconds.
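For reference, the pattern described above - a query joining several tables in the Connect In-DB tool, then a Join In-DB against the input records - corresponds roughly to SQL like this (all table and column names here are made up for illustration; the real generated SQL depends on the tool configuration):

```sql
-- Hypothetical sketch of the lookup pattern.
-- The Connect In-DB query becomes a derived table (lk);
-- SQL Server's optimizer can usually inline it into the outer
-- join and still use indexes on key_col, rather than
-- materializing the whole lookup query first.
SELECT s.record_id,
       lk.attribute_1,
       lk.attribute_2
FROM   staging_input AS s          -- the 1,000 input records
JOIN  (
        SELECT a.key_col, b.attribute_1, c.attribute_2
        FROM   table_a AS a
        JOIN   table_b AS b ON b.a_id = a.id
        JOIN   table_c AS c ON c.b_id = b.id
      ) AS lk
       ON lk.key_col = s.key_col;
```

Whether the optimizer actually inlines the derived table or spools it is visible in the execution plan, which is worth comparing against the plan for the hand-written SSMS version.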
When you say SSMS I assume you're using SQL Server? That's important as well.
For every In-DB tool there is (usually) a temp table generated: your first join becomes a temp table, the filter is (possibly) a query on that table, and the Select could be another query, part of the filter query, or another temp table - you'd need to look at the log file or at the database itself to tell the difference. But you're right: the REAL processing only happens when you write or browse the data; everything upstream just builds up the query.
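As a sketch of that temp-table behaviour (the exact SQL Alteryx emits varies by version and connection settings - check the log to see what yours actually runs; these names are invented):

```sql
-- Hypothetical approximation of what an In-DB tool chain can emit.
-- Join In-DB materializes its result into a temp table...
SELECT s.record_id, q.val
INTO   #indb_join_1
FROM   staging_input AS s
JOIN  (SELECT key_col, val FROM lookup_tbl) AS q   -- the Connect In-DB query
       ON q.key_col = s.key_col;

-- ...and downstream Filter/Select tools may fold into one query on it.
SELECT record_id, val
FROM   #indb_join_1
WHERE  val IS NOT NULL;
```

Each materialization step costs time regardless of row count, which is one reason 1,000 rows can still take minutes.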
BUT having said all that, try it on a million records. I don't think the processing time will be as linear as you think - it might do a million records in 5-6 minutes.
