This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
It's the most wonderful time of the year - Santalytics 2020 is here! This year, Santa's workshop needs the help of the Alteryx Community to help get back on track, so head over to the Group Hub for all the info to get started!
First off, sorry if this isn't the correct place to ask this question. I've searched around and couldn't really find a conclusive answer to my problem, so I figured I'd start here.
I have a workflow that's mostly in-database, but I'm having a problem with the "browse" tool within the in-db suite of tools. I can run a simple query with a formula tool attached to it that adds a RowID (again, in-db) that will run in 15 seconds without any output or browse tool on the end of it.
Adding a browse tool for the first 100 records changes that by what seems to be forever. I actually haven't let it run all the way to completion, that's how long it takes (over 5 minutes). However, if I add the data stream out tool with a browse connected to that, it runs a little slower than before, but gets basically back to the original run time of around 17 seconds.
Can anyone explain why this is? Are there server settings that could have an impact on the browse tool? I don't understand how using an in-db browse for 100 records stops the workflow from completing, but streaming out into a standard workflow browse tool has virtually no effect.
My explanation for you is that the SQL passed for the entire unit of work in-database is compiled into an efficient SQL statement by the RDBMS. When you put a browse into the mix, you are now creating a temporary table that might require a lot of data (I/O) to create just to get a view of the first 100 records. Maybe you have to examine 100% of the data to get to the 100 records for the browse?
Now why does streaming the data improve performance? Because you don't need to examine 100% of the data. You are doing a select and when the streaming data comes out (i assume a 100 record sample is in play), then the stream of data is stopped.
This is all guess-work. I would use browses sparingly in-db. Certainly, I'd remove them in production. Since it's been 3 hours since you posted your question, I thought I'd take a stab at an answer.
Alteryx ACE & Top Community Contributor
Chaos reigns within. Repent, reflect and reboot. Order shall return.