We have workflows that are scheduled to run daily, reading in the last 3 days from an Impala database table that updates daily. At the end of the workflow we output the latest day (where date = max date), load it into Impala and append it to another table via the In-DB tools.
The problem is that the source table refresh is sometimes late, so the workflow reads in only the previous days, outputs yesterday's data again and duplicates it in the target table.
How can we prevent this from happening? Is there any way to block the append if the source hasn't been updated or is the same as yesterday's data? The date in the file is always yesterday's date, or Friday's date on a Monday.
My current solution is to write to a loading table first and then compare it against the expected date for that day (using another Input tool and a Block Until Done tool before reading it into the In-DB stream), but this seems like a messy and unstable way of doing it.
Please reply with details about the filtering of the date, and the type of database. There should be a few options with Alteryx date functions or SQL. A list of DateTime functions is available here: https://help.alteryx.com/current/designer/datetime-functions. Changing the date filter may be the easiest method.
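To make the date check concrete, here is a minimal sketch in Python of the comparison a date filter would need to perform. It is not Alteryx or Impala code, and the function names `expected_data_date` and `should_append` are hypothetical. It assumes the rule from the original post (the source carries yesterday's date, or Friday's date on a Monday) and that you can look up the maximum date in both the source and the target table before appending.

```python
from datetime import date, timedelta
from typing import Optional


def expected_data_date(run_date: date) -> date:
    """Date the source should contain when the workflow runs on run_date.

    Assumption from the post: yesterday's date, or Friday's date on a Monday.
    """
    # date.weekday() returns 0 for Monday.
    days_back = 3 if run_date.weekday() == 0 else 1
    return run_date - timedelta(days=days_back)


def should_append(
    source_max: Optional[date],
    target_max: Optional[date],
    run_date: date,
) -> bool:
    """Return True only when the source is fresh and not yet loaded."""
    if source_max is None:
        return False  # source table is empty
    if source_max != expected_data_date(run_date):
        return False  # source refresh is late, so skip the append
    # Append only if the target does not already hold this date.
    return target_max is None or source_max > target_max
```

The same two conditions (source date equals the expected date, and source date is later than the latest date already in the target) could be expressed as a filter in the workflow or as a WHERE clause in SQL, so that a late refresh produces zero rows to append instead of a duplicate day.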
Also, if the data set is not too large, we could stream it into Designer and then use a Block Until Done tool before streaming it back to the table.
Block Until Done should be a foolproof way to ensure a specific sequence in a workflow.
I noticed that in my previous reply I suggested this tool when you already had it in place, my apologies. Could you clarify why you think it may be unstable? A screen capture of how it is used in your workflow would help.
The Engine primarily uses two processors for workflows. (Additional multithreading is coming in the future for more performance.) Currently, there is one thread for passing data between tools, and a second thread for the tools to process the data.
These threads pass data through the tools as quickly as possible without a specific sequence. Hence, there is a need for a Block Until Done Tool when a sequence is required.