I've recently undertaken a project to convert a legacy Microsoft SSIS solution into an Alteryx equivalent. I've been working with Alteryx for years, but previously, we've embedded calls to Alteryx workflows from SSIS. I'm trying to flip the script and move into a new paradigm - Alteryx only. (We haven't done this in the past because it has seemed impossible...)
In one of our typical multi-package workflows, we load data from some source (in this case a CSV), run it though our custom cleanse -match-load process, and ultimately output some data to another flat file. That seems simple enough, yes? But here's the problem: I can't figure out how to get Alteryx to call stored procedures during the work flow . I've been looking through examples in the community that describe how to use an INPUT tool for this, but it's unclear how -- or even if -- this would work. Here's a simple screenshot of the workflow:
Once the data has been loaded, we need to call a series of stored procedures and continue the workflow. I'm just not sure how to do that using an input tool, or any tool, really, or if what I want to do is even possible in a single workflow. We are going to continue re-accessing this data over and over, so there will need to be other input tools downstream.
My most recent thought was to string together each set of steps as macros, where each macro either executes a set of procedures or loads the data as new import. But is that the correct way? And can the input tool run multiple stored procedures if they are stacked in the SQL editor as opposed to selecting them from the Stored Procedures tab in the tool?
Any suggestions from the community would be greatly appreciated. Thanks!
To rephrase your goals:
Your inputs are CSV files that you want to write into a database. Once the data is in some table(s) there, you'll call some existing stored procedure(s) to handle the transformation/processing/cleanse/matching rather than rebuild that logic within Alteryx.
To write the CSVs into the database, you'll use the Output Data tool targeting your database. Inside that tool's configuration is an option called "Post Create SQL Statement". This is where you can write the EXECUTE command for your procedure(s) to handle all the logic needed once the CSV data is written to the initial table(s) by Alteryx.
Here is a basic process flow:
So as you can see, there are tables along the way that are being written to, and then data is coming back into Alteryx after some SQL processing (not all SQL steps are shown).
So the problem is multifold:
This is why I was thinking about stringing together macros. Thoughts?
I see, so you have a lot of hopping in and out of the database. Unfortunately, I haven't seen what I'd consider a "good" solution for this situation. There was an Idea submitted in 2015 for a SQL tool that would solve this issue, but it was closed as "not planned" in 2016: Have an SQL Tool - Alteryx Community
This leaves you with the Pre/Post SQL actions in the Input/Output Data tools and the Dynamic Input Data tool.
You might also have luck using the Dynamic Input In-DB tool, as it allows you to pass in a SQL query like an Execute command.
In any of those cases, to get the in & out operations you may be stuck with the macro setup you described, or potentially some well placed Block Until Done tools (or maybe both).
I wish I had a cleaner solution for you.
So, if I wanted to try to use Macros, how would I get them properly strung together ? What input and output could be created that would lead one to the next and so on? Can there be an unrelated path through the macro -- like a placeholder -- that simply runs parallel to the actual steps and waits for the package to complete the SQL steps before delivering it to the next macro?
Since it seems like your source data & target are always in the Database (other than the initial CSVs), I wouldn't worry about passing data between your macros.
Just send through 1 dummy row at the parent workflow level through each macro. (Could literally be "Field_1" with a value of "a".)
Then setup your macros like this (the below image is a single macro with 2 containers for clarity) to enable the anchors for sequencing in the parent workflow & retrieve / write data in the database & execute the next stored procedure.