Free Trial

Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!
Submitting an Idea?

Be sure to review our Idea Submission Guidelines for more information!

Submission Guidelines

Outbound connector on output tools

It would be great to have an outbound connector on output tools for 2 reasons:

a) if this outbound connector can carry key results of the output process, this can be saved in an audit log.  For example - rowcounts; success/failure.   This kind of capabiltiy (to generate a log, or to be able to check the rowcount of rows committed to a database) is important for any large BI ETL process

b) this woudl also allow the process to continue after the output process and also act as a flow of control.   For example:

- First output the product dimension

- once done - then connect (using the outbound connector) to the next macro which then updates the Sales fact table using this product dimension (foreign key dependancy)

 

 

10 Comments
DultonM
11 - Bolide

I think (a) could prove useful for metric purposes. (b) can be accomplished using the Block Until Done tool.

SeanAdams
17 - Castor
17 - Castor

When pushing rows out to a SQL server table that has an identity column - we often need to use an Output clause in the insert statement to return the new primary keys used so that these can be passed on to the next step.

 

In cases like this - the output tool should be able to pass these IDs on to the next tool for down-stream manipulation (i.e. the output tool needs to be non-terminal node - so that it can have an outbound connector to pass this on).

This is a very common problem when inserting into a dimension table with an integer primary key, which you then need in order to pass this onto your Fact table loader.

 

Example: 

Table definition:

Create table Pets
(
PetID int identity(1,1) primary key not null,
PetName varchar(20) not null,

Age int not null,
)

 

Query: 

 

Truncate table Pets
GO
Insert into Pets
output
    inserted.petID,
    inserted.Petname
values('Fido',10)

 

Insert into Pets
output
    inserted.petID,
    inserted.Petname
values('tweety',15)

 

Output from SQL 

 

2017-09-28_0-27-12.png

autolycus
8 - Asteroid
It would also be helpful to get the exit code from the sql execution to determine success/warning/error status.
rslopez2
8 - Asteroid

This would be extremely useful!

 

We currently spend lots of time\entire extra builds just to compensate for limitations on point A for @SeanAdams 

 

we would also find tons of value in B) specifically in use cases where a translation table between Dims and Fact runs are helpful. This would really simply workflow designs (often ultimately resulting in 2 separate builds to maintain)

 

Example:

  Updating\inserting primary Keys in Dim 1 and Dim 2 - Pass updates Dims to Translation table updates needed to simplify bridge logic between a set of natural keys to be leveraged in the subsequent Fact loader run.  

AlteryxCommunityTeam
Alteryx Community Team
Alteryx Community Team
Status changed to: Accepting Votes
 
TheOC
15 - Aurora
15 - Aurora

I believe similar may now be possible with the Control Containers - however, this area of Alteryx could continue to receive some love. 

SeanAdams
17 - Castor
17 - Castor

Thanks @TheOC  - I do agree that control containers do achieve some of this.

 

However what you don't get from control containers is the return value from DB server - which often has important messages such as "15 rows committed" or if you've called a stored proc, then there can be data on the return message.

 

thanks for replying my friend!

TheOC
15 - Aurora
15 - Aurora

hey @SeanAdams ,
Great point - I totally agree.

For curiosity - and maybe helpful for Alteryx - If control containers extracted more information such as database metadata outputs, would this be a viable solution to your Idea? Or do you still see it as being better as an optional output to the output tool (with a toggle in the configuration)?

SeanAdams
17 - Castor
17 - Castor

Hey @TheOC 

As you say, some parts of this can be solved in a different way.

 

- Multipart SQL:

This is where you do an insert, but then need to return the latest ID used.    This is very common when you are inserting into a table that has an automatic identity column which you need to use somewhere else down the track.    

 

For example:

insert into XXX

Select @@identity

 

This is a common pattern when you are doing more rigorous ETL and you need to make sure that you don't try to reuse identies and / or you have to pass this newly inserted ID onto another process downstream.

This can DEFINATELY be achieved by control containers - if and only if we have a generic SQL tool which has been requested by a few folks 

Example - this one from the legend @chris_love  in 2015 has 133 votes on it Have an SQL Tool - Alteryx Community

 

- Pattern 2 is where you want to know the number of records impacted.

situation: you are doing a delta insert / upsert.

This is where you need to insert / update ONLY IF the data has changed.    This is critical in situations where audit trails are a matter of regulatory / audit importance.   For example - people's names and addresses - if you just do an "update / insert if new" then even though Owen's details have not changed, Alteryx will still issue an update which changes nothing.

 

So the way to do these correctly is to insert into a temporary / loader table - and then either do a complex SQL merge statement; or insert using set logic (except / intersect).

 

In this case - the only way to know how many records were updated accurately is to get an output from the tool running the query saying "5 rows updated" just like when you run this for yourself in a query window.

 

Now - you may say "but Sean - you can still get this by just doing a SQL query after this to figure out how many rows have an updated date / time on them" and you have a good idea - but the problems are:

  • SQL Databases that use date time to create audit-trails often put different date-times on each row because they are measuring sub milliseconds - so what are you looking for?
  • You may be running multiple copies of this job in parallel - and so time windows are not reliable

In summary - the only way to do this right is to pass on the info that SQL is sending back to Alteryx as an output node on output tools.

 

- Third and final reason to include outputs is errors.

Many modern ETL tools put an error output on every tool - and this allows you to take the rows which error out and reprocess them.

This different than control containers which either fail or succeed in totality - you want to say "for the rows which worked - carry on; for the rows which failed - please set them aside so that I can rework them / debug them".

 

Failing the entire control container / job is OK if you're moving a few hundred (or even a few hundred thousand) rows - but once you get into the multi-million / multi-billion rows  - it really is terrible for an entire workflow to fail because  1 row out of 1 bn fails.    This is SUPER common in large ETLs which is why this would be a helpful way to make Alteryx even better at data prep & blending.

 

Thank you again for engaging in the discussion @TheOC  - looking forward to seeing related ideas from you!

 

Have a great weekend

Sean

 

SeanAdams
17 - Castor
17 - Castor

cc: @VojtechT