Free Trial

Alteryx Server Discussions

Find answers, ask questions, and share expertise about Alteryx Server.

Prevent dirty reads in Alteryx

Morten
6 - Meteoroid
Case: I want to prevent reading from a database table that is being written to.
 
Problem: let’s say I have two workflows. Workflow 1 uses the regular output tool in Alteryx to write 1 million records to a table. Workflow 2 then starts reading from the same table that workflow 1 is writing to and this results in a dirty read – workflow 2 only reads the number of records that were transmitted at the time of reading.
 
The way I understand the output tool is that it chunks up the writing session into smaller transactions (specified by the user) and finishes once the writing session completes. However, it doesn’t lock the table while writing and therefore dirty reads can happen if workflow 2 reads meanwhile workflow 1 is writing.
 
The situation occurs when workflows on the Alteryx server runs simultaneously.
Note:
  • In many cases workflows are executed in an extract, transform, load order to avoid concurrency problems. However, this is not possible as workflows need to be run at different schedules which have overlapping runtime.
  • DB runs on MSSQL
I’ve tried
  • Changing the isolation level on the database, but Alteryx ignore these.
  • Using locks in a pre-SQL statement, but they release right before the writing sessions start.
  • The SQL Server bulk loader protocol, but dirty reads still occur.
3 REPLIES 3
fmvizcaino
17 - Castor
17 - Castor

Hi @Morten ,

 

I'm not aware of a direct solution for that.

What I've already done was, to create in the pre and post SQL statement, a table with 2 columns, start (preSQL) and finish (postSQL), that way, if the finish is null I'm sure it is a table that is being currently updated.

 

You have the option to create a counter to wait or to simply stop the process.

 

It is not a pretty solution, but it does work.

 

Best,

Fernando Vizcaino

Morten
6 - Meteoroid

Hi Fernando Vizcaino,

 

Thank you for sharing you solution. I actually worked with the same idea, but went with something else, as it requires too much configuration for every time you need to input or output data.

 

Instead I use the in-db tools:

First I use a data stream in tool create a temp table.

Then i bulk upload to the table using write data in-db

 

This prevents dirty reads and forces a native error, which I can use with a try-catch list-runner I made. However, It doesn't support pre and post sql statement, Update: insert if new and data is not sorted.

So still looking for a better solution..

Thank you for sharing your solution, it is always nice to know, that other people thought something similar. So really appreciate it.

 

Best regards,

 

Morten

JasonFortriede
5 - Atom

I came across this same situation. Here was my solution.

 

1) When writing my data to a table in Workflow 1, I include a column with TotalRowCount, so each row has how many records I'm writing.

2) When pulled data from the table in Workflow 2, I include in the WHERE clause language such as: WHERE Table1.TotalRowCount = (SELECT Count(*) FROM Table1)