Learn more about the Alteryx Maveryx Universe recently announced at Inspire 2023!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Incremental Extract from SQL Server based on Timestamp

sjcrain79
6 - Meteoroid

I'm working on a project where I'm pulling daily loads from a SQL db and then storing in a .yxdb file.  Rather than have the end user input a date before running the workflow, I'd rather have it compare the distinct timestamps and just load the ones that don't exist in the existing .yxdb file.  The initial dataset is very large and the daily extract could easily be a few hundred thousand rows of data, so ideally I'd like to limit the daily data pull to only be what's new.

 

I've only got a designer license, not working with server if that makes a difference.  I also don't have write access to the DB I'm working with.

 

Thanks,

 

Sean

8 REPLIES 8
Thableaus
17 - Castor
17 - Castor

Hi @sjcrain79 

 

EDIT: I was misled by some other thing I was thinking and the old post was not the proper way to go.

 

Cheers,

sjcrain79
6 - Meteoroid

I think I need to edit my first post to mention that the database size is very large.  If I go down this path, I think I would end up having to pull about 5 gb each time, then filter what I need.  In order to make this efficient I think I need to filter the initial data pull.

Thableaus
17 - Castor
17 - Castor

@sjcrain79 

 

Hmm, I see...

If the date is what actually controls the increment, you could create a parallel file to get the last date after last execution.

 

Then you could append that with Append Fields Tool and filter the date against your SQL Server Table.

 

That's just an idea, but it surely depends on how complex your incremental process is.

 

Cheers,

David-Carnes
12 - Quasar

OK, a couple of things to consider:

 

  1. You cannot update or insert into .yxdb files, only overwrite.
  2. If you don't know what date to use to filter the records until runtime, then two options are available to you.

Regarding #2, the easiest approach is to use the Dynamic Input tool, located on the Developer tab.  I use this a lot.

 

Another method might be to create a macro that takes the value of an incoming field and updated the SQL statement inside an Input tool.

 

When I use the Dynamic Input, I have a SQL statement like

select something, something_else

from somewhere

where somedate = '2019-06-10'

 

The configure the tool to Modify SQL Query, Add --> SQL: Update WHERE Clause.  Choose the field containing the date.

 

Best,
David

 

 

 

David-Carnes
12 - Quasar

Here's a quick example

 

sjcrain79.PNG

 

 

sjcrain79
6 - Meteoroid

Regarding this, here's what my workflow looks like now.  Does this open up any other possibilities?

 

Initial Data Pull.PNG

 

Dataset Production.PNG

 

 

David-Carnes
12 - Quasar

Sean!

OK, I need to step through this to ensure I have not made any incorrect assumptions.  Please correct me where necessary.

 

  • You have an initial load in which you get data back to the beginning of time.  Once this is complete, you are not running your initial load again.
  • When new data goes into your database, the timestamps are all after the latest timestamp in your initial load.  (If new data can have timestamps that are earlier, then that'll be a different set of logic.)
  • Your Supplement Data are the records in your SQL Server database that are not in the Initial Load.  Basically, no date range, just everything since.

If these are correct, then my example above will get you to the point where you have some "manual entry."  Just use the Summarize tool to get the most recent timestamp then use the Dynamic Input tool to select from your SQL Server.  It should be the same SQL from your initial load but have the last line in your where clause look like "and timestamp > '1900-01-01'".    Then configure the tool to replace 1900-01-01 with the value max_timestamp from the Summarize tool.

 

The whole point of all of the above is so you don't need to change a date or datetime value with each run.

 

Anyway, I have more questions and concerns that will be in my next post.

 

 

David-Carnes
12 - Quasar

OK, bigger picture.  From your original post:

 

"Rather than have the end user input a date....  I've only got a designer license, not working with server...."

Does your end user have Alteryx Designer?  If not, then you're the end user.

If they do, then you can turn this into an analytic app that they run.  Or you could just give them the workflow.

 

From a subsequent post:

What is this?What is this?

Where do the manually updated rows come from?  If it's just a few fields making up one row of data, then create an app and use Textboxes for input.  If a file, like a spreadsheet, and a different spreadsheet for every run, then an app with a File Browse.  Or just keep running the workflow and make edits when needed.

 

I don't know your exact situation so please consider everything a suggestion, a musing if you will.  I would not want to give you advice that you then follow.  :smiley emoji:

 

 

Best,

David

 

Labels