So I would like to create an archive process in Alteryx. We have a lot of tables in SQL that my team maintains (this is data we generate, such as classifications for our most-common dimensions and models), and I'd like to store this data on our Alteryx Server. I'm hoping someone can tell me the best way to do the following in Alteryx:
Have a list (probably a SQL table, so my team can easily add to it, in case they want to archive tables) that has all the tables that I want to archive. It would have the DB, schema, and table name.
For each table in the list:
Get all the data in the table
Take the name of the table
Take the current date
Concatenate the two together so that I can get the name for a file
Save the file onto the server that stores this data.
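The naming step in the list above (table name + current date) could be sketched like this in Python; the function name and the `.yxdb` extension are just illustrative:

```python
from datetime import date

def archive_filename(table_name: str, run_date: date, ext: str = "yxdb") -> str:
    # e.g. Table_1 + 2018-04-20 -> "Table_1_2018_04_20.yxdb"
    return f"{table_name}_{run_date.strftime('%Y_%m_%d')}.{ext}"
```

In Alteryx itself the equivalent would be a Formula tool that builds the full path, feeding an Output Data tool configured to take the file name from a field.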
The end result: my server will have a set of files (either Excel, Access, or Alteryx database files) that will be named, for example, Table_1_2018_04_20.
The steps that I'm missing are:
How do I set this process up so it can run for each row in a table?
How do I get the value of the row into a Select pull for Alteryx?
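For the second step, the query that has to be built per row is just a fully qualified SELECT. As a hypothetical sketch of that string-building logic in Python (assuming the table list is maintained by your own team, i.e. trusted input):

```python
def build_select(db: str, schema: str, table: str) -> str:
    # Bracket each part so names containing spaces still parse;
    # note this is plain string substitution, not safe for untrusted input.
    return f"SELECT * FROM [{db}].[{schema}].[{table}]"
```

In Alteryx, this substitution is what the Dynamic Input tool's "Modify SQL Query" option (or a batch macro with a Control Parameter) performs for each incoming row.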
Please let me know if there's a way to do this, or if there's a better way to reach the same result. Ideally, I'd like to extend this so that I skip storing a table if it hasn't changed, but I'm going to start here.
If I understand your question correctly, are you trying to check your current database for changes? If so, you can search your database using dynamic input with search criteria for anything that makes a row unique (SELECT * FROM [Schema].[Table Name] WHERE [Criteria that makes a row unique]). Output all original fields with the primary key from the table. If a primary key is missing then a change has been made (the unique criteria didn't match).
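The uniqueness check described here can be illustrated outside Alteryx. A small sketch of the same idea (hypothetical helper comparing primary-key sets between the archived copy and the current table):

```python
def detect_changes(archived_rows, current_rows, key):
    # A key present on only one side indicates a change.
    archived = {row[key] for row in archived_rows}
    current = {row[key] for row in current_rows}
    added = current - archived     # rows new since the last archive
    removed = archived - current   # rows deleted since the last archive
    return added, removed
```

In a workflow this maps to a Join tool on the key fields, with the L and R unjoined outputs flagging removed and added rows.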
This sounds useful and could apply to what I'm trying to do, but it's not exactly what I need. I have a number of tables that I want to archive every week. These tables are built by my team and are mostly dimensions (such as product classification information unique to my team) or model information. Rows can be dropped or added constantly, and I want to archive a weekly/monthly snapshot of these tables so we don't lose too much of our data in case of a crash or if the data gets deleted (which has happened more than once). Unfortunately, most of these tables aren't timestamped, so I can't just pull the most recent records.
I want to know how I can, with a single set of nodes, loop through all of these tables and store the data on our Alteryx Server (as Alteryx database files) instead of on our SQL Server.
I could set up a separate set of nodes for each table. Say I have 10 tables to archive: I could build 10 sets of nodes that each pull data from one table, then save that data to a file named after the table with a timestamp (so every week I would save Table 1's data to a new file called Table_1_20180420, and the next week to Table_1_20180427). But this isn't very dynamic, especially since I want to run the exact same process for every table, and I'd like a new table to be added to the process without my teammates having to develop a new set of nodes.
If I use your solution, how can I set this up so I can loop through a specific set of tables (not all the tables in the DB), store the data when I see a change, save it with the table name + date, and avoid building the same set of nodes more than once? Setting up one process for multiple tables matters more to me right now than storing only when there's been a change (though that is something I care about).
Hearing this, it seems your team should implement a slowly changing dimension (SCD) process and temporal tables.
Have your team agree upon which SCD process is appropriate (I recommend Type 4 or Type 6)
Set up your tables to automatically track history with a temporal table
If you still want to archive changes, I would just copy your temporal table to another location or file (I can't give instructions for how to do this with Alteryx Server, but the concept is the same).
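A Type 4 SCD keeps a current table plus a separate history table of superseded versions. As a rough Python sketch of that bookkeeping (a dict stands in for the current table, a list for the history table; the field names are illustrative):

```python
from datetime import datetime

def scd4_upsert(current: dict, history: list, key: str, new_row: dict) -> None:
    # Type 4 SCD: the latest version lives in `current`; every superseded
    # version is copied to `history` with a valid_to timestamp.
    old = current.get(new_row[key])
    if old is not None and old != new_row:
        history.append({**old, "valid_to": datetime.now().isoformat()})
    current[new_row[key]] = new_row
```

In SQL Server, system-versioned temporal tables automate exactly this pattern: the base table holds current rows and the history table receives prior versions on every update or delete.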
Thank you. I want to save the most recent data in these tables in Excel/CSV files on a server separate from our SQL server, in case it gets deleted. In short, I want to save snapshots of this data outside of the SQL server environment. If I lose the data, I can just pull the snapshot. Temporal tables and SCD aren't going to help me if the SQL server goes down, and we lose the data that was developed specific to our team.
I want to set up an Alteryx workflow that does this:
--I pull a table from SQL server that has a list of all the tables I want to archive.
--For every table in this list, I take the data and save it in an Excel/CSV named after the table + the timestamp of the pull.
--The end result is I will have a set of Excel/CSV files that have a snapshot of the tables I'm looking to archive.
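The whole loop above could be sketched as a short Python routine (the `fetch` and `save` callables stand in for the real SQL pull and file write, which are assumed, not shown):

```python
from datetime import date

def archive_tables(table_list, fetch, save, run_date=None):
    # table_list: iterable of (db, schema, table) rows from the control table.
    run_date = run_date or date.today()
    written = []
    for db, schema, table in table_list:
        rows = fetch(db, schema, table)                   # SELECT * FROM [db].[schema].[table]
        filename = f"{table}_{run_date.strftime('%Y_%m_%d')}.csv"
        save(filename, rows)                              # write the dated snapshot
        written.append(filename)
    return written
```

In Alteryx terms this is a batch macro: the control table drives one macro iteration per row, and each iteration does the select-and-save for that table.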
The real focus here is: how do I run a for loop over a list of multiple tables, where each iteration selects the data from one table and saves it with the table name + date?