I feel like this is a newbie question, but I don't typically work with input data as large as what I'll be handling in my current project...
I am importing a large dataset through an ODBC connection and don't want the data to be pulled in each time I run the workflow as I build it out (time and server strain).
Do I just click the Cache Data button the first time I run the workflow? And if I save and close the file, will I need to re-run and cache the input each time I re-open the workflow?
Thanks!
This is the approach I am taking for a similar issue; it may or may not help with yours.
I need approximately 90 days of data for this workflow, but I don't want to pull all 90 days each time the query runs. Instead, I save the output to an Alteryx Database (.yxdb) file. I then run the query for only the most recent 3 days, union that data with the YXDB, de-duplicate it, and run the rest of the workflow. The YXDB is updated in the process, and older days are dropped as needed to maintain a rolling 90 days. In my case I only save summary-level detail, not the entire data set; the file may be large if you need to keep all of the detail.
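For anyone who wants to see the logic spelled out, here is a rough Python/pandas analogue of that rolling-window pattern (Alteryx itself does this with Input, Union, Unique, and Output tools). The parquet file stands in for the YXDB, and the "date"/"key" columns and fetch_last_n_days function are illustrative assumptions, not anything from the workflow:

```python
import pandas as pd

# Hypothetical names: the parquet file stands in for the YXDB,
# fetch_last_n_days() for the small incremental ODBC query.
CACHE_PATH = "daily_summary.parquet"

def refresh_rolling_window(fetch_last_n_days, days_kept=90):
    fresh = fetch_last_n_days(3)              # pull only the newest 3 days
    try:
        cached = pd.read_parquet(CACHE_PATH)  # previously saved summary data
        combined = pd.concat([cached, fresh])
    except FileNotFoundError:
        combined = fresh                      # first run: no cache yet
    # De-duplicate overlapping days, keeping the newest copy of each row.
    combined = combined.drop_duplicates(subset=["date", "key"], keep="last")
    # Trim anything older than the rolling window.
    cutoff = combined["date"].max() - pd.Timedelta(days=days_kept)
    combined = combined[combined["date"] >= cutoff]
    combined.to_parquet(CACHE_PATH, index=False)
    return combined
```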
Cached data only remains available as long as the workflow is open.
You can "cache and run" on the input the first time you run it. This will not re-run the database pull and will use the data downloaded into temp space moving forward for the remainder of your session. The temp data will be cleared when you close the workflow, so depending on your use case this may not be the best route.
If you want to keep using the data without re-querying the database each time you open the workflow, start by inputting the data and then writing it to a .yxdb file with an Output tool. This stores the data in an Alteryx-optimized format, and that file can then serve as your input for subsequent runs instead of the actual database (the same pattern is sketched in code below).
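Outside Alteryx, this "query once, read the local copy afterwards" idea might look like the following minimal Python sketch. The connection string, query, and file name are placeholders, and parquet is used in place of the Alteryx-specific YXDB format:

```python
import os
import pandas as pd
import pyodbc

CACHE_PATH = "dataset_cache.parquet"        # stands in for the .yxdb file
CONN_STR = "DSN=my_dsn;UID=user;PWD=pass"   # placeholder connection string
QUERY = "SELECT * FROM big_table"           # placeholder query

def load_data(force_refresh=False):
    # Hit the database only when no local cache exists (or on demand).
    if force_refresh or not os.path.exists(CACHE_PATH):
        conn = pyodbc.connect(CONN_STR)
        try:
            df = pd.read_sql(QUERY, conn)
        finally:
            conn.close()
        df.to_parquet(CACHE_PATH, index=False)
        return df
    # Otherwise read the fast local copy instead of the database.
    return pd.read_parquet(CACHE_PATH)
```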
Hope this helps!
Hi @cswanson21x,
If you are looking to run the workflow with the complete dataset every time while developing, the caching method that @echuong1 mentioned is viable.
But if you want to restrict the amount of data that comes in during development, i.e. if every development run does not need the whole data set, you can speed up execution with this feature:
Within the Input tool in Alteryx, one of the options is Record Limit. You can specify a number of rows there, limiting the amount of data coming in during development.
Doing it this way restricts the data volume while still letting you develop and test against a live query (see the sketch below for the same idea outside Alteryx).
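As a hedged illustration of what Record Limit does under the hood, here is one way to get the same effect in plain Python with pandas; the function name and row count are my own assumptions:

```python
import pandas as pd

DEV_ROW_LIMIT = 1000  # analogous to the Input tool's Record Limit

def load_sample(conn, query, limit=DEV_ROW_LIMIT):
    # chunksize makes read_sql stream results, so only the first
    # `limit` rows are fetched rather than the whole table.
    chunks = pd.read_sql(query, conn, chunksize=limit)
    return next(chunks)
```

The point in both cases is the same: the query stays live against the database, but each development run touches only a small slice of the data.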
Thank you!
This is a perfect solution.
I should have known from my Core cert studying that the yxdb is the ultimate file format to save that data as!
Thanks again,