Hi all,
I run a workflow that uses two very large datasets, one of which I receive a few weeks before the other. When the second dataset arrives (it is about 1/8th the size of the first), I need to complete the workflow pretty quickly, but the workflow takes quite a bit of time to run when it has to read in all of the data. I was wondering: the day before, can I cache the first data source up to the point right before it interacts with the second? Then, once I get the second source, I can run the workflow, and since the first dataset is already cached I have essentially saved myself that amount of time. Does it work like this? Please let me know if you have any questions.
EDIT: It takes a few hours to read in the large dataset, so my intuition tells me that if I cache this data and the manipulations I need to perform beforehand, I will save all that time when I actually run the full workflow. I am just not sure whether leaving Alteryx open for a day with all that data cached will cause any unexpected issues.