SOLVED

Performance optimisation

Apeksha_Agrawal
5 - Atom

Hi,

I am new to Alteryx and need help/input while running my first workflow.

1. I have built a predictive model using logistic regression and a random forest, along with EDA and ETL steps. While I am able to put my EDA steps in a container, I have to rerun all the other steps whenever I encounter an error in the model. My ETL steps take approximately an hour and a half, so resolving any error becomes costly in terms of time. Is there any way to optimise the workflow or run it partially? I have also tried converting my input files to the Alteryx data format to reduce the data loading time when running the workflow.

2. Another thing I observed is that any Browse or Data Investigation tool takes comparatively longer and doesn't even load completely after the workflow runs. Is this related to the RAM (memory) available, or does it generally just need more time? My current data is around 1 million rows with approximately 100 columns after all the joins and transformations. I have 8 GB of RAM and a Core i3 processor.

3. I also wonder how the tool performs in terms of big data handling. Do we need to configure additional GPUs?

Any help would be much appreciated.

Thanks,

Apeksha

 

5 REPLIES
danilang
19 - Altair

Hi @Apeksha_Agrawal 

 

Question 1. You can right-click on any tool in the workflow and choose Cache and Run Workflow.

 

[Image: Cache.png]

 

This will run the workflow and save the results of the output anchor to a temporary file. The next time you run the workflow, it will start from this point with the pre-calculated results. There are a few restrictions on this: the tool has to have only one output anchor, and the tool can't be inside a split data stream. A split stream is one where the data is split and then rejoins further down the tool chain.

 

[Image: split.png]

 

In the image above, you can cache any tools in the green boxes. The tools in the red box can't be cached (the option is greyed out), since all of them are part of the stream that splits at the input tool and rejoins at the Dynamic Rename (the gray one).
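If it helps to see the same idea outside Designer, here is a rough Python sketch of the checkpoint pattern that caching relies on conceptually. It is only an illustration, not how Alteryx implements its cache; the file name and the load_and_transform stand-in are made up for the example.

```python
import os
import pandas as pd

CHECKPOINT = "etl_checkpoint.pkl"  # hypothetical temp file holding the cached ETL result

def load_and_transform() -> pd.DataFrame:
    # Stand-in for the slow ETL/join steps (the ~1.5 hour part of the real workflow).
    return pd.DataFrame({"id": range(3), "score": [0.1, 0.5, 0.9]})

def get_etl_output() -> pd.DataFrame:
    """Return the ETL result, reusing a saved copy when one already exists."""
    if os.path.exists(CHECKPOINT):
        # Start from the pre-calculated result instead of rerunning the slow steps.
        return pd.read_pickle(CHECKPOINT)
    df = load_and_transform()
    df.to_pickle(CHECKPOINT)  # save so the next run can pick up from this point
    return df
```

The point is simply that the downstream (model) steps read from the saved checkpoint, so fixing a model error doesn't force the expensive upstream steps to run again.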

 

Questions 2 and 3. 8 GB is the minimum recommended RAM. For handling a data set of your size, you should have at least 16 GB.
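A rough back-of-the-envelope on why that matters, assuming about 8 bytes per numeric cell (string columns cost more):

```python
rows, cols, bytes_per_cell = 1_000_000, 100, 8
one_copy_gb = rows * cols * bytes_per_cell / 1024**3
print(f"~{one_copy_gb:.2f} GB per in-memory copy")  # ~0.75 GB
# Joins, sorts, and Browse/Data Investigation profiling can hold several copies
# at once, so on an 8 GB machine (minus the OS and Designer itself) there is
# little headroom, which is why those tools feel slow or never load fully.
```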

 

Dan

 

  

Apeksha_Agrawal
5 - Atom

Thanks a lot, danilang. This definitely helps. I will try implementing it.

Apeksha_Agrawal
5 - Atom

Hey @Danilang,

Is there any method to cache multiple streams in a workflow? For example, there are separate ETL and EDA steps for the scoring and training datasets, and both are given separately as inputs to a model. Using what you mentioned above, I am only able to cache the training stream, while the test/scoring one runs all over again. Thanks

Regards,

Apeksha

SubratDas5
10 - Fireball

Hi Apeksha, 

 

You can use the Block Until Done tool to initiate execution of the downstream tools only after the EDA and ETL steps have completed.
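Purely as an illustration of the ordering idea (this is not how the Block Until Done tool works internally; it uses a file-existence gate instead, and the file names are hypothetical), a downstream script can wait for the upstream outputs to exist before it starts:

```python
import os
import time

UPSTREAM_OUTPUTS = ["train_ready.yxdb", "score_ready.yxdb"]  # hypothetical ETL/EDA outputs

def block_until_done(paths, poll_seconds=30):
    """Wait until every upstream output file exists before continuing downstream."""
    while not all(os.path.exists(p) for p in paths):
        time.sleep(poll_seconds)

block_until_done(UPSTREAM_OUTPUTS)
# ...modelling and scoring steps run only after this point...
```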

 

tty
5 - Atom

Does the Cache and Run Workflow option work when the data source comes from a Teradata database? It is greyed out when I right-click on the output anchors.

 

Thanks.
