
SOLVED

Handling large files with little PC memory

JunePark
8 - Asteroid

Hi,

 

For example, let's say I have a file of about 400 GB, while the PC has only about 20 GB of memory available.

To handle this, I've uploaded the file to Google BigQuery.

 

My question is: will Alteryx Designer be able to process this file through the workflow?
I'm not an IT expert, but my intuitive guess is that Alteryx Designer won't have enough temporary memory to hold the data for each tool.

 

It would be best to try running the workflow with a real example, but I want to know this in advance before talking to my client.

 

P.S. If I use Alteryx Server, will the problem above be solved by making the operation cloud-based?

(We're probably deploying Alteryx Server next year, and it would be informative to know in advance)

 

fmvizcaino
17 - Castor

Hi @JunePark ,

 

Yes, Alteryx can definitely process files that big without problems. You only need to test it to see how long it takes, which of course depends on the complexity of the workflow.

 

Running that locally in Alteryx Designer on an average notebook, you can use the AMP engine, a multi-threaded processing engine released in version 2020.2, which will run a lot faster than the regular engine. Find out more here: https://help.alteryx.com/current/designer/alteryx-engine-and-amp-main-differences

 

Regarding memory usage, the AMP engine uses 25% of the notebook's available memory to process a workflow; after consuming all of it, it starts writing temp files to process the rest.
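To put rough numbers on that, here is a back-of-envelope sketch. It assumes the 25% share quoted above and the 20 GB / 400 GB figures from the question; the function names are hypothetical and nothing here is measured from Alteryx itself.

```python
# Rough estimate only. The 25% share is the figure quoted in this reply,
# and the helper names are made up for illustration.
GB = 1024 ** 3

def amp_memory_budget(available_bytes, share=0.25):
    """Memory the engine would use before it starts writing temp files."""
    return int(available_bytes * share)

def spill_estimate(dataset_bytes, available_bytes, share=0.25):
    """Worst case: bytes a step holding the whole dataset would put on disk."""
    budget = amp_memory_budget(available_bytes, share)
    return max(0, dataset_bytes - budget)

budget = amp_memory_budget(20 * GB)
spill = spill_estimate(400 * GB, 20 * GB)

print(f"in-memory budget: {budget / GB:.0f} GB")
print(f"worst-case temp-file spill: {spill / GB:.0f} GB")
```

Under these assumptions most of the data would pass through temp files at some point, so it is worth checking that the drive holding the temp folder has enough free space.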

 

Also, after processing the data for a specific tool, Alteryx moves on and keeps only a sample of the data to show in the results tab (1 MB of data in memory by default).

 

Tips:

  1. Only use Browse tools when truly needed, since they show 100% of the data. You can deactivate all Browse tools at once in the Runtime tab of the workflow's configuration: https://help.alteryx.com/current/designer/workflow-configuration
  2. Use as few blocking tools as possible, for example Join, Sort, Summarize, Cross Tab, and Transpose. These tools need to receive all of the data before passing anything to the next tool. You can find all the blocking tools here: https://community.alteryx.com/t5/Engine-Works/The-Periodic-Table-of-Alteryx-tools/ba-p/64120
  3. See the general workflow optimization guidance: https://help.alteryx.com/current/designer/workflow-optimization
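The difference between a streaming step and a blocking step can be sketched in plain Python. This is only an analogy using generators, not how the Alteryx engine is implemented; `read_rows`, `filter_rows`, and `sort_rows` are hypothetical stand-ins for an input, a filter, and a sort.

```python
def read_rows(n):
    """Stand-in for an input tool: yields one record at a time."""
    for i in range(n):
        yield {"id": i, "value": (i * 37) % 101}

def filter_rows(rows, threshold):
    """Streaming step: each record is passed on as soon as it is seen."""
    for row in rows:
        if row["value"] >= threshold:
            yield row

def sort_rows(rows):
    """Blocking step: has to collect every record before emitting any."""
    return sorted(rows, key=lambda r: r["value"])

# The filter can hand over its first match after reading only a few rows.
streamed = filter_rows(read_rows(1000), threshold=90)
first = next(streamed)

# The sort holds everything it receives, so filtering first keeps that small.
ordered = sort_rows(filter_rows(read_rows(1000), threshold=90))

print(first)
print(len(ordered), "rows reached the blocking step")
```

Placing the filter before the sort is the point of the sketch: the less data a blocking step receives, the less it has to keep in memory or temp files.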

Another option is to use the In-Database tools, which use your database's own performance to run your workflows. This can improve your processing time a lot, from minutes to seconds (we are currently using Redshift in some projects and it is breathtaking).

https://help.alteryx.com/current/designer/database-overview
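The idea behind in-database processing, roughly, is to let the database do the heavy work and return only the result. The sketch below uses Python's built-in `sqlite3` purely as a stand-in; it is not the Alteryx In-Database tools, Redshift, or BigQuery, and the `sales` table and its values are made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5), ("south", 1.5)],
)

# Local approach: every row travels to the client, which then aggregates.
totals_local = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals_local[region] = totals_local.get(region, 0.0) + amount

# Pushed-down approach: the database aggregates, only one row per region returns.
totals_db = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(totals_local)
print(totals_db)
conn.close()
```

Both approaches give the same totals, but with a 400 GB table the first one moves all the rows to the local machine while the second moves only the summary.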

 

Lastly, moving to Alteryx Server with more RAM will improve your processing time. However, currently (2020.3) you can't choose which workflows run with the AMP engine and which don't; there is only a global engine setting, and using it is only suggested for some cases on Alteryx Server. Keep in mind that this is the current state and may change in future versions.

 

Hope this clarifies a bit.

Best,

Fernando Vizcaino

JunePark
8 - Asteroid

@fmvizcaino 

As a non-IT professional, I have a lot to learn.

Thanks for your detailed advice, I'll go through it thoroughly.

I hope Alteryx supports In-Database tools for BigQuery as well in the future.

Hope you have a wonderful day!
