Alteryx Server Discussions

SOLVED

What is the maximum size for Exported Workflows?

andrewdatakim
12 - Quasar

I recently got this error message because I accidentally left a data file in the workflow, but it made me wonder: what is the maximum size for an exported workflow? Can anyone answer this?

 

As I start chaining workflows, I want to make sure they can be carried along as an entire process if I need to.

 

Thanks,

Andre

Export Workflow Error.JPG

6 REPLIES
DanC
Moderator

Hi @andrewdatakim,

 

Thanks for your question. There is a size limit, and it is tied to the ZIP program used by the Export Workflow function. A packaged workflow can be produced with up to 65,535 items and a total archive size of 4.2 GB.

 

Thanks!

 

Dan Chapman
Program Manager, Customer Support
New to the community? Get started here.
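For anyone who wants to sanity-check an export before hitting this limit, here is a minimal sketch in plain Python (not an Alteryx feature) that totals up the files you plan to package and compares them against the ZIP limits Dan mentions. The folder path is a hypothetical example, and the check uses uncompressed sizes, so treat it as a rough guide only.

```python
# Sketch: pre-check a folder of workflow assets against the classic ZIP limits
# before using Export Workflow. Path and thresholds are illustrative assumptions;
# the total here is uncompressed, so it only approximates the final archive size.
from pathlib import Path

MAX_ENTRIES = 65_535            # ZIP entry limit mentioned above
MAX_ARCHIVE_BYTES = int(4.2e9)  # ~4.2 GB archive limit mentioned above

def check_export_size(folder: str) -> None:
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)

    print(f"{len(files)} files, {total_bytes / 1e9:.2f} GB uncompressed")
    if len(files) > MAX_ENTRIES:
        print("Too many items for a single exported package.")
    if total_bytes > MAX_ARCHIVE_BYTES:
        print("Likely too large to export; remove packaged data files first.")

if __name__ == "__main__":
    check_export_size(r"C:\Workflows\MyProcess")  # hypothetical folder
```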
asma_rez
8 - Asteroid

Hello,

 

I am looking for guidance on how to scale an on-premises architecture when I have 200 workflows per day.

 

Sizing guides are always based on user counts; however, I have 5 users who need to execute 200 workflows per day.

Is there any documentation or tips for sizing Alteryx Server by workload?

 

Thanks

Asma

fharper
12 - Quasar

Not sure, but maybe Alteryx will provide confirmation: when exporting, the package may include metadata. Some flows get bloated with metadata and can become significantly larger for this reason, say 100 MB with metadata vs. 5 or 6 KB in the actual flow "code". Try doing a select all, delete, then paste back and save, and compare how big your flow is after vs. before. If it is notably smaller, you had a lot of metadata generated by test runs, and it is likely small enough to export at that point. The same issue can cause sluggish editing of a flow, as it works through all the metadata built up with each run.

 

I wish there were an easy purge function, but as far as I know there is none, other than the select all/delete/paste-back trick.

 

I have not seen anyone produce a flow that is actually more than a few MB once the metadata is removed.
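If you want a rough idea of how much of a workflow file is cached metadata before trying the select-all/delete/paste trick, a sketch like the one below can help. It assumes a .yxmd is plain XML and that cached field metadata sits in MetaInfo elements; the element name and file path are assumptions and may vary by Alteryx version, so treat the output as an estimate only.

```python
# Rough sketch: estimate how much of a .yxmd file is cached metadata.
# Assumes cached field info is stored in <MetaInfo> elements, which may
# differ between Alteryx versions; the path below is hypothetical.
import xml.etree.ElementTree as ET
from pathlib import Path

def metadata_share(yxmd_path: str) -> None:
    raw_bytes = Path(yxmd_path).stat().st_size
    tree = ET.parse(yxmd_path)

    # Sum the serialized size of every MetaInfo element in the workflow XML.
    meta_bytes = sum(
        len(ET.tostring(el)) for el in tree.getroot().iter("MetaInfo")
    )
    print(f"File size:        {raw_bytes / 1024:.1f} KB")
    print(f"Approx. metadata: {meta_bytes / 1024:.1f} KB "
          f"({100 * meta_bytes / max(raw_bytes, 1):.0f}% of the file)")

if __name__ == "__main__":
    metadata_share(r"C:\Workflows\MyProcess.yxmd")  # hypothetical path
```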

 

 


fharper
12 - Quasar

To @asma_rez, or anyone else asking about sizing:

A more useful utilization metric is concurrent flows rather than flows per day; the number of users doesn't really matter, and 200 flows a day is nothing if they are spread over 24 hours. You also need to quantify the complexity of the flows. I have flows that run for hours doing time-consuming but not memory-intensive work, other flows that do memory-intensive work but run in maybe 20 minutes, and others that are CPU- and memory-intensive doing data modeling, which are major performance hogs.

 

So it is very dependent on the type of flows, their frequency, and their concurrency. The server configuration can handle more or less depending on how much RAM and disk space and how many CPUs your server has.
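As a quick illustration of why concurrency matters more than flows per day, here is a back-of-the-envelope sketch; the 15-minute average run time is an assumption, so plug in your own numbers.

```python
# Quick sketch: convert "flows per day" into expected concurrency, which is
# the number that actually matters for sizing. All inputs are assumptions;
# use your own run times and scheduling window.

def expected_concurrency(flows_per_day: int,
                         avg_runtime_minutes: float,
                         window_hours: float = 24.0) -> float:
    """Average number of flows running at once if runs are spread evenly."""
    busy_minutes = flows_per_day * avg_runtime_minutes
    return busy_minutes / (window_hours * 60.0)

if __name__ == "__main__":
    # Example from the question: 200 flows/day, assuming ~15-minute runs.
    print(f"~{expected_concurrency(200, 15):.1f} flows running concurrently")
```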

 

I have emulated Server functionality with great success using a standalone Designer with the API/Scheduler feature added, and I have also used the full Server product; 200 flows a day would generally be light work for both. Data modeling is normally run manually on a single system, so the big question is concurrency: a system running 10 to 15 flows at a time (concurrent) is normally fine before things stack up. This is empirical with our configuration of 24 GB RAM, 4 CPUs, and a 400 GB dedicated hard drive. Be aware that each flow can generate 10 to 20 GB of temp files while running, and the default is to use local drive space (you can override this to a NAS). So 10 flows that use a lot of temp space, each generating 20 GB of temp files, need 200 GB of drive space over and above any system overhead. Often IT sets up a VM or dedicated system with minimal drive space, which causes issues people often don't recognize.
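To put that disk math into a reusable form, here is a small sizing sketch; the per-flow temp usage, concurrency, and overhead figures are just the assumptions from this post, so swap in numbers from your own environment.

```python
# Back-of-the-envelope temp-space sizing based on the figures above.
# All inputs are assumptions; replace them with your own measurements.

def required_temp_disk_gb(concurrent_flows: int,
                          temp_gb_per_flow: float,
                          system_overhead_gb: float = 50.0) -> float:
    """Disk needed for workflow temp files plus an assumed margin for the OS."""
    return concurrent_flows * temp_gb_per_flow + system_overhead_gb

if __name__ == "__main__":
    # Example from the post: 10 concurrent flows, ~20 GB of temp files each.
    needed = required_temp_disk_gb(concurrent_flows=10, temp_gb_per_flow=20)
    print(f"Plan for roughly {needed:.0f} GB of local (or NAS) temp space.")
```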

 

You should do some performance profiling of the flows you want to deploy, and based on the peak requirements of that "stressed" flow load you can configure your Windows server's memory, drive, and CPUs. The Server product offers more features for distributing processing, but a single server running a standalone Designer with the Scheduler option can easily handle what you propose in my experience, so the Server product will handle it at least as well. Just remember that the Windows server sizing is probably more important for performance than the Alteryx software.
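One cheap way to gather that profiling data is to time each workflow from the command line. The sketch below shells out to AlteryxEngineCmd.exe; the executable path and workflow names are assumptions (they vary by install), and this only captures wall-clock time, not memory.

```python
# Sketch: time candidate workflows from the command line to get rough
# per-flow run times for sizing. The engine path below is an assumption;
# adjust it to wherever AlteryxEngineCmd.exe lives in your install.
import subprocess
import time

ENGINE = r"C:\Program Files\Alteryx\bin\AlteryxEngineCmd.exe"  # assumed path

def time_workflow(yxmd_path: str) -> float:
    """Run one workflow and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run([ENGINE, yxmd_path], check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    for wf in [r"C:\Workflows\daily_load.yxmd"]:  # hypothetical workflows
        print(f"{wf}: {time_workflow(wf):.1f} s")
```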

 

If you grow a lot, you can add more servers, and Alteryx Server can handle setting them up as workers.

sidgrowup7
5 - Atom

Hi @DanC ,

I have a workflow with multiple input files in .yxdb format; the combined file size is almost 20 GB.

I am trying to package the workflow but am getting the same error mentioned above. Is there any way to handle this scenario?

Kindly help.

Thanks in advance.