The YXDB file is only 3.8GB; however, when I load it into my workflow (the input tool is the first tool in the workflow), it becomes 27GB. Why is this happening?
The workflow then uses a Summarize tool to group by only 3 columns from the input data.
I also have 90 Formula tools, and each one has its own specific logic. Each Formula tool adds 2 columns (I could combine them all into 1 Formula tool, but keeping them separate makes it easier to pinpoint the specific tool if I need to update its formula). It took 19 hours for the workflow to run. The input data is already grouped as small as possible.
Is there a way to speed up the workflow?
YXDBs use a form of compression to make the file smaller on disk, so the data is larger once the workflow reads it. Here are some references: Alteryx Database File Format & Compression rate of a yxdb file - Alteryx Community
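As a rough illustration of the gap between the two sizes, here is a minimal Python sketch that works out the compression ratio implied by the figures quoted in this thread. It assumes both numbers describe the same set of records; the variable names are made up for the example and are not anything Alteryx exposes.

```python
# Sizes quoted in the thread, in gigabytes.
size_on_disk_gb = 3.8   # the .yxdb file as stored
size_loaded_gb = 27.0   # the size reported once the input tool has read it

# Implied compression ratio, assuming both figures cover the same records.
ratio = size_loaded_gb / size_on_disk_gb

# Share of the uncompressed size that the file format saves on disk.
saved_pct = (1 - size_on_disk_gb / size_loaded_gb) * 100

print(f"implied compression ratio: about {ratio:.1f}x")
print(f"space saved on disk: about {saved_pct:.0f}%")
```

In other words, the workflow is not inflating the data; the 27GB figure is closer to the true uncompressed size that every downstream tool has to process.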
If you want to confirm which tools are taking the longest, use Performance Profiling in your Runtime settings. Yes, you'll have to run it again, but you'll see which tool takes the longest: Performance Profiling with AMP Engine
Also, is the AMP engine turned on?
Oh, so I guess the original data itself is 27GB.
The AMP engine is on, but I'll have to try Performance Profiling later. But I guess there's no way to speed it up, since Performance Profiling only shows the run time of each tool?
It's hard to make other suggestions at this time. The profiling will give you an idea of which tools to focus on. The only other option is to increase the memory available on the computer itself.