Good morning all -- I have a workflow that reads a large .csv contained within a .gz file via a Dynamic Input tool.
This workflow had been working flawlessly until recently, when the .gz files started containing files over 10 GB or so in size (unzipped).
I get an error on the Dynamic Input tool: "Insufficient Disk space - extracting this file will result in less than 10% (or less than 10 GB) of free disk space in your Alteryx temp folder."
This appears to be a different error from the Temp memory management issues covered in most other threads and Alteryx team member posts, and I'm at a loss. I can extract the .csv from the .gz file and run the workflow using the .csv as an input instead, but I'd prefer to keep using the .gz input if possible.
The error appears the moment the workflow is executed, so it seems there's an up-front comparison between the assumed size of the data and my available disk space. But since I can extract the file manually and use it as input, it almost seems as though the assumed size of the data is wrong.
From what I understand, there isn't a Windows-managed "Temp folder size," so I'm not sure what the reference to <10 GB or <10% could mean, if not overall disk space.
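For what it's worth, the gzip format itself only records the uncompressed size modulo 2^32 (per RFC 1952), so any pre-check that trusts that field can misjudge files whose unzipped size exceeds 4 GiB. A quick illustrative snippet (the path is a placeholder for my actual file):

import struct

def gzip_reported_size(path):
    # ISIZE is the last 4 bytes of a .gz file: the uncompressed size
    # stored as a little-endian uint32, i.e. only modulo 2^32.
    with open(path, "rb") as f:
        f.seek(-4, 2)  # seek to 4 bytes before end of file
        return struct.unpack("<I", f.read(4))[0]

# For a ~10 GB CSV this prints the size modulo 4 GiB -- well under the
# true size -- so format-level metadata can't be trusted for huge files.
print(gzip_reported_size(r"\\network\share\data.csv.gz"))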
Any ideas on how I can overcome this, or else "adjust" the amount of data allowed to reside in the Temp folder?
Does a regular Input Data tool work if you point it at the CSV within the .gz file? If it does, you could build a macro that updates the Input Data tool's configuration, effectively creating a more flexible equivalent of the Dynamic Input tool, if that is the issue.
Hi @bbak91 -- yes, the workflow works if I extract the .csv and then point the workflow at the extracted file. I am trying to avoid that step, though, so that I can get back to running the workflow on a schedule without human interaction (or building some other automated process to extract the file before the workflow runs).
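(For what it's worth, that pre-extraction process would be simple to script and schedule ahead of the workflow; a minimal sketch using only the Python standard library, with placeholder paths:)

import gzip
import shutil

SOURCE = r"\\network\share\data.csv.gz"  # placeholder network path
TARGET = r"C:\staging\data.csv"          # placeholder local path

# Stream-decompress in chunks so the 10+ GB file never sits in memory.
with gzip.open(SOURCE, "rb") as src, open(TARGET, "wb") as dst:
    shutil.copyfileobj(src, dst, length=1024 * 1024)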
It seems that the error has to be incorrect in some fashion. Since I can extract and work with the base .csv, I clearly have enough room on my hard drive to work with the entirety of the data. I must be missing something about how Alteryx expects to manage size of data within the Temp folder.
Right, I'm suggesting that rather than extracting the gz file, you select the gz file directly within Alteryx. Can you give this a try and let me know if it reads that in correctly?
My thought would then be to leverage a Control Parameter and Action tool to wrap it into a macro, which could then be used in another workflow that takes inputs from a Directory tool and reads the .gz file dynamically.
Ah, got it -- I misunderstood. Unfortunately, the direct input produces the same error. In fact, the error now appears as soon as the Input tool is configured, without the workflow even running.
So this looks specific to very large files contained in .gz archives -- if the workflow doesn't even need to run to generate the error, some pre-check must be happening irrespective of actual temp file generation.
Can you look specifically at the temp folder location that you have specified in Alteryx? I am curious what your disk space looks like and if there are any big files existing in your temp folder.
These articles may be helpful:
https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Temporary-File-Creation/ta-p/77726
https://help.alteryx.com/current/designer/alteryx-and-temporary-files
I re-ran the workflow, and before running verified that the Temp folder contained no large files -- total size on disk was only 32 MB.
The .gz file is not stored locally; it is on a network drive, and my C:\ drive had ~50 GB free before I attempted the workflow. I immediately got the same error. Alteryx does not even attempt to unpack the .gz, citing drive space issues, yet I can run the entire workflow if I unzip the container and keep the .csv on my C:\ drive.
This definitely appears to be a bug -- if the entirety of the data can move through the workflow unzipped, then Alteryx is incorrectly assessing whether there's enough space to work with the extracted data and giving up before it even starts.
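As a sanity check on the numbers in the error message, the free space and the 10%/10 GB thresholds are easy to compare (illustrative sketch; the drive and size are rough figures from my situation above):

import shutil

TEMP_DRIVE = "C:\\"            # drive holding the Alteryx temp folder
UNZIPPED_SIZE = 10 * 1024**3   # rough unzipped size of the CSV, in bytes

usage = shutil.disk_usage(TEMP_DRIVE)
free_after = usage.free - UNZIPPED_SIZE

print(f"free now:    {usage.free / 1024**3:.1f} GiB")
print(f"free after:  {free_after / 1024**3:.1f} GiB")
# The message fires if the post-extraction remainder would fall below
# 10 GB or below 10% of the disk; note that on a large drive the 10%
# rule alone can trip even with tens of GB free.
print(f"10% of disk: {usage.total * 0.10 / 1024**3:.1f} GiB")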
If you have no other suggestions, I will open up a Support case.
My guess is that this is a built-in safety check, given the high compression of .gz files. Our documentation covers it here:
https://help.alteryx.com/current/designer/gzip-file-support
I'm not aware of a way to change the threshold, although a file whose uncompressed size is genuinely larger than your disk space would pose real problems. I would say your two best approaches are:
1.) Change your temp location to a drive with more disk space via the user settings.
2.) Possibly try leveraging a Python tool for your input. After importing gzip at the beginning of your script, I believe the command should be something like
f = gzip.open(filename, mode="rt")
(see the sketch below).
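To flesh that suggestion out, here's a minimal sketch of what the Python tool script could look like -- the path is a placeholder, and I'm assuming the ayx helper module that ships with the tool plus pandas' built-in gzip support:

import pandas as pd
from ayx import Alteryx  # helper module available inside the Python tool

path = r"\\network\share\data.csv.gz"  # placeholder

# pandas infers gzip compression from the .gz extension and streams the
# decompression, so the archive never has to be unpacked to disk. Note
# that the full unzipped table still ends up in memory, so this needs
# enough RAM for the ~10 GB of data.
df = pd.read_csv(path, compression="gzip")

Alteryx.write(df, 1)  # hand the result to output anchor 1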
Thanks @bbak91! Definitely not a bug then. I'll note the Data Sources section of the documentation, as I didn't come across it when researching.
I don't have access to another non-network drive, so I will check out the Python input as a potential workaround.
I'll go ahead and mark this as closed, many thanks for the help!