
Alteryx Designer Desktop Discussions

SOLVED

"Insufficient Disk Space" error with .gz files that have large file sizes

Scott_Snowman
10 - Fireball

Good morning all -- I have a workflow that is designed to do the following:

 

  • Use a Directory Tool to find the most recently delivered .gz file on a network drive
  • Use a Dynamic Input tool to extract the data from the .csv zipped inside the .gz file
  • Perform some transformations and load the results into a data warehouse (the equivalent logic is sketched in Python below, for reference)
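
For illustration only, here is a minimal Python sketch of those three steps, assuming a hypothetical share path and using pandas for the CSV read -- this is not the Alteryx tools themselves, just the equivalent logic:

import pandas as pd
from pathlib import Path

# Hypothetical network share holding the delivered .gz files
share = Path(r"\\network-drive\deliveries")

# Step 1: most recently delivered .gz (by modification time)
newest_gz = max(share.glob("*.gz"), key=lambda p: p.stat().st_mtime)

# Step 2: read the .csv zipped inside the .gz (pandas handles gzip directly)
df = pd.read_csv(newest_gz, compression="gzip")

# Step 3: transformations and the warehouse load would follow here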

This workflow has been working flawlessly until recently when the .gz files started containing files over 10GB or so in size (unzipped).

 

I get an error on the Dynamic Input tool: "Insufficient Disk space - extracting this file will result in less than 10% (or less than 10 GB) of free disk space in your Alteryx temp folder."

 

This appears to be a different type of error from the ones covered in most of the other threads and Alteryx team member posts about Temp memory management, and I'm at a loss. I can extract the .csv from the .gz file and run the workflow using the .csv as an input instead, but I'd prefer to continue using the .gz input if possible.

 

The error appears immediately when the workflow is executed, so it seems there is an immediate comparison between the assumed size of the data and my available disk space. But since I can extract the file manually and use it as input, it almost seems as though the assumed size of the data is wrong.

 

From what I understand, there isn't a Windows-managed "Temp folder size" so I'm not sure what the reference to <10GB or <10% could even really mean, if not overall disk space.

 

Any ideas on how I can overcome this, or else "adjust" the amount of data allowed to reside in the Temp folder?

 

Some possibly helpful detail:

 

  • The option in the Dynamic Input file template to extract files >2GB is checked
  • There are a few blocking tools in the workflow (Uniques and Autofield) but even running the data input section by itself without any other tools on the canvas triggers the error
9 REPLIES
bbak91
Alteryx

Does a regular Input Data tool work if you point it at the CSV within the .gz file? If it does, I wonder if you could make a macro that simply updates the configuration of the Input Data tool, effectively giving you a more flexible Dynamic Input equivalent, if that tool is the issue.

Scott_Snowman
10 - Fireball

Hi @bbak91 -- yes, the workflow works if I extract the .csv and then point the workflow at the extracted file. I am trying to avoid that step, though, so that I can get back to running the workflow on a schedule without human interaction (or building some other automated process to extract the file before the workflow runs).

 

It seems that the error has to be incorrect in some fashion. Since I can extract and work with the base .csv, I clearly have enough room on my hard drive to work with the entirety of the data. I must be missing something about how Alteryx expects to manage size of data within the Temp folder.

bbak91
Alteryx

Right, I'm suggesting that rather than extracting the gz file, you select the gz file directly within Alteryx. Can you give this a try and let me know if it reads that in correctly? 

 

[screenshot attachments: BrandonB_0-1622831114716.png, BrandonB_1-1622831360315.png]

 

bbak91
Alteryx

My thought would then be to leverage a Control Parameter and Action tool to wrap it into a macro, which is then leveraged in another workflow that takes inputs from a Directory tool and reads the .gz file dynamically.

Scott_Snowman
10 - Fireball

Ah, got it -- I misunderstood. But unfortunately the direct input produces the same error. In fact, the error now appears as soon as the Input tool is configured, without the workflow even needing to run.

 

[screenshot attachment: ScottS28_0-1622832424840.png]

 

So it looks like this is something specific to very large files contained in .gz files -- if the workflow doesn't even need to be run to generate this error, it seems like there is some pre-check that is happening irrespective of actual temp file generation.

bbak91
Alteryx

Can you look specifically at the temp folder location that you have specified in Alteryx? I am curious what your disk space looks like and if there are any big files existing in your temp folder.

 

These articles may be helpful: 

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Error-Temp-Drive-is-Getting-Full/ta... 

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Temporary-File-Creation/ta-p/77726 

https://help.alteryx.com/current/designer/alteryx-and-temporary-files 

Scott_Snowman
10 - Fireball

I attempted to re-run the workflow, and before running I verified that the Temp folder did not have any large files in it; the total size on disk was only 32 MB.

 

[screenshot attachment: ScottS28_0-1622928837919.png]

 

The .gz file is not stored locally; it is on a network drive, and my C:\ drive had ~50 GB free before I attempted the workflow. I immediately got the same error. The Alteryx workflow does not even attempt to unpack the .gz, citing drive space issues, yet I can run the entire workflow if I unzip the container manually and keep the .csv on my C:\ drive.

 

This definitely appears to be a bug -- if the entirety of the data can move through the workflow unzipped, then it appears the Alteryx process is incorrectly assessing whether or not there's enough space to work with the extracted data and giving up before even starting.

 

If you have no other suggestions, I will open up a Support case.

bbak91
Alteryx

My guess is that this is a built-in safety check given the high compression of .gz files. Our documentation says:

 

  • Designer supports a 15 to 1 compression rate for the Gzip layer. Verify that you have 15 times the compressed file size available. For example, if your compressed Gzip file is 1 GB, make sure that you have 15 GB of space available on your hard drive.

https://help.alteryx.com/current/designer/gzip-file-support
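
As an illustration of the arithmetic only (not Alteryx's actual check), applying that documented 15-to-1 assumption against the thresholds quoted in the error message looks roughly like this, with placeholder paths:

import os
import shutil

gz_path = r"\\network-drive\deliveries\latest.csv.gz"   # placeholder compressed file
temp_dir = r"C:\Users\me\AppData\Local\Temp"             # placeholder Designer temp folder

estimated_unpacked = os.path.getsize(gz_path) * 15       # documented 15-to-1 assumption
usage = shutil.disk_usage(temp_dir)
remaining = usage.free - estimated_unpacked

# Error text: "less than 10% (or less than 10 GB) of free disk space"
if remaining < 10 * 1024**3 or remaining < 0.10 * usage.total:
    print("Extraction would be blocked under these assumptions")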


I'm not aware of a way to change that threshold, although a file whose uncompressed size really is larger than your available disk space would pose problems. I would say your two best approaches would be:

 

1.) change your temp location to a hard drive with more disk space via the user settings

2.) possibly try leveraging a Python tool for your input. I believe the command should be something like 

f = gzip.open(filename, mode="rt")

after importing gzip at the beginning of your script.
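
For reference, a fuller sketch of that approach inside the Python tool might look like the following; the path is a placeholder, and it assumes the ayx package that ships with Designer's Python tool for passing the frame downstream:

import gzip
import pandas as pd
from ayx import Alteryx   # package bundled with Designer's Python tool

filename = r"\\network-drive\deliveries\latest.csv.gz"   # placeholder path

# "rt" opens the gzip stream as text so pandas can parse the CSV directly,
# without unpacking the whole file to the temp folder first
with gzip.open(filename, mode="rt") as f:
    df = pd.read_csv(f)

Alteryx.write(df, 1)   # send the result to the tool's first output anchor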

Scott_Snowman
10 - Fireball

Thanks @bbak91! Definitely not a bug then. I'll make a note of that Data Sources section of the documentation, as I didn't see it when researching.

 

I don't have access to another non-network drive so I will check out the Python input as a potential workaround.

 

I'll go ahead and mark this as closed, many thanks for the help!
