Hello,
I am trying to download a zip file and output its contents into a folder. I added a workflow called DownloadAndExtractZips (the solution from another community post on how to download zip files) to my workflow, but the process fails with an error. It seems that my zip.txt file (which should list the names of all the files inside the zip) is empty, so the workflow is not fetching the names properly. The zip file is password protected, so I believe this may have caused the issue. What commands do I need to add to get the file names?
Thank you.
I suspect the original workflow wasn't working for you because your CSV file's field name is "Output file". In the Python code I read in the entire input but then specify by name which field to use.
@Maskell_Rascal Hello, I was getting the previous error because my decoded zip file was not being output properly. After correcting the file and making sure that everything is correct before the Python step, I am now getting a "BadZipFile" error (file is not a zip file). I have double-checked that the file is a zip file by going to the portal and downloading it manually. Do you know how to solve this?
To summarize my process: I first download the zip file from the db using the API. The zip file is encoded in base64, so I had to decode it and output it to a temp file called temp.tmp. Then I have a second workflow that converts the decoded file to a zip file. It has one input (the directory where the temp file is stored). The workflow takes the input, renames the column to "File", and then feeds it to the Python tool.
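The decode step described above can be sketched in plain Python. This is only a minimal illustration, assuming the temp file contains nothing but the base64 text; the function name and the paths are hypothetical and not taken from the workflow in this thread.

```python
import base64
from pathlib import Path


def decode_base64_file(encoded_path, decoded_path):
    """Decode a base64 text file into a binary file and return the byte count."""
    # Read as bytes so that no text encoding or newline translation is applied.
    encoded = Path(encoded_path).read_bytes()
    # With the default validate=False, characters outside the base64
    # alphabet (such as trailing newlines) are discarded before decoding.
    decoded = base64.b64decode(encoded)
    # Write in binary mode; writing through a text/CSV writer can corrupt a zip.
    Path(decoded_path).write_bytes(decoded)
    return len(decoded)
```

For example, `decode_base64_file("temp.tmp", "temp.zip")` would write the decoded bytes next to the temp file. The key design point is that both the read and the write stay in binary mode.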
Thank you.
2nd workflow:
Hi @MinhTa - I'm not sure what is happening. The only thing I know for sure is that the path/file you are pointing to that is feeding into the Python code is not a zip file. I'm guessing that somewhere between your decode process and saving as a tmp file it is being corrupted.
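One way to check whether the file feeding the Python tool really is a zip file is to look at its first bytes. The sketch below is an assumption-laden diagnostic, not part of the original workflow; `inspect_file` is a hypothetical helper name. A zip archive normally starts with the bytes `PK`, and a file that starts with `UEsD` is most likely still base64 text, because `UEsD` is how the zip header `PK\x03\x04` begins once base64-encoded.

```python
import zipfile


def inspect_file(path):
    """Report whether a file looks like a zip or like undecoded base64 text."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    return {
        "first_bytes": head,
        # Standard library check that looks for the zip end-of-directory record.
        "is_zipfile": zipfile.is_zipfile(path),
        # Local file headers of a zip begin with b"PK".
        "starts_with_pk": head.startswith(b"PK"),
        # b"UEsD" is the base64 encoding of the zip header bytes PK\x03\x04.
        "still_base64": head == b"UEsD",
    }
```

If `still_base64` comes back true, the file was saved before it was decoded; if neither flag is true, the content was probably altered somewhere between the decode step and the save.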
Did you try the workflow I provided that has a base64 decode built into the python code?
Hello @Maskell_Rascal, thank you for your reply. I just tried the workflow you provided with the base64 decode built in. However, I got the following error:
SUCCESS: reading input data "#1"
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\users\ata\appdata\local\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'FileDecoded'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-1-2e915d41677b> in <module>
      6 filepath = Alteryx.read("#1")
      7 file = filepath['File'].iloc[0]
----> 8 file2 = filepath['FileDecoded'].iloc[0]
      9
     10 with open(file, 'rb') as file_input, open(file2, 'wb') as file_output:

c:\users\ata\appdata\local\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2897             if self.columns.nlevels > 1:
   2898                 return self._getitem_multilevel(key)
-> 2899             indexer = self.columns.get_loc(key)
   2900             if is_integer(indexer):
   2901                 indexer = [indexer]

c:\users\ata\appdata\local\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892
   2893         if tolerance is not None:

KeyError: 'FileDecoded'
I wonder if there is an issue with how I save the encoded zip file. Here are the steps I took:
After I download the zip file from the db, the response I get is:
The resource_content is the zip file encoded in base64. I extracted the resource content, kept only the encoded part, and saved it to a temp file whose directory is stored in output.csv. Previously I had a base64 decoder after extracting the encoded base64 from the resource content, but I have removed it to test the Python base64 decoder. This is the configuration of the output tool that saves the temp file.
Output.csv (which contains the directory of the temp file) is then used as the input to the second workflow.
I then applied the Python code with the base64 decoder implemented (pretty much a copy and paste of the code you suggested earlier).
I would try using the workflow I provided to download the file directly and don't bother with the decoding/zip workflows you are using. The latest one I provided is designed to pull down the file, decode it, and unzip the contents.
@Maskell_Rascal I just tried what you recommended: I no longer extract resource_content and instead save the response directly to the temp file. However, I got the same error message.
SUCCESS: reading input data "#1"
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\users\ata\appdata\local\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'FileDecoded'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-1-2e915d41677b> in <module>
      6 filepath = Alteryx.read("#1")
      7 file = filepath['File'].iloc[0]
----> 8 file2 = filepath['FileDecoded'].iloc[0]
      9
     10 with open(file, 'rb') as file_input, open(file2, 'wb') as file_output:

c:\users\ata\appdata\local\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2897             if self.columns.nlevels > 1:
   2898                 return self._getitem_multilevel(key)
-> 2899             indexer = self.columns.get_loc(key)
   2900             if is_integer(indexer):
   2901                 indexer = [indexer]

c:\users\ata\appdata\local\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892
   2893         if tolerance is not None:

KeyError: 'FileDecoded'
You're missing the Formula tool after the Select tool. That's where we create the field "FileDecoded", and the missing field is what the traceback is reporting (KeyError: 'FileDecoded'). Add a Formula tool right before the Python tool, configured as shown below.
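The same guard can also be sketched on the Python side, so that a missing field gives a clearer message than a pandas KeyError. This is a hypothetical helper, not the code from the workflow: `resolve_paths` is an invented name, and the fallback of swapping the extension for `.zip` is an assumption, since the actual Formula tool expression is not shown in this thread.

```python
from pathlib import Path


def resolve_paths(row):
    """Return (source_path, decoded_path) from a mapping of field name to value."""
    if "File" not in row:
        found = ", ".join(sorted(row)) or "no fields"
        raise KeyError("expected a 'File' field, found: " + found)
    source = str(row["File"])
    decoded = row.get("FileDecoded")
    if not decoded:
        # Assumed fallback: same folder and name as the source, with a .zip extension.
        decoded = Path(source).with_suffix(".zip")
    return source, str(decoded)
```

Inside the Python tool this could be called on the first input row, for instance `resolve_paths(filepath.iloc[0].to_dict())`, where `filepath` is the DataFrame returned by `Alteryx.read("#1")`.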
Hello @Maskell_Rascal, thank you very much for your solution. I got the following error when converting to a zip file:
RuntimeError: File 'KBS_Report-apac-2021-11-04.csv' is encrypted, password required for extraction
The file is password protected, and I do not wish to put the password in Alteryx just to extract the file. Is there a way to get the file in zip format without extracting it? If that option is not available and we must extract the file to get the zip, how can I supply the password for the zip?
Thank you very much.
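Both options in the question above can be sketched with Python's standard `zipfile` module. Once the base64 content has been decoded and written to disk, that file already is the zip, so no extraction is needed to keep it in zip format. The sketch below is a minimal illustration with hypothetical function names: it lists the member names, which are typically readable without the password for traditionally encrypted zips, and extracts only when a password is passed in. As far as I know, the standard library handles only the traditional ZIP encryption and not AES-encrypted archives.

```python
import zipfile


def list_zip_members(zip_path):
    """Return (file name, is_encrypted) pairs without extracting anything."""
    with zipfile.ZipFile(zip_path) as archive:
        # Bit 0 of the general purpose flags marks an encrypted member.
        return [(info.filename, bool(info.flag_bits & 0x1))
                for info in archive.infolist()]


def extract_zip(zip_path, target_dir, password=None):
    """Extract all members, passing the password only if one is given."""
    # zipfile expects the password as bytes, not str.
    pwd = password.encode("utf-8") if password else None
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(path=target_dir, pwd=pwd)
        return archive.namelist()
```

To avoid storing the password in the workflow itself, it could be supplied at run time, for example from an environment variable, and passed to `extract_zip` as the `password` argument.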