I am downloading a big batch of zipped files through an API. Unfortunately, a few of the zips are almost always corrupted and can't be opened. I have a macro (attached) that reads the CSVs inside those zipped files. The macro gives an error for each file it wasn't able to open.
What I want to do is to create a similar macro that would check if a file is corrupted or not, but without giving me an error.
On a related note, checking the "Treat Read Errors as Warnings" option in the Input Data tool doesn't suppress the read errors.
Hello @FilipR :
Have you explored using the Dynamic Input Tool? This tool requires that all file schemas are the same as the configured template file. If a file path passed into the tool has a different set of columns, it will skip the file with a warning. In the case of a corrupted file, this may have the intended outcome. I've attached an example with 3 files. Files 1 and 3 are good, but file 2 is corrupted. The example workflow will read files 1 and 3 but skip 2 with a warning message.
Hi @MatthewO.
Unfortunately I can't open the workflow you attached (we use an older version of Alteryx at my company). I tried figuring it out on my own, but I can't make it work with zip files (I get an "Unable to open archive" error).
In the meantime, I figured out a solution in Python. The input is a column of zip file paths called [DownloadPath]. The Python tool checks whether each file is a valid zip file and writes 1 or 0 to a new [Valid] column.
#################################
# List all non-standard packages to be imported by your
# script here (only missing packages will be installed)
from ayx import Package
#Package.installPackages(['pandas','numpy'])
#################################
from ayx import Alteryx
from zipfile import ZipFile
#################################
# read in data from input anchor as a pandas dataframe
# (after running the workflow)
df = Alteryx.read("#1")
#################################
# create a new column
df['Valid'] = 0
#################################
# loop through the rows and validate the zip files
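for ind in df.index:
    # normalise Windows backslashes to forward slashes
    file = df.loc[ind, 'DownloadPath'].replace('\\', '/')
    try:
        # ZipFile raises an exception if the archive cannot be opened or parsed
        with ZipFile(file):
            pass
        df.loc[ind, 'Valid'] = 1
    except Exception:
        df.loc[ind, 'Valid'] = 0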
#################################
# and then send it to one of the output anchors
Alteryx.write(df, 1)
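Opening the archive only confirms that the zip's directory can be read; a member inside can still be damaged. As a rough sketch of a stricter check outside the Alteryx Python tool, the standard library's `ZipFile.testzip()` reads every member and verifies its CRC. The helper name `is_valid_zip` is hypothetical and not part of the original script.

```python
import zipfile


def is_valid_zip(path):
    """Return 1 if `path` opens as a zip and every member passes its CRC check, else 0.

    `is_valid_zip` is an illustrative helper name, not part of the original workflow.
    """
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None if all are fine
            return 1 if archive.testzip() is None else 0
    except Exception:
        # Missing file, not a zip, truncated archive, unsupported compression, etc.
        return 0
```

Assuming the same [DownloadPath] column, this could replace the loop with something like `df['Valid'] = df['DownloadPath'].apply(is_valid_zip)`. The trade-off is speed: `testzip()` decompresses every member, so it is slower on large archives than simply opening the file.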
@FilipR glad to hear you found a solution. For future reference, you can adjust the version of a workflow file to open it in an older version of Alteryx. The following article explains how this can be done: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Adjusting-Alteryx-Files-for-Differe...
@MatthewO Thanks for this idea. I was trying to do a batch macro and they failed without continuing. The Dynamic input worked great.