I am running into a challenge on a project and am hoping that the community can help. I am receiving some .Zip files that contain flat files that need to be read as fixed width.
Thinking of two approaches but not sure which will work. Anyone else faced this challenge?
1) Extract the files from the Zip. Extract the files via RunCmd. I see this thread here that may be of some use (https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Can-Alteryx-unzip-a-file-as-part-of-th...) but am not quite sure how to navigate that code. Once extracted I can import as .Flat using a layout workbook created in Excel. I' heading down this path now since the import part is straightforward once I learn how to extract from the ZIp.
2) Leave the files in the Zip, but limited file types to choose from. I can Import as Csv without a delimiter but I don't see how to utilize the layout workbook in this method. Seems like I would be stuck parsing via another method like regex or substring for each field.
Solved! Go to Solution.
Hello,
Alteryx can read files inside a zip, so your second option would work.
If possible, could you share an example of the files you're going to be using? This way we can figure out if you can just go with substrings (As they have a fixed-width) or Regex.
@nmosley I had this same situation 5 years ago (to the day!) and submitted an idea for the zip files to be able to read more data types that are already supported in the input tool, including flat files. Alteryx never looked into the idea.
For me, I found it easier to write some python to work with the zip files. I've tweaked my code below a little bit, but the goal is just to download a zip from the web and then extract all the flat files from the zip. I've found it easier to use python than the run command tool (and I've done both). Hopefully this points you in the right direction in case you wanted to try this route. After the code, you can feed the file name (and or path) coming out of the python tool into a dynamic input.
import urllib
zipin = 'https://website/file.zip'
zipout = '//server/folder/file.zip'
urllib.request.urlretrieve(zipin,zipout)
import zipfile
import fnmatch
import pandas
from ayx import Alteryx
import os.path
zip=zipfile.ZipFile(zipout)
out = zip.namelist()
list = fnmatch.filter(out,"*flat")
Alteryx.write(pandas.DataFrame(list), 1)
zip.extract(''.join(list),os.path.dirname(zipout))
Hope that helps!
Nice solution @patrick_digan, theres always a way with Alteryx!
Thanks @patrick_digan! I wasn't sure if I was explaining the situation effectively but you seem to have understood perfectly. This may be the final straw that forces me to learn python. I tried the following:
One caveat to this project that differs from your sample code. I am starting with the zip files already located in a network folder. Another process that I have no control over is dropping them off. I have performed a directory sweep and can figure out which zip files have already been extracted and am connecting that stream to the input anchor on the python tool. However, it errors out saying "series has no attribute seek."
@nmosley Great to hear! You just need to modify the source and destination by adding a [0] at the end to specify the row number. That way python will treat it as a string.
from ayx import Alteryx
import pandas
import zipfile
from os.path import isdir
from os.path import dirname
df = Alteryx.read("#1")
source = df["InputPath"][0]
destination = df["OutputPath"][0]
zip = zipfile.ZipFile(source)
zip.extractall(destination)
If you wanted your python to handle more than one file, you could do a simple for loop (full disclosure that I'm no Python whiz and this may not be the most efficient way to do this):
from ayx import Alteryx
import pandas
import zipfile
from os.path import isdir
from os.path import dirname
df = Alteryx.read("#1")
for i in range(len(df)):
source = df["InputPath"][i]
destination = df["OutputPath"][i]
zip = zipfile.ZipFile(source)
zip.extractall(destination)
Posting the final python code in the event that someone else lands here looking for an answer.
I'm trying to implement that code to my workflow but I'm getting some errors.
Could you share sample workflow which is using this solution?
Or explain more what format the fields taken from Alteryx.read("#1") should have? (InputhPath, OutputPath)
I am performing a directory sweep to get the full path of the new .ZIP files.
From there I use ReplaceChar([InputPath], '\', '/') to flip the slash direction on InputPath only.
OutputPath is left in the normal '\' format though I am not sure why.
Both fields are defined as V_WString 32767.
What error are you receiving? If this isn't enough I can attempt to convert the workflow to run on sample data.