Hello,
I am unable to open doubly archived files directly in Alteryx:
1. Step one: I receive a file with a name like: Export_ZJENKINS_SRF015_TACTICAL_SOL_20210721_063819.tgz.vfe.zip
2. Step two: I need to unzip that file, which gives me: Export_ZJENKINS_SRF015_TACTICAL_SOL_20210721_063819.tgz
3. Step three: I need to open the file from step 2 to get the following *.csv file: PRFGPRF1.ZJENKINS_SRF015_TACTICAL_SOL_20210721_063824.dmp.csv
I want a workflow that unzips the file and then extracts the *.csv from inside the *.tgz file, without any manual steps.
Note that the name of the *.csv file inside the *.tgz is not known in advance, so the workflow has to identify it automatically.
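The nested archive can be inspected without knowing any file names in advance. Below is a minimal sketch using Python's standard `zipfile` and `tarfile` modules, assuming the outer zip holds a single `.tgz` and that the first `.csv` member is the one you want. `find_csv_member` is a hypothetical helper name.

```python
import io
import tarfile
import zipfile


def find_csv_member(zip_path):
    """Return (tgz_name, csv_name) for the first .csv inside the first .tgz in zip_path.

    Hypothetical helper: assumes the outer zip contains one .tgz archive.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # Step 2: locate the .tgz inside the outer zip without knowing its name
        tgz_name = next(n for n in zf.namelist() if n.lower().endswith(".tgz"))
        tgz_bytes = zf.read(tgz_name)
    # Step 3: open the gzipped tarball from memory and look for a .csv member
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tf:
        for member in tf.getmembers():
            if member.isfile() and member.name.lower().endswith(".csv"):
                return tgz_name, member.name
    raise FileNotFoundError(f"no .csv member inside {tgz_name}")
```

Nothing is written to disk here; the `.tgz` is held in memory only long enough to list its members.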
The beer will be served in Bucharest, Romania 🙂
Hi @EduardZ
Is there any way you could post a sample file for us to work with? I have a method for extracting data from a zip file using Python that I think could be applied here, but I would need to configure it for your needs.
Thanks!
Phil
Please see the attached workflow for how you can do it 🙂
You need to update the source and destination columns in the input file.
You also need to install 7-Zip.
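The attached workflow itself isn't reproduced in this thread. As a rough sketch of the same 7-Zip idea driven from Python, the helpers below build and run 7-Zip command lines. `x`, `-o` and `-y` are standard 7-Zip switches; the executable name `7z`, the helper names, and the requirement that the destination folder starts empty are assumptions. 7-Zip generally unpacks a `.tgz` in two stages (gzip to `.tar`, then `.tar` to its contents), so the nested file takes three passes.

```python
import subprocess
from pathlib import Path


def seven_zip_extract_cmd(archive, dest, exe="7z"):
    """Build a 7-Zip command that extracts `archive` into the folder `dest`.

    `exe` is an assumption: use the full path to 7z.exe if it is not on PATH.
    """
    # x = extract with full paths, -o<dir> = output folder (no space after -o),
    # -y = answer "yes" to any overwrite prompts
    return [exe, "x", str(archive), f"-o{dest}", "-y"]


def unpack_nested(zip_path, dest, exe="7z"):
    """Hypothetical helper: .zip -> .tgz -> .tar -> .csv, one 7-Zip call per layer.

    Assumes `dest` starts empty, so each glob finds the file the previous call produced.
    """
    dest = Path(dest)
    current = Path(zip_path)
    for next_suffix in (".tgz", ".tar", ".csv"):
        subprocess.run(seven_zip_extract_cmd(current, dest, exe), check=True)
        # The CSV name is not known up front, so take whatever file of the expected type appeared
        current = next(dest.rglob(f"*{next_suffix}"))
    return current
```

`unpack_nested` returns the path of the extracted CSV; `check=True` makes a failed 7-Zip call raise instead of silently continuing.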
Hope this helps!
Regards,
Hey @EduardZ
So I spent way more time on this than I should have, but I liked the challenge! 🙂
Here is a method that should work for you: it extracts the contents of the zip file to your temp folder and then reads the .tgz data in as a CSV file.
Step 1: Directory tool pointed at the location of the zip file
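Outside Alteryx, the Directory tool's job here can be approximated with `pathlib`. The file pattern `*.tgz.vfe.zip` is an assumption based on the example file name in the question, and `list_export_zips` is a hypothetical helper name.

```python
from pathlib import Path


def list_export_zips(folder, pattern="*.tgz.vfe.zip"):
    """Roughly what the Directory tool returns: (FullPath, FileName) for each matching file."""
    matches = (p for p in Path(folder).glob(pattern) if p.is_file())
    return [(str(p), p.name) for p in sorted(matches)]
```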
Step 2: Create some Python-friendly path names
You can see here that I'm hardcoding my local temp folder as the location to extract the TGZ file to, and also creating a field with that path and the file name. Since the TGZ file has the same name as the ZIP file minus the trailing ".vfe.zip", I can just use a Replace function to build its name for when Python reads it later.
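The path handling in step 2 can be sketched in plain Python too. This assumes the `.tgz` name is always the zip name minus `.vfe.zip` (true for the example in the question); the helper name and the sample paths used to try it are hypothetical.

```python
import tempfile


def python_friendly_paths(zip_full_path, temp_dir=None):
    """Return (extract_folder, tgz_full_path) using forward slashes.

    Hypothetical helper mirroring the Formula tool step; temp_dir defaults to the system temp folder.
    """
    # Forward slashes avoid backslash-escape surprises when the path is used in Python code
    temp = (temp_dir or tempfile.gettempdir()).replace("\\", "/").rstrip("/")
    zip_name = zip_full_path.replace("\\", "/").rsplit("/", 1)[-1]
    suffix = ".vfe.zip"
    # The .tgz inside is named like the zip, minus the trailing ".vfe.zip"
    tgz_name = zip_name[: -len(suffix)] if zip_name.lower().endswith(suffix) else zip_name
    return temp, f"{temp}/{tgz_name}"
```

For a zip at `C:\Exports\Export_ZJENKINS_SRF015_TACTICAL_SOL_20210721_063819.tgz.vfe.zip`, the second value ends in `/Export_ZJENKINS_SRF015_TACTICAL_SOL_20210721_063819.tgz`.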
Step 3: Keep only the fields needed for the Python tool (this step isn't strictly necessary, but it helps me keep my sanity when looking at the workflow)
Step 4: Let Python do all the work!
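The Python code inside the attached workflow isn't shown in the thread. Here is a hedged sketch of what step 4 could do using only the standard library: extract the `.tgz` from the zip into a folder, then open it with `tarfile` and read the first `.csv` member it finds. This swaps in `tarfile` instead of reading the `.tgz` as a plain gzipped CSV, which also keeps the tar header out of the column names. `read_nested_csv` is a hypothetical name, and UTF-8 encoding is an assumption.

```python
import csv
import io
import tarfile
import zipfile


def read_nested_csv(zip_path, extract_dir):
    """Extract the .tgz from zip_path into extract_dir, then return (csv_name, rows)."""
    with zipfile.ZipFile(zip_path) as zf:
        tgz_name = next(n for n in zf.namelist() if n.lower().endswith(".tgz"))
        tgz_path = zf.extract(tgz_name, path=extract_dir)  # returns the extracted file's path
    with tarfile.open(tgz_path, mode="r:gz") as tf:
        # The CSV name is not known up front, so take the first .csv member
        member = next(m for m in tf.getmembers() if m.isfile() and m.name.lower().endswith(".csv"))
        raw = tf.extractfile(member).read()
    text = raw.decode("utf-8")  # assumption: the export is UTF-8
    return member.name, list(csv.reader(io.StringIO(text)))
```

Inside a Python tool, the rows could then be turned into a DataFrame with `pandas.DataFrame(rows[1:], columns=rows[0])`.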
Final output and workflow:
I like this method because it uses your local temp folder as the extraction point. This means you're not creating files or folders on your local drive that you'll have to delete later. It also doesn't require you to know the name of the CSV file inside the TGZ file.
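One nuance: files written to the temp folder generally stay there until Windows or a cleanup tool removes them. If you'd rather have the extracted `.tgz` gone as soon as it has been read, Python's `tempfile.TemporaryDirectory` deletes its folder automatically. A minimal sketch, where `extract_and_process` and the `handle` callback are hypothetical names:

```python
import tempfile
import zipfile


def extract_and_process(zip_path, handle):
    """Extract zip_path into a throwaway folder, pass that folder to handle(), then clean up.

    `handle` is a hypothetical callback, e.g. one that reads the CSV out of the .tgz.
    """
    # TemporaryDirectory creates a unique folder under the system temp location
    # and deletes it (with everything inside) when the with-block exits.
    with tempfile.TemporaryDirectory(prefix="vfe_unzip_") as scratch:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(scratch)
        return handle(scratch)
```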
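One interesting thing I want to call out: you'll notice that the first column header contains the name of the CSV file. This happens because the file was tarred first and then gzipped. Pandas is treating the file as a gzipped CSV, so it only undoes the gzip layer and reads the tar stream underneath as CSV text; the tar header, which begins with the member's file name, ends up in front of the first column name. So this is to be expected, and the remaining column names come through just fine.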
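The header effect can be reproduced with a tiny in-memory `.tgz`. This demo uses a made-up member name and two made-up columns; it contrasts undoing only the gzip layer (`gzip.decompress`) with opening the archive through `tarfile`.

```python
import gzip
import io
import tarfile


def make_tgz(member_name, data):
    """Build a small .tgz in memory containing one file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        info = tarfile.TarInfo(member_name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


tgz = make_tgz("PRFGPRF1.sample.dmp.csv", b"COL_A,COL_B\n1,2\n")

# Undoing only the gzip layer leaves the raw tar stream: a 512-byte header,
# starting with the member name, sits in front of the CSV text.
raw = gzip.decompress(tgz)
first_line = raw.split(b"\n", 1)[0]
print(first_line[:23])  # b'PRFGPRF1.sample.dmp.csv'

# Opening it as a tar archive instead yields the CSV exactly as it was stored.
with tarfile.open(fileobj=io.BytesIO(tgz), mode="r:gz") as tf:
    member = tf.getmembers()[0]
    csv_text = tf.extractfile(member).read().decode()
print(csv_text)
```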
Attached is a copy of the workflow I built. You should be able to point the Directory tool and the Formula tool at paths on your computer and run it right away.
If this solves your problem, please mark the answer as correct; if not, let me know!
Cheers!
Phil