Alteryx Designer Desktop Discussions

EduardZ · 09-01-2021

Hello,

I can't open doubly archived files directly in Alteryx:

1. First step: I receive a file named like: Export_ZJENKINS_SRF015_TACTICAL_SOL_20210721_063819.tgz.vfe.zip

2. Second step: I unzip that file and get: Export_ZJENKINS_SRF015_TACTICAL_SOL_20210721_063819.tgz

3. Third step: I open the file from step 2 and obtain the following *.csv file: PRFGPRF1.ZJENKINS_SRF015_TACTICAL_SOL_20210721_063824.dmp.csv

I want a workflow that unzips the file and then gets the *.csv inside the *.tgz file without any manual steps.

Be careful: the name of the *.csv file inside the *.tgz is not known in advance and has to be identified automatically by the workflow.
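For reference outside Alteryx, here is a minimal Python sketch of the three steps. It assumes the .zip contains one .tgz and the .tgz contains at least one .csv; the function name `read_csv_from_double_archive` is hypothetical. Everything happens in memory, and the CSV is picked by its extension, so its name never has to be known in advance.

```python
import csv
import io
import tarfile
import zipfile


def read_csv_from_double_archive(zip_path):
    """Return (csv_name, rows) for the first .csv inside the .tgz inside the .zip.

    Hypothetical helper: assumes one .tgz in the zip and at least one .csv in the tgz.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # Step 2: locate the .tgz member by extension rather than by exact name.
        tgz_name = next(n for n in zf.namelist() if n.lower().endswith((".tgz", ".tar.gz")))
        tgz_bytes = zf.read(tgz_name)

    # Step 3: open the gzipped tarball from memory; mode "r:gz" removes both layers.
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tf:
        member = next(
            m for m in tf.getmembers() if m.isfile() and m.name.lower().endswith(".csv")
        )
        raw = tf.extractfile(member).read()

    text = raw.decode("utf-8")  # assumption: the export is UTF-8 encoded
    rows = list(csv.reader(io.StringIO(text)))
    return member.name, rows
```

The returned name is whatever the tarball stores, so it can also be written out as a field if the workflow needs to know which file was read.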

The beer will be served in Bucharest, Romania 🙂

Maskell_Rascal · 09-01-2021

Hi @EduardZ

Is there any way you can post a sample file for us to work with? I have a method for extracting data from a zip file using Python that I think could be applied here, but I would need to configure it to your needs.

Thanks!

Phil

messi007 · 09-01-2021

@EduardZ,

Please see the attached workflow for how you can do it 🙂

You have to update the source and destination columns in the input file.

You also have to install 7-Zip.
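The attached workflow isn't reproduced here, but one common way to drive 7-Zip for this kind of job is to call its command line once per archive layer (for example from a Run Command tool or a script). The Python sketch below assumes `7z` is on the PATH and uses 7-Zip's `x` (extract with full paths), `-o` (output folder, no space after the switch) and `-y` (answer yes to prompts) switches. It also assumes that extracting the .tgz yields an intermediate .tar, which is 7-Zip's usual behavior for gzipped tarballs, so a third pass is needed to reach the CSV. Both helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def seven_zip_extract_cmd(archive, out_dir, exe="7z"):
    """Build a 7-Zip command that extracts `archive` into `out_dir` (hypothetical helper)."""
    # x = extract with full paths, -o<dir> = output folder (no space), -y = assume yes
    return [exe, "x", str(archive), f"-o{out_dir}", "-y"]


def unpack_double_archive(zip_path, work_dir):
    """Unpack .zip -> .tgz -> .tar -> files and return any .csv paths found (needs 7-Zip)."""
    work = Path(work_dir)
    subprocess.run(seven_zip_extract_cmd(zip_path, work), check=True)  # .zip -> .tgz
    for tgz in work.glob("*.tgz"):
        subprocess.run(seven_zip_extract_cmd(tgz, work), check=True)  # .tgz -> .tar
    for tar in work.glob("*.tar"):
        subprocess.run(seven_zip_extract_cmd(tar, work), check=True)  # .tar -> contents
    # The CSV name is unknown, so search the work folder (including subfolders) by extension.
    return sorted(work.rglob("*.csv"))
```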

Hope this helps!

Regards,

Maskell_Rascal · 09-02-2021

Hey @EduardZ

So I spent way more time on this than I should have, but I liked the challenge! 🙂

Here is a method that should work for you: it extracts the contents of the zip file to your temp folder and then reads in the .tgz data as a CSV file.

Step 1: Directory tool pointed to the location of the zip file

Step 2: Create some Python-friendly path names

You can see here that I'm hardcoding my local temp folder as the location to extract the TGZ file to, and also creating a field with that path and file name. Since the name of the TGZ file is the same as the ZIP's, I can just do a replace to derive it for when Python reads it later.

Step 3: Keep only the fields needed for the Python tool (this step isn't strictly necessary, but it helps me keep my sanity when looking at the workflow)

Step 4: Let Python do all the work!
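Steps 1 to 4 can be approximated in plain Python roughly like this. It is a sketch, not the attached workflow: it assumes the .tgz inside the zip has the same name as the zip minus the `.vfe.zip` suffix (as in the example file name), and it uses `tempfile.TemporaryDirectory` so the extracted files are removed automatically when the block ends. The function names are hypothetical. Inside an Alteryx Python tool you would typically turn the rows into a pandas DataFrame and send it downstream with `Alteryx.write(df, 1)`.

```python
import csv
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path


def tgz_path_for(zip_path, extract_dir):
    """Step 2: derive the extracted .tgz path by dropping '.vfe.zip' (assumed naming scheme)."""
    name = Path(zip_path).name
    return Path(extract_dir) / name.replace(".vfe.zip", "")


def read_zipped_tgz(zip_path):
    """Steps 1-4: extract the zip to a temp folder, then read the first .csv in the .tgz."""
    # TemporaryDirectory deletes the folder and the extracted .tgz when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)  # writes the .tgz into the temp folder
        tgz_path = tgz_path_for(zip_path, tmp)
        with tarfile.open(tgz_path, mode="r:gz") as tf:
            # The CSV name is unknown, so take the first regular file ending in .csv.
            member = next(
                m for m in tf.getmembers() if m.isfile() and m.name.lower().endswith(".csv")
            )
            with tf.extractfile(member) as f:
                text = io.TextIOWrapper(f, encoding="utf-8", newline="")
                return list(csv.DictReader(text))  # materialized before the temp folder goes
```

Reading through `tarfile` rather than treating the .tgz as a plain gzipped CSV also means the header row comes through clean.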

Final output and workflow:

I like this method since it uses your local temp folder as the extraction point, which means you're not creating files/folders elsewhere on your drive that will have to be deleted later. It also doesn't require you to know the name of the CSV file inside the TGZ file.

One interesting thing I want to call out: the first column header is the name of the CSV file. This happens because the file was tarballed before being gzipped. Pandas just treats the file as a gzipped CSV, so the tar header, which starts with the member's file name, ends up glued to the first line; this is to be expected. The remaining column names come through just fine.
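A small Python check makes the header effect visible. Removing only the gzip layer of a .tgz leaves a raw tar stream, and each tar member starts with a 512-byte header whose first field is the member's file name padded with NUL bytes, immediately followed by the file's contents. The member name and data below are made up for illustration; reading through Python's `tarfile` module in `r:gz` mode returns just the file contents instead.

```python
import gzip
import io
import tarfile


def tgz_bytes(member_name, payload):
    """Build a small .tgz in memory containing a single file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        info = tarfile.TarInfo(member_name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


archive = tgz_bytes("sample.dmp.csv", b"COL_A,COL_B\n1,2\n")

# Removing only the gzip layer (what a "gzipped CSV" reader does) leaves the tar stream.
raw = gzip.decompress(archive)
first_line = raw.split(b"\n", 1)[0]
print(first_line[:14])  # b'sample.dmp.csv' (the tar header's name field)
print(first_line.endswith(b"COL_A,COL_B"))  # True: the real header follows the tar header

# Reading through tarfile strips the tar header and returns only the file's contents.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tf:
    member = tf.getmembers()[0]
    print(tf.extractfile(member).read().split(b"\n", 1)[0])  # b'COL_A,COL_B'
```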

Attached is a copy of the workflow I built. You should be able to update the Directory tool and the Formula tool to paths on your computer and run it right away.

If this solves your problem, please mark the answer as correct; if not, let me know!

Cheers!

Phil

Read .csv file in a double archive [it's a .tgz file in a *.zip file]
