Alteryx Designer Desktop Discussions

Read *.csv file in a double archive [it's a *.tgz file in a *.zip file]

EduardZ
5 - Atom

Hello,

 

I am unable to open doubly archived files directly in Alteryx:

1. I receive a file named like: Export_ZJENKINS_SRF015_TACTICAL_SOL_20210721_063819.tgz.vfe.zip

2. I unzip that file and get: Export_ZJENKINS_SRF015_TACTICAL_SOL_20210721_063819.tgz

3. I open the file from step 2 and obtain the following *.csv file: PRFGPRF1.ZJENKINS_SRF015_TACTICAL_SOL_20210721_063824.dmp.csv

 

I want a workflow that unzips the file and then reads the *.csv inside the *.tgz file without any manual steps.

Note that the name of the *.csv file inside the *.tgz is not known in advance and needs to be identified automatically by the workflow.

 

The beer will be served in Bucharest, Romania 🙂

3 REPLIES
Maskell_Rascal
13 - Pulsar

Hi @EduardZ 

 

Is there any way you can post a sample file for us to work with? I have a method of extracting data from a zip file using Python that I think could be applied here, but I would need to configure it to your needs.

 

Thanks!

Phil

messi007
15 - Aurora

@EduardZ,

 

Please see attached how you can do it 🙂

You have to update the source and destination columns in the input file.

You also have to install 7-Zip.

 

messi007_0-1630515395524.png

 

Hope this helps!

Regards,

Maskell_Rascal
13 - Pulsar

Hey @EduardZ 

 

So I spent way more time on this than I should have, but I liked the challenge! 🙂

 

Here is a method that should work for you. It extracts the contents of the zip file to your temp folder and then reads in the .tgz data as a CSV file.

 

Step 1: Directory tool pointed to the location of the zip file

Maskell_Rascal_0-1630597300643.png

Maskell_Rascal_1-1630597326428.png

 

Step 2: Create some Python-friendly path names

Maskell_Rascal_2-1630597395125.png

You can see here that I'm hardcoding my local temp folder as the location to extract the TGZ file, and also creating a field with that path and file name. Since the name of the TGZ file is the same as the ZIP apart from the extension, I can just use a replace to change it for later when Python reads it.
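For anyone who would rather build those two fields in Python than in the Formula tool, here is a minimal sketch of the same idea. The exact replace used in the workflow isn't shown in the thread, so the `.vfe.zip` suffix below is an assumption taken from the example file name in the question, and `build_paths` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# Assumption: the outer archive always ends in ".vfe.zip", as in the example
# file name from the question. Adjust if your files are named differently.
ZIP_SUFFIX = ".vfe.zip"


def build_paths(zip_path):
    """Return (extract_dir, tgz_path) as forward-slash strings.

    extract_dir is the system temp folder; tgz_path is where the .tgz
    is expected to land once the zip has been extracted there.
    """
    zip_path = Path(zip_path)
    if not zip_path.name.endswith(ZIP_SUFFIX):
        raise ValueError(f"unexpected file name: {zip_path.name}")

    extract_dir = Path(tempfile.gettempdir())
    # Dropping the suffix turns "...063819.tgz.vfe.zip" into "...063819.tgz".
    tgz_name = zip_path.name[: -len(ZIP_SUFFIX)]

    # as_posix() gives forward slashes, which avoids backslash-escape issues.
    return extract_dir.as_posix(), (extract_dir / tgz_name).as_posix()
```

Dropping the suffix rather than doing a blind replace also means a file that doesn't match the expected pattern fails loudly instead of producing a wrong path.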

 

Step 3: Keep only the fields needed for the Python tool (this step isn't really that important, but it helps me keep my sanity when looking at the workflow)

Maskell_Rascal_3-1630597596717.png

 

Step 4: Let Python do all the work!

Maskell_Rascal_4-1630597682219.png
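Since the Python code only appears in the screenshot, here is a rough sketch of what the unzip part of such a step could look like using only the standard library. It is not the code from the attached workflow; `extract_tgz_from_zip` is a hypothetical helper, and it assumes the inner archive can be recognised by a `.tgz` or `.tar.gz` extension.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_tgz_from_zip(zip_path, extract_dir=None):
    """Extract every .tgz/.tar.gz entry of the zip and return their paths.

    If extract_dir is not given, a fresh temporary folder is created.
    """
    extract_dir = Path(extract_dir or tempfile.mkdtemp())

    with zipfile.ZipFile(zip_path) as zf:
        # Pick out the inner archives by extension; other entries are ignored.
        tgz_names = [
            name for name in zf.namelist()
            if name.lower().endswith((".tgz", ".tar.gz"))
        ]
        for name in tgz_names:
            zf.extract(name, extract_dir)

    return [extract_dir / name for name in tgz_names]
```

Inside the Alteryx Python tool, the zip path would come from the incoming data stream rather than being passed in by hand.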

 

Final output and workflow:

Maskell_Rascal_5-1630597770319.png

 

I like this method, since it uses your local temp folder as the extraction point. This means that you're not creating files/folders on your local drive that will have to be deleted later. This method also doesn't require you to know the name of the CSV file inside the TGZ file.

 

One interesting thing I want to call out: you'll notice that the first column header is the name of the CSV file. This happens because the file was tarballed first before being gzipped. Pandas is just trying to treat the file as a gzipped CSV, so this is to be expected. The remaining column names come through just fine.
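If the polluted first header is a problem, one possible alternative is to open the .tgz with Python's `tarfile` module, which strips the tar layer before the CSV is parsed and also finds the CSV without knowing its name. This is a minimal sketch, not part of the attached workflow: `read_csv_from_tgz` is a hypothetical helper, and it assumes a comma-delimited, UTF-8 file and takes the first `.csv` member it finds.

```python
import csv
import io
import tarfile


def read_csv_from_tgz(tgz_path):
    """Return (member_name, rows) for the first .csv file inside the .tgz."""
    with tarfile.open(tgz_path, mode="r:gz") as tar:
        # Discover the CSV by extension, since its name is not known up front.
        members = [
            m for m in tar.getmembers()
            if m.isfile() and m.name.lower().endswith(".csv")
        ]
        if not members:
            raise FileNotFoundError(f"no .csv file found in {tgz_path}")

        member = members[0]
        raw = tar.extractfile(member)  # binary file-like object, no tar header
        # Assumption: UTF-8 text; change the encoding if your export differs.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.reader(text))

    return member.name, rows
```

The object returned by `extractfile` is file-like, so it could be handed to `pandas.read_csv` instead of the `csv` module if a DataFrame is wanted.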

 

Attached is a copy of the workflow I built. You should be able to update the Directory tool and the Formula tool to paths on your computer and run it right away. 

 

If this solves your problem, please mark the answer as correct; if not, let me know!

 

Cheers!

Phil
