Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Combine multiple GZIP csv files

apqueen
7 - Meteor

Having what seems like a hard time with what seems like a simple issue – combining multiple text files of the same structure into a single file (no headers).  

The twist is these are GZIP csv files (csv.gz) which is leading to issues.  I’ve tried the two main approaches from the community but not having luck with either one.  Below are the details of the two approaches thus far. I can also combine individual file inputs with a union, but this approach is not scalable. 

Simpler is better and I would prefer to do without macros or coding, but any assistance is appreciated.

 

Input Data tool with wildcard*

Input tool works fine with a single gzip file, but inserting a wildcard in the filename to pull all files from a single folder is not working.  I assume this is due to the gz filetype somehow but don’t know if there is a workaround.  For a single file it’s easy to select CSV filetype from the ‘file in archive’ but with a wildcard in the filename this setting will not 'stick', for lack of a better term

apqueen_0-1600972846844.png

 

 

Directory + Dynamic Input tool

This method works to bring data in but it appears to be duplicating the ‘input data source template’, which is worse.  Hopefully this is can be solved with a different configuration of the settings. 

apqueen_1-1600972846847.png

 

 

Workflow is attached but not able to upload source data due to size.

10 REPLIES 10
David-Carnes
12 - Quasar

I just did something like this lately. Change the Dynamic Input to use the Action "Change Entire File Path."

 

4 apqueen.png

 

 

Good luck!

vizAlter
12 - Quasar

Hi @apqueen — I just did a testing that I kept all the .csv.gz files into a .zip folder, then imported all the files from that .zip like below:

 

vizAlter_0-1600979710256.png

 

 

Please mark it "Solved" or "Solved" with a Like if it resolved your query. This will help other users find the same answer/resolution.  Thank you.

apqueen
7 - Meteor

Thanks @David-Carnes.  I updated to 'change entire file path' this but am unfortunately now getting the error below.  Not sure why the first file (that does indeed exist) in the folder cannot be used as a template as it appears all my settings are correct.  When selecting the ‘file in archive’ that the error appears to be referring to my only relevant option is csv (screenshot attached)

 

Error: Dynamic Input (1): No file specified in zip archive: "D:\startapp mobile data\20200202\location_TX_0.csv.gz". Choose a file to read.

 

 

apqueen
7 - Meteor

Thanks @vizAlter .  I tried zipping the folder with 14 csv.gz files and then unzipping per your example.  The resulting file should have ~42 million rows and 5 columns but I am getting about 670,000 rows in two columns and the data appears to still be compressed.  Screenshot and workflow attached.

 

apqueen_0-1601062522885.png

 

EDIT - after giving this some thought I created three small sample data files which I have attached.  These are CSV and not GZIP as I don't have the means to create these, but are tab delimited (\t) like the original and do not have column names.  Hopefully this helps others to recreate the error if they have the associated compression software

David-Carnes
12 - Quasar
vizAlter
12 - Quasar

@apqueen — When I tested, I got full data in readable format (as mentioned in my previous post here).

 

Now, in current situation you may want to do a few more more tests...

(1) Just try to import your one of the files (actual GZIP extension), check if data is accurate or not

(2) Zip one of the files, import that Zip, audit the imported data

(If individual zip file extension gives you correct result then try to follow @David-Carnes 's approach)

(3) Did you also check your GZIP data with any other software?

 

vizAlter
12 - Quasar

@apqueen — Check this too... I did not find any issue:

 

vizAlter_0-1601064963348.png

 

 

apqueen
7 - Meteor

@vizAlter - still issues, maybe this is a setting on my machine. I zipped a single GZIP file and tried to read via the same approach but still no luck.  Was not able to upload the raw .gz file due to posting limitations but attached the zipped .gz file below

 

apqueen_0-1601066443557.png

 

original file properties shown in the jpeg

 

I am looking into the macro option

 

 

apqueen
7 - Meteor

Holy s&*% batman got it to work with a macro.  This is amazing as 1) not much good with macros and 2) was really losing hope with the .gz filetype.  

I’ve documented key steps below in case there are others with minimal programming/CS backgrounds trying to do the same thing

 

  1.  Create macro as shown, using the ‘interface’ tool palette and the first file in the folder as an input with associated delimiters, filetype etc…  Also if you don’t have one already create a macro folder on your computer via “Options --> user settings --> edit user settings --> macros” and save the macro into this folder.   

apqueen_1-1601069697562.png

 

  1. Create workflow as shown below.  Directory tool must be coupled with a formula tool as shown to add the |||datafile.csv extension to the FullPath output for macro input.  Someone smarter than me can maybe comment on why

 

apqueen_2-1601069697566.png

 

 

  1.  Add the macro via right click insert --> macro --> ….

 

 

apqueen_3-1601069697570.png

 

And the final workflow looks like below.  Make sure to check your data by converting a small set of individual files and unioning together vs. macro output by comparing record count, summarize ouputs, etc… to make sure you are correctly combining the data and not getting partial records or a duplication of some sort

 

apqueen_4-1601069697572.png

 

yay!

 

Labels