Having what seems like a hard time with what seems like a simple issue – combining multiple text files of the same structure into a single file (no headers).
The twist is these are GZIP csv files (csv.gz) which is leading to issues. I’ve tried the two main approaches from the community but not having luck with either one. Below are the details of the two approaches thus far. I can also combine individual file inputs with a union, but this approach is not scalable.
Simpler is better and I would prefer to do without macros or coding, but any assistance is appreciated.
Input Data tool with wildcard*
Input tool works fine with a single gzip file, but inserting a wildcard in the filename to pull all files from a single folder is not working. I assume this is due to the gz filetype somehow but don’t know if there is a workaround. For a single file it’s easy to select CSV filetype from the ‘file in archive’ but with a wildcard in the filename this setting will not 'stick', for lack of a better term
Directory + Dynamic Input tool
This method works to bring data in but it appears to be duplicating the ‘input data source template’, which is worse. Hopefully this is can be solved with a different configuration of the settings.
Workflow is attached but not able to upload source data due to size.
Solved! Go to Solution.
I just did something like this lately. Change the Dynamic Input to use the Action "Change Entire File Path."
Good luck!
Hi @apqueen — I just did a testing that I kept all the .csv.gz files into a .zip folder, then imported all the files from that .zip like below:
Please mark it "Solved" or "Solved" with a Like if it resolved your query. This will help other users find the same answer/resolution. Thank you.
Thanks @David-Carnes. I updated to 'change entire file path' this but am unfortunately now getting the error below. Not sure why the first file (that does indeed exist) in the folder cannot be used as a template as it appears all my settings are correct. When selecting the ‘file in archive’ that the error appears to be referring to my only relevant option is csv (screenshot attached)
Error: Dynamic Input (1): No file specified in zip archive: "D:\startapp mobile data\20200202\location_TX_0.csv.gz". Choose a file to read.
Thanks @vizAlter . I tried zipping the folder with 14 csv.gz files and then unzipping per your example. The resulting file should have ~42 million rows and 5 columns but I am getting about 670,000 rows in two columns and the data appears to still be compressed. Screenshot and workflow attached.
EDIT - after giving this some thought I created three small sample data files which I have attached. These are CSV and not GZIP as I don't have the means to create these, but are tab delimited (\t) like the original and do not have column names. Hopefully this helps others to recreate the error if they have the associated compression software
This may be a bug. Here is a workaround:
@apqueen — When I tested, I got full data in readable format (as mentioned in my previous post here).
Now, in current situation you may want to do a few more more tests...
(1) Just try to import your one of the files (actual GZIP extension), check if data is accurate or not
(2) Zip one of the files, import that Zip, audit the imported data
(If individual zip file extension gives you correct result then try to follow @David-Carnes 's approach)
(3) Did you also check your GZIP data with any other software?
@vizAlter - still issues, maybe this is a setting on my machine. I zipped a single GZIP file and tried to read via the same approach but still no luck. Was not able to upload the raw .gz file due to posting limitations but attached the zipped .gz file below
original file properties shown in the jpeg
I am looking into the macro option
Holy s&*% batman got it to work with a macro. This is amazing as 1) not much good with macros and 2) was really losing hope with the .gz filetype.
I’ve documented key steps below in case there are others with minimal programming/CS backgrounds trying to do the same thing
And the final workflow looks like below. Make sure to check your data by converting a small set of individual files and unioning together vs. macro output by comparing record count, summarize ouputs, etc… to make sure you are correctly combining the data and not getting partial records or a duplication of some sort
yay!