This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Is there a way to split a file beyond a given size into multiple sub files. Eg. if the size of output file from the alteryx workflow is 1 GB then split the file into two files each with size less than 1 GB and send the final output as zipped file like filename_1.zip and filename_2.zip each containing the respective splitted files
The workflow should break the output automatically without user specifying the number of splits required . The only condition to ensure is that any file can't be 1 GB or more
Please advice if any one has ever faced with this scenario or have idea to tackle it .
So, as for the file size issue, the only thing i can think of initially is to make use of the max records per file option in the output tools configuration window (image below). Whilst this is based upon number of records rather than file size, you could manually identify how many records this should be. Not exactly what you are looking for but may be a start.
Hi Brian , Thanks for the response . This option would be the last option as trial and learn approach . But as you know the number of records isn't effective from automation point of view because it doesn't always define the size of file . The output file in my case which is a tab delimeted can accomodate n new columns in future bases on what is send from input and due to which for instance 5000000 rows previously with 10columns defining ~ 1GB will become 10+n columns now but this time 5million rows would be definetely more than ~1GB. I hope you get my point . Btw iam also simultaneously exploring if this can be handled outside of alteryx using other tools in extreme case but wud prefer something inside alteryx wihout manual intervention each time.
Attached is a macro that builds off of @brianprestidge comment. The attached Macro "Split and Zip" has three supporting macros, and uses the command line tool from 7-Zip.
I built this to generate .csv files, but you could change the config of one tool (noted below) and generate any file type you need.
You can specify how much data you would like to sample to generate the average record size estimate from (the higher percent, the greater the accuracy, but it will take longer). There is also a buffer option because sometimes the sample may under represent (randomly picking the smallest records), and the buffer helps ensure the files will be smaller then the set size, but there is no 100% guarantee.
As for the File Name, for an example, when you set the filename to "out.csv" and three files get generated, they will be named: out.csv out_1.csv out_2.csv
I also made zipping the output files optional since zipping can take a long time.
You will need to download the command line tool for 7-Zip and place it in the same directory as these macros.
To download, go to: http://www.7-zip.org/download.html - Find the non-beta "7-Zip Extra: standalone console version, 7z DLL, Plugin for Far Manager" - Download and extract the .7z file, extract and pull out the file "7za.exe" from the x64 folder (if you are on a 64 bit OS) - Place "7za.exe" in the same directory as these macros
Here is a quick breakdown of what each macro is doing:
Split and Zip
- Sample the records, eg 1% is 1 in 100 - Save File to a file in the temp directory (uses macro with "Records per File" parameter set to zero so there is no split) - Get File Size macro to get the file size - Interface elements and a Formula tool to calculate the desired "Records per File" - Save File with Split using calculated "Records per File" - Detour option for zipping files - Formula to swap out the extension for zip - Zip File macro to zip the files
Save File with Split
- Output Data tool, configured to split or not based on control parameters (if you need a different file type, you would change the config of this tool) - Summarize to get record count - Generate Row for each needed File Number (counts from zero to "Record Count"/"Records per File" rounded up) - Append with starting FullPath - Formula to update the FullPath and Record count for each file
Get File Size
- Directory configured with control parameters, so to pulls info on a specific file - Select to get just the Size field - Join by record position to combine with Input data
- Run Command using cmd.exe to execute the 7-Zip command (7za a "out.zip" "out.csv" > "zipmessage.csv") so we can pipe ">" the results to a text file to read back in (note the "/c" command prefix) - Filter to pull out the line with the zip file size - Formula to keep just the file size number - Select to keep just the Zip File Size field - Join by record position to combine with Input data
I have also attached an example "use split and zip.yxmd" to test this out with some generated data.
Please test this out, and let me know if this works or breaks in any way, I am happy to adjust it. If you need additional help, you are also welcome to send me a private message, and we can setup a screen share session.