I am importing 40+ CSV files into a workflow, each with anywhere from 50 to 60k+ rows.
I would like to merge them, and then output them again as separate files, but based on these rules (in this priority order):
1. New output file and file name every time the input file name changes. (This is already the default behavior of the Output tool if you tick "Take File/Table Name From Field" and specify the full-path field.)
AND
2. New output file and file name every time the current output file reaches 8,000 rows.
Example:
Bananas.csv (340 rows of banana items)
Peaches.csv (1200 rows of peaches)
Apples.csv (23534 rows of apples)
Watermelons.csv (600 rows of watermelons)
Should be output to:
Bananas (Batch 1).csv
Peaches (Batch 2).csv
Apples-1 (Batch 3).csv - First 8k rows of original file
Apples-2 (Batch 4).csv - Next 8k rows
Apples-3 (Batch 5).csv - Remaining 7534 rows of original file
Watermelons (Batch 6).csv
Any ideas? I imagine this will involve a lot of regex (either regex matching or a regex count of certain file paths), then a Multi-Row Formula to write the parts of the name into different columns, and finally appending all the fields together as a string in another column. But I can't quite work out the logic for this yet.
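To make the target concrete, here is a rough sketch of the naming logic in plain Python (outside the workflow, purely for illustration; `BATCH_SIZE` and the `batch_names` helper are my own hypothetical names). Each source file is split into chunks of at most 8,000 rows, and every chunk gets a globally incrementing batch number:

```python
from math import ceil

BATCH_SIZE = 8000  # rule 2: start a new file every 8,000 rows

def batch_names(file_row_counts):
    """Given (filename, row_count) pairs in input order, return the
    output file names produced by the two rules above."""
    names = []
    batch = 0  # global batch counter across all files
    for name, rows in file_row_counts:
        stem = name.rsplit(".", 1)[0]
        chunks = ceil(rows / BATCH_SIZE)  # how many pieces this file splits into
        for i in range(chunks):
            batch += 1
            if chunks == 1:
                # file fits in one batch: keep the original stem
                names.append(f"{stem} (Batch {batch}).csv")
            else:
                # file is split: add a per-file part suffix
                names.append(f"{stem}-{i + 1} (Batch {batch}).csv")
    return names

print(batch_names([("Bananas.csv", 340), ("Peaches.csv", 1200),
                   ("Apples.csv", 23534), ("Watermelons.csv", 600)]))
# → ['Bananas (Batch 1).csv', 'Peaches (Batch 2).csv',
#    'Apples-1 (Batch 3).csv', 'Apples-2 (Batch 4).csv',
#    'Apples-3 (Batch 5).csv', 'Watermelons (Batch 6).csv']
```

In workflow terms I'd guess this maps to a per-file row index (RecordID plus a Multi-Row Formula that resets when the file path changes), an integer division by 8,000 to get the chunk number, and a running batch counter built from that, but I may be missing a simpler tool for it.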