Need help building an output name schema + file segmentation based on 2 rules

I am importing 40+ CSV files into a workflow, with number of rows each anywhere form 50 to 60k+.

I would like to merge them, and then output them again as separate files but based on these rules (in this rule priority):

1. New output file and file name, every time the input file name is different. (this is already the default behavior of the output tool if you tick "take file/table name from field" and specify the fullpath field)

AND

2. New output file and file name, every time the input file reaches 8000 rows in size.

Example:

Bananas.csv (340 rows of banana items)

Peaches.csv (1200 rows of peaches)

Apples.csv (23534 rows of apples)

Watermelons.csv (600 rows of watermelons)

Should be output to:

Bananas (Batch 1).csv

Peaches (Batch 2).csv

Apples-1 (Batch 3).csv - First 8k rows of original file

Apples-2 (Batch 4).csv - Next 8k rows

Apples-3 (Batch 5).csv - Remaining 7534 rows of original file

Watermelons (Batch 6).csv

Any ideas? I imagine this will involve a lot of regex, either regex matching or regex count of certain filepaths, then using a multirow to write parts of the naming in different columns, then appending all fields together as a string in another column, but I can't really come up with the logic for this yet.

Workflow

Accepted answers

BrandonB

Example workflow is attached. Ignore the text input and the generate rows (This was done for me to create sample data). At this point you would already have your data stacked. Connect a Record ID tool to your data and in the formula after the generate rows, use the Record ID instead of RowCount. The rest of the logic should follow. Finally, you would just have your Output Data tool at the end where you take the new file name that you have created. Let me know if you have any questions!

Example batch output.png

Batch output.yxmd

All comments

BrandonB

Once all of the files are stacked, you can use a formula tool after creating a new field with the value of 1, set as an int64. Let’s call this field counter. Then use a running total tool where you group by file name and create a running total on the counter field that we just created. Finally, we can create another formula that modifies the sheet name by saying if running total < 8000 then file name else if running total > 8000 and running total < 16000 then file name + “2”, etc. Then feed this into an output tool and it will create the sheets based on the logic that you are looking to use.

BrandonB

Alternatively I think your formula could be

filename + tostring(round([running total]/8000,1))

Edit: you may actually want to use the CEIL() function instead to get to the next whole number rather than round to the closest whole number

AkisM

Hi @BrandonB and thanks for your input, I'm currently trying to set it up as you described in your first reply.

Also, I need the batch number at the end of each file name, and I tried using a multi row formula for that but doesn't seem to work:

If [Row-1:FileName]!=[FileName] then [Row-1:Counter]+1 else [Counter] endif

Used the browse tool to see if I in fact have 40+ unique counter values, and there are only 3. Even though the same browse tool shows me there are 40+ unique file names. Any ideas why?

Also, [Counter] is stuck on 1. I thought after using running total, it would increment in numbers each time the file name changed.

BrandonB

Example batch output.png

Batch output.yxmd

Quick Links

This months top contributors

atcodedog05 19458

Qiu 15866

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703