Create Ranges for Output Splits Based on Data Values and Record Count

a large dataset (200m+ records) is reduced to the following 2 fields (read from a table, tiled and group by/count), file attached

the file needs to be split sequentially into X pieces with the number of records written to each file at or below a specified threshold.

the threshold is determined based on the total record count and the number of records allowed in each file

the split needs to be determined by a range of codes so the ranges can be used in other processing. codes cannot be split across ranges

for example using the sample

file looks like this
code count
1001 308
1003 2424
1005 1849
1007 777
1009 869
1013 340
1015 279
1017 141
1019 142

3 files should be created if the threshold is set to 3000
the resulting files would be split as follows

code count
1001 308
1003 2424
total count 2732 -- file1

1005 1849
1007 777
total count 2626 --file2

1009 869
1013 340
1015 279
1017 141
1019 142
1021 254
total count 2025 file3
etc

the ranges need to be used in another process (outside alteryx)

there is no specific output format required but needs to identify the start and end points of range. i.e.

1001	file1
1003	file1
1005	file2
1007	file2
1009	file3
1013	file3
1015	file3
1017	file3
1019	file3
1021	file3

1001 - 1003 1

1005 - 1007 2

1009 - 1021 3

Thank you!

sample data.txt

Workflow

Accepted answers

AncientPandaman

or you can try with 2 multiple row formula tool

formula 1: if [Count]+[Row-1:Running Total]>3000 then [Count] else [Count]+[Row-1:Running Total] endif
formula 2: if [Count]=[Running Total] then [Row-1:File]+1 else [Row-1:File] endif

All comments

DanM

@nwhite for a quicker response from the Community, I would suggest telling everyone what you've tried to do in Alteryx so far to figure this out.

I would suggest looking into an iterative macro to solve this use case.

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Getting-Started-with-Iterative-Macros/ta-p/488173

https://community.alteryx.com/t5/Interactive-Lessons/tkb-p/interactive-lessons/label-name/Macros

nwhite

i've explored several different tools - multi row, tile, binning, running total, but as i have limited use with all i am stuck.

i will read up on iterative macros, another area i don't have experience with. thanks for the suggestion on a place to start.

mceleavey

Hi @nwhite ,

I think what you're trying to achieve here is an iteration where you create a running total of Count until 3000 is hit, then you put those records in a file and create the next running total for the next file.

To do this, I've built a simple iterative macro:

This creates a running total until the total goes over 3000. It then sets the running total to zero for those with a higher number and iterates them to the next run. The ones that fall within 3000 are given the first file number and passed out of the macro.

This repeats until they are all in a file.

The workflow takes the macro output and simply creates a dynamic filename, which you then use in the output tool as the output full path:

The results of the iterative macro are:

Hope this helps,

mceleavey_iteration_solution.yxzp

nwhite

Thank you for this - but i am unable to import as my company has not yet updated to the most recent version of Alteryx and i can't import it. Is there a way around this (short of installing the update?)

AncientPandaman

nwhite

THANK YOU SO MUCH! this is perfect!

Quick Links

This months top contributors

atcodedog05 19598

Qiu 15879

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703