a large dataset (200m+ records) is reduced to the following 2 fields (read from a table, tiled and group by/count), file attached
the file needs to be split sequentially into X pieces with the number of records written to each file at or below a specified threshold.
the threshold is determined based on the total record count and the number of records allowed in each file
the split needs to be determined by a range of codes so the ranges can be used in other processing. codes cannot be split across ranges
for example using the sample
file looks like this
code count
1001 308
1003 2424
1005 1849
1007 777
1009 869
1013 340
1015 279
1017 141
1019 142
3 files should be created if the threshold is set to 3000
the resulting files would be split as follows
code count
1001 308
1003 2424
total count 2732 -- file1
1005 1849
1007 777
total count 2626 --file2
1009 869
1013 340
1015 279
1017 141
1019 142
1021 254
total count 2025 file3
etc
the ranges need to be used in another process (outside alteryx)
there is no specific output format required but needs to identify the start and end points of range. i.e.
1001 | file1 |
1003 | file1 |
1005 | file2 |
1007 | file2 |
1009 | file3 |
1013 | file3 |
1015 | file3 |
1017 | file3 |
1019 | file3 |
1021 | file3 |
or
1001 - 1003 1
1005 - 1007 2
1009 - 1021 3
Thank you!
Solved! Go to Solution.
@nwhite for a quicker response from the Community, I would suggest telling everyone what you've tried to do in Alteryx so far to figure this out.
I would suggest looking into an iterative macro to solve this use case.
https://community.alteryx.com/t5/Interactive-Lessons/tkb-p/interactive-lessons/label-name/Macros
i've explored several different tools - multi row, tile, binning, running total, but as i have limited use with all i am stuck.
i will read up on iterative macros, another area i don't have experience with. thanks for the suggestion on a place to start.
Hi @nwhite ,
I think what you're trying to achieve here is an iteration where you create a running total of Count until 3000 is hit, then you put those records in a file and create the next running total for the next file.
To do this, I've built a simple iterative macro:
This creates a running total until the total goes over 3000. It then sets the running total to zero for those with a higher number and iterates them to the next run. The ones that fall within 3000 are given the first file number and passed out of the macro.
This repeats until they are all in a file.
The workflow takes the macro output and simply creates a dynamic filename, which you then use in the output tool as the output full path:
The results of the iterative macro are:
Hope this helps,
M.
Thank you for this - but i am unable to import as my company has not yet updated to the most recent version of Alteryx and i can't import it. Is there a way around this (short of installing the update?)
or you can try with 2 multiple row formula tool
formula 1: if [Count]+[Row-1:Running Total]>3000 then [Count] else [Count]+[Row-1:Running Total] endif
formula 2: if [Count]=[Running Total] then [Row-1:File]+1 else [Row-1:File] endif
THANK YOU SO MUCH! this is perfect!