Create Ranges for Output Splits Based on Data Values and Record Count
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
a large dataset (200m+ records) is reduced to the following 2 fields (read from a table, tiled and group by/count), file attached
the file needs to be split sequentially into X pieces with the number of records written to each file at or below a specified threshold.
the threshold is determined based on the total record count and the number of records allowed in each file
the split needs to be determined by a range of codes so the ranges can be used in other processing. codes cannot be split across ranges
for example using the sample
file looks like this
code count
1001 308
1003 2424
1005 1849
1007 777
1009 869
1013 340
1015 279
1017 141
1019 142
3 files should be created if the threshold is set to 3000
the resulting files would be split as follows
code count
1001 308
1003 2424
total count 2732 -- file1
1005 1849
1007 777
total count 2626 --file2
1009 869
1013 340
1015 279
1017 141
1019 142
1021 254
total count 2025 file3
etc
the ranges need to be used in another process (outside alteryx)
there is no specific output format required but needs to identify the start and end points of range. i.e.
1001 | file1 |
1003 | file1 |
1005 | file2 |
1007 | file2 |
1009 | file3 |
1013 | file3 |
1015 | file3 |
1017 | file3 |
1019 | file3 |
1021 | file3 |
or
1001 - 1003 1
1005 - 1007 2
1009 - 1021 3
Thank you!
Solved! Go to Solution.
- Labels:
- Workflow
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@nwhite for a quicker response from the Community, I would suggest telling everyone what you've tried to do in Alteryx so far to figure this out.
I would suggest looking into an iterative macro to solve this use case.
https://community.alteryx.com/t5/Interactive-Lessons/tkb-p/interactive-lessons/label-name/Macros
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
i've explored several different tools - multi row, tile, binning, running total, but as i have limited use with all i am stuck.
i will read up on iterative macros, another area i don't have experience with. thanks for the suggestion on a place to start.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @nwhite ,
I think what you're trying to achieve here is an iteration where you create a running total of Count until 3000 is hit, then you put those records in a file and create the next running total for the next file.
To do this, I've built a simple iterative macro:
This creates a running total until the total goes over 3000. It then sets the running total to zero for those with a higher number and iterates them to the next run. The ones that fall within 3000 are given the first file number and passed out of the macro.
This repeats until they are all in a file.
The workflow takes the macro output and simply creates a dynamic filename, which you then use in the output tool as the output full path:
The results of the iterative macro are:
Hope this helps,
M.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thank you for this - but i am unable to import as my company has not yet updated to the most recent version of Alteryx and i can't import it. Is there a way around this (short of installing the update?)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
or you can try with 2 multiple row formula tool
formula 1: if [Count]+[Row-1:Running Total]>3000 then [Count] else [Count]+[Row-1:Running Total] endif
formula 2: if [Count]=[Running Total] then [Row-1:File]+1 else [Row-1:File] endif
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
THANK YOU SO MUCH! this is perfect!