Hello Alteryx Community,
I have approximately 40,000 line items that I need to split into files with 500 entries each. While doing this, I need to ensure that the last entry of each file is not duplicated in the subsequent file.
For example, if I want to split a file of 25 entries into 3 files with 8 entries each, setting the Max Records for the file in the Output tool would cause "My User - 2" to appear in both the 1st and 2nd files, which I would like to avoid. Given the large number of line items, manually checking is impractical. Could you please advise on how to split the files without having the entries of same user in multiple files?
Thank you in Advance!!
@SekharSai1993
If I understand your intention correction, we can use the "Take File/Table Name from Field" option of Output tool.
Thanks for commenting Qiu. If you check the output, User-2 is splitted in both First and Second File and in the same way User-3 got splitted in Second & Third File. My objective is not to achieve exactly 8 records per file. Instead, I aim to ensure that the larger file of 40,000 records is divided into smaller files of approximately 500 entries each, ensuring that no user is split between two files. A slight variation around 500 entries is acceptable.
For this example, I am okay if the file got divided into 2 where 1 File will have 12 entries and another file have 12 entries but my Aim of having same user in one get achieved.
Please check and let me know if this is possible.
Can you post sample input data in a table format, or attached Excel file? (not an image)
Thanks @ChrisTX for looking into it. Please find attached for the Sample Data
Your data is wrong -> @Qiu is correct. His formula is correct. It is not duplicating any data. Your data has the duplications.
@apathetichell - I am unsure if I provided clear information earlier, so I would like to reiterate my requirement. We need to load 1000 users into a system, each with a specific set of roles. Different users may have multiple roles assigned. For example, User - 1 has 4 roles, User - 2 has 8 roles, and User - 3 has 5 roles. While splitting the files, it is crucial that No User is divided between two files. However, in the current solution, User - 2's roles are split between the first and second files. When splitting the larger dataset of 40,000 records, I want to ensure that no user is divided between the smaller files. I hope this clarifies my requirement.
Try using tile tool, equal records option? You will have to specify how many tiles you need though. So for example if you have 40,000 records, you can say you need 80 tiles, which will split it into 80 tiles of 500 each.
To avoid splitting of users, you can try using group by option.
To make it dynamic, put this part of flow inside a macro and update the number of tiles based on some calculation.
@SekharSai1993 got it -> easiest way I can see doing this is with an iterative macro. split off each group based upon your divisor (use a variable for 50K) -> split off your group using max/min or running total -> remove the records you've allocated -> iterate over the others. use @Qiu 's sheet assign. Iterative macro is the way to go here.