Hello,
I have a large dataset (over 5 million observations) and I have already divided the dataset into chunks. Now I want to process one chunk at a time according to the chunk_id. I could use a filter tool, but that's not really an option as I would need 27 filter tools.
Is there any way to update the filter condition so that each chunk is processed individually?
1. Can you attach the pseudo_data as well? Or export your workflow as yxzp?
2. What exactly are the processes you need to do, such that you need to do them one at a time? Is there a different process for chunk 1 than for chunk 2? If the process is the same for all chunks, why not just sort the data by chunk #?
Hello,
I have attached the file as yxzp. I need to upload the chunks in Python. The Python code is already ready and works fine. However, there is a problem: the bigger the dataset gets, the longer it takes to upload the data into the Python tool.
I just need to tell the Filter tool in Alteryx which chunk_id to use on each pass.
For example:
first iteration: check if chunk_id = 1, then
second iteration: check if chunk_id = 2, and so on until chunk_id = 27.
Is there any way to update the filter tool?
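The iteration described above amounts to one loop that swaps the filter value on each pass. A minimal plain-Python sketch of that logic, not the Alteryx workflow itself, assuming rows are dicts with a `chunk_id` key; `filter_chunk`, `process_chunk` and `run_all` are hypothetical names, and `process_chunk` stands in for the existing upload code:

```python
def filter_chunk(rows, chunk_id):
    """Keep only rows whose chunk_id matches, like a Filter tool set to [chunk_id] = k."""
    return [row for row in rows if row["chunk_id"] == chunk_id]


def process_chunk(chunk_id, chunk):
    # Hypothetical placeholder for the real per-chunk upload/processing code.
    return (chunk_id, len(chunk))


def run_all(rows, n_chunks=27):
    results = []
    # Iteration k applies the condition chunk_id = k, for k = 1 .. n_chunks.
    for k in range(1, n_chunks + 1):
        results.append(process_chunk(k, filter_chunk(rows, k)))
    return results


if __name__ == "__main__":
    sample = [{"chunk_id": i % 3 + 1, "value": i} for i in range(9)]
    print(run_all(sample, n_chunks=3))  # [(1, 3), (2, 3), (3, 3)]
```

Note that this scans the whole dataset once per chunk, which is what 27 separate Filter tools would also do.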
I can think of two solutions at the moment:
1) Do what you did with the Filters, but combine them with a Block Until Done tool.
2) Write dynamic Python code that does essentially the same thing (but without having to define the number of chunks up front). I'm not sure how to write it myself, though, so I would go with option 1.
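For option 2, one possible sketch uses only the standard library and discovers the chunk_ids from the data, so the number of chunks never has to be hard-coded. It assumes rows are dicts with a `chunk_id` key; `iter_chunks` and `process_chunk` are hypothetical names, and the latter stands in for the real upload code:

```python
from itertools import groupby
from operator import itemgetter


def iter_chunks(rows):
    """Yield (chunk_id, rows) for every chunk_id present, in ascending order."""
    # Sort first: groupby only merges adjacent items with equal keys.
    ordered = sorted(rows, key=itemgetter("chunk_id"))
    for chunk_id, group in groupby(ordered, key=itemgetter("chunk_id")):
        yield chunk_id, list(group)


def process_chunk(chunk_id, chunk):
    # Hypothetical placeholder for the real per-chunk upload/processing code.
    return (chunk_id, len(chunk))


if __name__ == "__main__":
    sample = [{"chunk_id": c} for c in (2, 1, 3, 2, 1, 2)]
    print([process_chunk(cid, rows) for cid, rows in iter_chunks(sample)])
    # [(1, 2), (2, 3), (3, 1)]
```

This is also the "sort by chunk #" idea from earlier in the thread: after one sort, each chunk is a contiguous run that can be handed off in a single pass.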
Also, Python is not the fastest tool around. Maybe you could rewrite the code entirely in Alteryx instead and get better results?
P.S. I couldn't open your yxzp because I have an older version of Alteryx. :(
A better way to do this is to use a batch macro that dynamically updates the filter for each chunk (one iteration per chunk). Linked here is an example.
Also, I saw you have a ceiling formula in there that wasn't doing anything. If you want, you can put X records in each group by using
CEIL([chunk_id]/27)
which will group the first 27 records into the first iteration.
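The arithmetic behind the CEIL grouping can be checked with a small Python mirror of the expression, assuming Alteryx `CEIL` behaves like `math.ceil` (round up to the nearest integer); `batch_of` is a hypothetical name. Values 1 through 27 map to 1, 28 through 54 map to 2, and so on:

```python
import math


def batch_of(value, per_batch):
    """Python mirror of the Alteryx expression CEIL([value]/per_batch)."""
    return math.ceil(value / per_batch)


if __name__ == "__main__":
    print([batch_of(i, 27) for i in (1, 27, 28, 54, 55)])  # [1, 1, 2, 2, 3]
```

Changing `per_batch` (the X above) controls how many consecutive values share one iteration.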
If this helped to solve your issue, please make sure to mark it as a solution.
Thanks
Thanks, Carli, it helped me make progress.
All the best to both of you.