Hi there,
I am seeking to deepen my understanding of the AMP engine's processes to ensure consistency in workflows executed across different computers within my team. I understand that the AMP engine utilizes parallel processing to split data into chunks, allowing multiple functions to be processed simultaneously.
What I am particularly interested in is how the records are determined to be split into these chunks. When I run the workflow on my computer, it consistently splits the data into the same chunks, resulting in the same order of output each time. However, I am curious whether running the same workflow on a different computer—especially one with varying processing capabilities or a different version of Alteryx—would lead to different groupings of data into chunks.
I am trying to establish best practices for my team regarding tools that rely on a specific ordering of the data. Any insights into this process would be greatly appreciated, as we have noticed that occasionally a different team member will receive a different order of output when running the same workflow.
A little late on this... I'm not a developer who knows the exact process, but I have a rough idea.
AMP will chunk the data depending on memory settings and unique tool settings.
How it groups the data for parallel processing is not documented, nor is it tested for backwards compatibility. So even if you worked out each tool's individual process, there is no guarantee it will stay consistent over time, across versions, or across different machines. If the data needs to be sorted, put in a Sort tool. It is advisable to assume that AMP will be on in the workflow, as people may copy part of the workflow into a new workflow and not think about that setting.
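As a rough analogy only (not AMP's actual implementation), here is a small Python sketch of why chunked parallel processing can change output order: results arrive in whatever order the workers finish, which can vary between runs and machines.

```python
from multiprocessing import Pool

def double(x):
    """Placeholder per-record work; stands in for any tool's processing."""
    return x * 2

if __name__ == "__main__":
    data = list(range(10))
    with Pool(4) as pool:
        # imap_unordered yields results as workers finish, so the
        # output order can differ from the input order between runs.
        results = list(pool.imap_unordered(double, data))
    # The *set* of results is stable, but their order is not guaranteed,
    # which is why an explicit sort is needed downstream.
    print(sorted(results))
```

The same records always come out; only their order is at the mercy of scheduling, which mirrors what the thread describes.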
@BaileyCallander
There is another discussion here that might be helpful.
Bottom line: use a combination of RecordID and Sort to preserve the original order, as suggested by @KGT.
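To illustrate the RecordID + Sort pattern outside of Alteryx (a hypothetical Python sketch, not the Alteryx API): tag each record with its original position before any parallel work, carry that ID through, then sort on it afterwards to restore input order.

```python
from multiprocessing import Pool

def process(tagged):
    """Placeholder transformation; the record id travels with the record."""
    rid, value = tagged
    return rid, value.upper()

if __name__ == "__main__":
    records = ["alpha", "bravo", "charlie", "delta"]
    # Step 1 (RecordID): attach the original row number to each record.
    tagged = list(enumerate(records))
    with Pool(2) as pool:
        # Results may come back in any order.
        processed = list(pool.imap_unordered(process, tagged))
    # Step 2 (Sort): restore the original order using the record id.
    processed.sort(key=lambda pair: pair[0])
    final = [value for _, value in processed]
    print(final)  # ['ALPHA', 'BRAVO', 'CHARLIE', 'DELTA']
```

In Alteryx terms, the `enumerate` step plays the role of the RecordID tool and the final sort plays the role of the Sort tool.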