Hi everyone,
I’m seeking clarification regarding the behavior of the Input Data tool when using the AMP Engine. Specifically, can anyone confirm whether the rows of data are input in the same order as they appear in the original Excel file? I understand that the AMP Engine can produce non-deterministic outputs for certain tools, but I’ve encountered conflicting information about whether the Input Data tool is affected by this.
In my current workflow, I am generating a Random Sample using the Random% Sample Tool with a Deterministic Output. I plan to place a RecordID tool immediately after the Input Data tool. My goal is to sort the data by Record ID before generating the Random Sample to ensure that the order of the data before the Sample Tool is consistent each time. Previously, the workflow didn’t sort the data, which led to varying sample outputs depending on which team member processed the workflow. By adding a Sort tool based on Record ID before sampling, I hope to achieve consistent results. However, if the order of the input data is not guaranteed, the Record ID may not serve its intended purpose.
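To illustrate the concern, here is a minimal Python sketch (not Alteryx, and not the Random % Sample tool's actual algorithm) of why a seeded sample still depends on row order, and why sorting on a stable ID restores it. The `deterministic_sample` helper and the `(record_id, value)` rows are hypothetical, and the sketch assumes the IDs were assigned in a reliable order, which is exactly the open question here.

```python
import random

def deterministic_sample(rows, fraction, seed):
    """Pick round(len(rows) * fraction) rows with a fixed seed.

    The seed fixes which *positions* are chosen, so the result
    still depends on the order in which rows arrive.
    """
    rng = random.Random(seed)
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)

# Hypothetical records: (record_id, value), IDs assumed stable.
rows = [(i, f"row-{i}") for i in range(1, 21)]

# Simulate an engine delivering the same rows in a different order.
reordered = list(reversed(rows))

sample_a = deterministic_sample(rows, 0.25, seed=42)
sample_b = deterministic_sample(reordered, 0.25, seed=42)

# Sorting on the ID before sampling restores the original order.
resorted = sorted(reordered, key=lambda r: r[0])
sample_c = deterministic_sample(resorted, 0.25, seed=42)

print(sample_a == sample_b)  # same seed, different order
print(sample_a == sample_c)  # same seed, order restored by the sort
```

If the IDs themselves are assigned after the rows have already been reordered, the sort restores nothing, so the guarantee is only as good as the step that assigns the IDs.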
Thank you for your insights!
Hi Bailey,
Unfortunately, I don't know how AMP interacts with the Input Data tool. I only know there are issues with, for example, CSV files when using AMP.
To address your issue:
Would you not just sort the data before adding the RecordID? Then you would overcome the issue of whether the data comes out of the Input Data tool in the same order :)
@BaileyCallander
I'm not an expert on the AMP Engine either, but I tried checking the help documentation.
Based on the AMP Engine architecture described there, there is a chance that the input data is reordered when it is read in, but I can't be sure.
Hi @Mathias_Nielsen ,
Thank you for your response. From a workflow perspective, that approach would definitely work. However, we've received feedback from our quality group indicating that sorting on a specific field could introduce sample bias.
My other idea is to split it into two workflows: first, I would load the data, add a RecordID, and run that workflow without the AMP engine. Then I would use the output from that workflow as the input for the main workflow, which would use AMP and sort on RecordID before sampling. Having two workflows isn't ideal, but it will likely be faster than not using the AMP engine at all, and it reduces the risk of generating a different sample until I can confirm with Alteryx whether the input order truly differs.
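For anyone who wants to see the shape of that two-workflow idea outside Alteryx, here is a rough Python sketch. It is only an analogy: `stage_one` and `stage_two` are hypothetical names, an in-memory CSV string stands in for the Excel file, and a shuffle stands in for an engine that might deliver rows out of order.

```python
import csv
import io
import random

def stage_one(source_text):
    """Read rows strictly in file order and prepend a 1-based RecordID."""
    reader = csv.reader(io.StringIO(source_text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["RecordID"] + header)
    for record_id, row in enumerate(reader, start=1):
        writer.writerow([record_id] + row)
    return out.getvalue()

def stage_two(staged_text, fraction, seed):
    """Load the staged rows, restore RecordID order, then take a seeded sample."""
    rows = list(csv.DictReader(io.StringIO(staged_text)))
    # Stand-in for a parallel engine delivering rows in arbitrary order.
    random.shuffle(rows)
    # The persisted RecordID makes the order recoverable.
    rows.sort(key=lambda r: int(r["RecordID"]))
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * fraction))

# Hypothetical source data standing in for the original file.
source = "name,amount\n" + "".join(f"item{i},{i * 10}\n" for i in range(1, 11))

staged = stage_one(source)
first = stage_two(staged, 0.3, seed=7)
second = stage_two(staged, 0.3, seed=7)
print(first == second)
```

The point of the split is that the RecordID is written to disk by a step whose read order is trusted, so the second step can always sort its way back to that order regardless of how it reads the rows.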
I appreciate the response. I agree there seems to be a real chance that the data could be reordered, but I haven't been able to confirm it myself either.