Hi. I built an iterative macro to get make API calls. Each call uses a customer ID as parameter, and the results may be paginated. The iterative macro takes the API url as input. The max iteration is set to 5000. As a test my input data has 10 unique customer IDs.
The expected number of records to be extracted by this data isn't much, but the iterative macro took so long to run that I had to terminate it. Out of the blue I wrapped the iterative macro inside a batch macro, taking the same API url as input. This one ran fine. Here's the construction of the batch macro (the red node is the iterative macro):
Out of curiosity I ran only 4 IDs. This time the iterative-only method ran ok, but I noticed there's a difference in the results. For the iterative-only method, it looks like this:
But the batch macro method yields this result:
Both macros give the same 6 records (because 1 ID needs to iterate 3 times) as expected. But the returned sequence is different. The batch macro method follows the sequence the input rows, but not the iterative-only method.
I suspect that I may have misunderstood some macro basics, as it seems for this case the iterative macro will only work correctly if embedded inside a batch macro. Could someone please provide some insights into this?
The main difference between a batch and an iterative macro has to do with the way that the Alteryx engine determines how many times to run the macro, assuming you haven't reached the iteration limit, in which case both types will stop iterating. With a batch macro, it's straightforward. It processes all the input records once for each row in the Control Parameter. 10 rows in the control parameter = 10 iterations of the batch macro, assuming that the Iteration limit isn't reached
With an iterative macro, things are more complex. It will iterate until there are no more rows passed from the Iterative output back to the iterative input. It's up to you to code things so that this eventually happens after the correct number iterations. For your API example, you need some kind of filter at the end that checks to see if there are more records to fetch. If yes, you pass the current results to the normal output and the next fetch URL to the Iterative output. If no, pass the current results to normal output and zero(0) records to the Iterative output. The engine will see that no records are available at the iterative input and will stop, returning the unioned results of all the previous iterations. Without the filter at the end, the iterative macro will just loop until the iteration limit is reached. This is probably why your iterative macro took a while, iterating 5000 times.
If you're a programmer, you can think a batch macro as a FOR loop where you set the number of iterations before starting the loop. An iterative macro behaves like a WHILE loop, where it's up to you to ensure that the loop doesn't become an infinite one.
Thanks @danilang for your insights. The for/while analogy is certainly helpful. But I am still a bit uncertain as to why connecting the iterative macro directly vs embedding inside a batch macro will cause such a difference. My iterative macro has an exit condition and I am quite confident that it's working as expected.
I ran some tests again and discovered that for the direct-iterative method, some of the API calls that involve pagination will go wacky. By default the API call will only return 25 records each page. For example, if there are 32 records in total, I found that it will be called multiple times instead of just 2 times (expected page 1 is #1-25, then page 2 is #25-32. But instead both pages were called multiple times).
The only difference I can see is that if I embed the iterative macro inside a batch macro, the batch control tool feeds the call to the iterative macro one by one, while if I connect the incoming records directly to the iterative macro, it somehow.... screwed up the iterative process in the iterative macro?
If you're feeding multiple rows into the iterative macro directly, make sure the the logic that feeds the rows back into the iterative input is rock solid. If you're getting multiple calls to a single page, it probably means that the url for that page it being called in multiple iterations.
You can turn on the setting to "Show All Macro Messages" on the Runtime tab in the workflow configuration. This will show all messages generated in the macro as well the iteration number that they occurred. This can help you track down where the pages are called more than once
You are correct - the culprit was a join inside the iterative macro that messed up the iteration logic. That would cause problem when I feed multiple rows directly into the iterative macro, but was ok if I wrapped it inside a batch macro because the batch macro fed the records into the iterative macro one by one.
So I think embedding the iterative macro inside a batch macro seems to be more intuitive, plus it orders the output records correctly which could be useful at times. I'd probably stick to this approach although it is a bit slower.
Another thing I just discovered was that if I didn't connect a downstream tool to the iterative macro output node, the node will not show all output records - something I guess must be by design.