There are four Macro types in Alteryx. This post introduces the Iterative Macro, which loops through a flow based on a condition.
Pagination is when you page through a website or API service to retrieve data. This process may be required if you find that there are limits to the number of rows that you can receive from a single page or multiple pages on a website.
This was the case when I needed 311 service requests data from NYC Open Data. Each page only has 1000 service requests, and there are many more than 1000 requests within a single month. In order to retrieve all the service requests for an entire year, I needed to make multiple API calls within a single Alteryx workflow.
A standard Alteryx workflow can only run once. However, Alteryx has Macros that can adjust how the data flows. Out of the four Macros available in Alteryx, the Iterative Macro is the one that allows you to loop through a sequence of tools until a certain condition is met.
Iterative Macros are useful for methods, like Pagination, which require doing one process over and over again. Macros are also used in standard tools, like the Data Cleansing Tool.
Before building any Macro, you should have a completed workflow. Do all testing here, as no outputs will show when the workflow is converted to a macro.
This is the flow that downloads and parses a JSON request from NYC Open Data’s 311 Service Requests API. The URL for the API call contains a SoQL query that selects a set of columns and filters to a single year. The output includes BBL codes to join with the Building’s Energy Ratings.
Drag and select the part of the completed flow that downloads and parses the data. Paste this into a new workflow.
Click anywhere on the canvas (not on a tool) to view the Workflow — Configuration window. Then, click the Workflow tab.
Change the workflow type to Macro and select Iterative Macro from the dropdown.
Right-click to convert at least one Input Tool to a Macro Input.
Like most Paginated APIs, NYC Open Data has pagination instructions on the SODA Developers website. Make a note of any limits, offsets, call formats, and the max number of calls.
<https://soda.demo.socrata.com/resource/earthquakes.json?$limit=5&$offset=0&$order=earthquake_id>
SODA has a limit of 1000 rows per call, so we will need to incrementally change the offset. There are unlimited calls, but we should increase the Timeout (seconds) between each download out of respect for shared space.
You may have noted the Constants in the Workflow — Configuration window. These names and values are automatically generated by Alteryx’s Engine. You can access them from within the flow using Engine.Name
. When paginating, Engine.IterationNumber
is one way to increment offset values for each iteration.
After processing the downloaded data, add a Macro Output and name it “Output.”
Then, add a Filter Tool connected to the JSON Parse tool. Use a Custom Filter that checks if the Engine.IterationNumber
is less than the max number of pages you want to call. You may know this number. If you do not, like I didn’t, then I recommend setting this value to a low number, like 10, and increasing as you test the output after step 11.
View the Interface Designer window.
In the Properties tab, set the Iteration Input and Iteration Output by tool name. Keep the configuration for when the Maximum Number of Iterations is Hit and Output Modes on the defaults.
In the same Formula Tool that creates the URL, add a field called limit
. Then, multiply the Iteration Number by the page limit in a field called Current Page
. Finally, add the limit and offset values to the URL.
This is where you can get creative. The current method is simply a Filter Tool to decide when to start and stop the flow.
Extend the flow by checking for specific HTTPS Response Codes or row values. You could even change the data before the next iteration, like the starting offset.
File > Save As usual. Since we configured the workflow, Alteryx has automatically converted the file type to .yxmc
. It is best to save macros in a folder together.
Right-click on the canvas to insert the Macro you just created and named. You may have to browse for it.
Add the Input from the completed workflow and Ctrl+Shift+B
to insert browse tools. Run the workflow.
The red exclamation point may seem alarming, but it should tell you if the max amount of iterations were reached.
Set values used within the flow by adding Interface Tools. I’ve done this add a SoQL query, set a maximum number of pages to call, and the starting offset value.
Use the Layout and Test Views to refine the design of the interface.
Thanks for reading!
This post was inspired by a post from Down Under by Joe Chan.
This blog was originally published on thedataschool.co.uk
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.