Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
katecrawford
5 - Atom

There are four Macro types in Alteryx. This post introduces the Iterative Macro, which loops through a flow based on a condition.

 

What is Pagination?

 

Pagination is when you page through a website or API service to retrieve data. This process may be required if you find that there are limits to the number of rows that you can receive from a single page or multiple pages on a website.

 

This was the case when I needed 311 service requests data from NYC Open Data. Each page only has 1000 service requests, and there are many more than 1000 requests within a single month. In order to retrieve all the service requests for an entire year, I needed to make multiple API calls within a single Alteryx workflow.

 

What is an Iterative Macro?

 

A standard Alteryx workflow can only run once. However, Alteryx has Macros that can adjust how the data flows. Out of the four Macros available in Alteryx, the Iterative Macro is the one that allows you to loop through a sequence of tools until a certain condition is met.

 

Iterative Macros are useful for methods, like Pagination, which require doing one process over and over again. Macros are also used in standard tools, like the Data Cleansing Tool.

 

data-cleansing-tool.png

 

1) Build a Standard Workflow.

 

Before building any Macro, you should have a completed workflow. Do all testing here, as no outputs will show when the workflow is converted to a macro.

 

standardworkflow.png

 

This is the flow that downloads and parses a JSON request from NYC Open Data’s 311 Service Requests API. The URL for the API call contains a SoQL query that selects a set of columns and filters to a single year. The output includes BBL codes to join with the Building’s Energy Ratings.

 

2) Copy/Paste completed flow to new workflow.

 

Drag and select the part of the completed flow that downloads and parses the data. Paste this into a new workflow.

 

3) Configure the workflow.

 

Click anywhere on the canvas (not on a tool) to view the Workflow — Configuration window. Then, click the Workflow tab.

 

change-flow-type.png

 

Change the workflow type to Macro and select Iterative Macro from the dropdown.

 

4) Convert the Input Tool to a Macro Input Tool.

 

Right-click to convert at least one Input Tool to a Macro Input.

 

convert-input-tool (1).gif

 

5) Figure out how to page through data.

 

Like most Paginated APIs, NYC Open Data has pagination instructions on the SODA Developers website. Make a note of any limits, offsets, call formats, and the max number of calls.

 

 

<https://soda.demo.socrata.com/resource/earthquakes.json?$limit=5&$offset=0&$order=earthquake_id>

 

 

SODA has a limit of 1000 rows per call, so we will need to incrementally change the offset. There are unlimited calls, but we should increase the Timeout (seconds) between each download out of respect for shared space.

 

6) Split the flow with a stop condition.

 

You may have noted the Constants in the Workflow — Configuration window. These names and values are automatically generated by Alteryx’s Engine. You can access them from within the flow using Engine.Name. When paginating, Engine.IterationNumber is one way to increment offset values for each iteration.

 

After processing the downloaded data, add a Macro Output and name it “Output.”

 

Then, add a Filter Tool connected to the JSON Parse tool. Use a Custom Filter that checks if the Engine.IterationNumber is less than the max number of pages you want to call. You may know this number. If you do not, like I didn’t, then I recommend setting this value to a low number, like 10, and increasing as you test the output after step 11.

 

split-flow.png

 

7) Set the Iteration Input and Iteration Output.

 

View the Interface Designer window.

 

open-interface-designer-1.gif

 

In the Properties tab, set the Iteration Input and Iteration Output by tool name. Keep the configuration for when the Maximum Number of Iterations is Hit and Output Modes on the defaults.

 

8) Set limit and offset values.

 

In the same Formula Tool that creates the URL, add a field called limit. Then, multiply the Iteration Number by the page limit in a field called Current Page. Finally, add the limit and offset values to the URL.

 

configure-formula-tool.png

 

(optional) Add a more complex stop condition.

 

This is where you can get creative. The current method is simply a Filter Tool to decide when to start and stop the flow.

 

Extend the flow by checking for specific HTTPS Response Codes or row values. You could even change the data before the next iteration, like the starting offset.

 

9) Save the Macro.

 

File > Save As usual. Since we configured the workflow, Alteryx has automatically converted the file type to .yxmc. It is best to save macros in a folder together.

 

10) Add your Macro to a new workflow.

 

Right-click on the canvas to insert the Macro you just created and named. You may have to browse for it.

 

open-macro.png

 

11) Run the Macro.

 

Add the Input from the completed workflow and Ctrl+Shift+B to insert browse tools. Run the workflow.

 

test-macro.png

 

The red exclamation point may seem alarming, but it should tell you if the max amount of iterations were reached.

 

(optional) Add Interface Tools.

 

Set values used within the flow by adding Interface Tools. I’ve done this add a SoQL query, set a maximum number of pages to call, and the starting offset value.

 

add-interface-tools.png

 

Use the Layout and Test Views to refine the design of the interface.

 

adjust-layout-view.png

 

Thanks for reading!

 

This post was inspired by a post from Down Under by Joe Chan.

 

 

This blog was originally published on thedataschool.co.uk