Sometimes large amounts of data can 'overwhelm' a tool or process in your workflow and make it appear to be stuck or frozen. This was the case recently when a user attempted to pass 7 million rows of data to a CASS tool. Splitting these records into smaller chunks makes the process run much more quickly.
In this example, we'll discuss how you can control the number of records that pass to a batch macro. Think of a batch macro as a process that runs multiple times until all records have been processed. The 'process' is simply a workflow you want to run, with the addition of a control parameter.
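To make the idea concrete, here is a minimal Python sketch of what a batch macro does conceptually. The names here are ours, not Alteryx's: run_workflow is a hypothetical stand-in for the workflow you want to repeat, and control_value plays the role of the control parameter.

    def run_workflow(batch, control_value):
        # Placeholder for the real work (e.g., passing records to a CASS tool).
        return [(control_value, record) for record in batch]

    def batch_macro(batches):
        results = []
        for control_value, batch in enumerate(batches):
            # The macro reruns the same workflow once per group of records
            # until every group has been processed, then unions the output.
            results.extend(run_workflow(batch, control_value))
        return results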
In the attached example, we have an input file of 16 records in a single column called 'Address1'. In reality, this file could be much larger, containing millions of rows and many columns. We will group our data so each group contains 5 records (this should give us a total of 4 groups, where the last group only has 1 record in it). From there, we will send 5 records at a time to a process that counts the number of times the process runs/iterates (if all goes well, the process will iterate once for each group, so we should see 4 iterations). The process itself is simple: it selects a specific field using a Select tool so we only get the information we need (Address1), and the Select tool connects to a Formula tool (if you're using the Formula tool in v11.0+, be sure to familiarize yourself with the tool's interface redesign!) that creates a field called 'IterationID'.
The first step is to bring in your input file. In our case, it is a text file with 16 records and a single column.
Connect a Record ID tool (If you're using the Record ID tool in v11.0+, be sure to familiarize yourself with the tool's interface redesign!) to the Text Input. We will give the Record ID a starting value of 0.
Connect a Formula tool to the Record ID tool.
Create a new field called 'GroupID' and make the data type numeric (Int32 in this example). For the expression, write FLOOR(TONUMBER([RecordID])/5). This converts 'RecordID' to a number, and the FLOOR function ensures the result is an integer. Dividing by 5 means each group will have no more than 5 records. In your real-world workflow, you may need to experiment with that number to determine which batch size runs quickest. If you have millions of records in your file, a good place to start may be to replace 5 with 10000 or 5000.
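For reference, here is the same arithmetic as a short Python sketch (BATCH_SIZE is our name, not an Alteryx one). It shows how 16 zero-based record IDs fall into 4 groups, with the last group holding a single record.

    import math

    BATCH_SIZE = 5  # tune this; try 5000-10000 when you have millions of rows

    # Record IDs start at 0, matching the Record ID tool's starting value above.
    record_ids = range(16)

    # Equivalent of FLOOR(TONUMBER([RecordID]) / 5).
    group_ids = [math.floor(rid / BATCH_SIZE) for rid in record_ids]

    print(group_ids)
    # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3]
    # 16 records yield 4 groups of sizes 5, 5, 5, and 1.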
Connect a Summarize tool to the Formula tool to get a list of unique GroupIDs.
Connect a Select tool to the Formula tool as shown here (both the Summarize and Select tools are connected to the Formula tool).
At this point, we no longer need RecordID, so uncheck that field.
We have created our 4 groups of no more than 5 records. Now let's discuss the process we want to use as a batch macro, which will read each of these 4 groups, one at a time, until all 4 have looped through the process. This process just selects a specific field using a Select tool.
We only want Address1. The GroupID will no longer be needed after this point.
Connect a Formula tool to the Select tool. Create a new field called 'IterationID' and make the data type numeric. In the 'Constants' section below, notice there is a constant called 'Engine.IterationNumber'. Use this constant as the expression; it keeps a running count of how many times our process iterates.
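As a rough analogue, this Python sketch (with a made-up 'batches' list) shows what the step accomplishes: every output row is tagged with the index of the iteration that produced it, much like writing Engine.IterationNumber into 'IterationID'.

    # Two hypothetical groups of 5 addresses each, for illustration only.
    batches = [
        ["addr1", "addr2", "addr3", "addr4", "addr5"],
        ["addr6", "addr7", "addr8", "addr9", "addr10"],
    ]

    output = []
    for iteration_id, batch in enumerate(batches):
        for address in batch:
            # Each row carries the number of the run that handled it,
            # starting at 0 for the first iteration.
            output.append({"Address1": address, "IterationID": iteration_id})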
To make the process above a batch macro, add a Control Parameter tool to the work area. Connect a Macro Input tool before the Select tool and a Macro Output tool after the Formula tool, as shown below.
Enter 'GroupID' as the label for the Control Parameter.
For the Macro Input tool, you need to provide a template for what the data will look like as it gets passed to the Select tool. Click on 'Edit Data' under 'Text Input'.
The data will have fields called 'Address1' and 'GroupID'. Enter your data so it looks like this:
Now save your batch macro, remembering where you saved it.
Go back to your workflow where you created the groups.
Connect the macro you created to the Summarize and Select tools like this (you can add a Browse tool after the macro):
The best way to add your macro to the work area is to right-click --> Insert --> Macro and select the macro from where you saved it.
Select your macro.
To configure your macro, there are two tabs in the Configuration window.
Configure the Group By tab like this:
And the Questions tab like this:
If you haven't already, add a Browse tool or an Output tool after your macro.
Looking at the output from the Browse tool, we see this:
This shows that 5 records ran through each of the first, second, and third iterations, and the fourth iteration processed a single record, which is what we expected. Adapt this approach to your specific situation whenever you have a very large input file; it may help some of your tools and processes run more quickly and improve the overall efficiency of your workflow.
Thanks, works perfectly.
Is there an option to combine multiple records into one and then maybe submit an API request? @WayneWooldridge
@WayneWooldridge I was processing 41 million records through the US Geocoder with the CASS option, and the process ran for about 20 hours before I killed it, as it was consuming all the CPU. I had done a small test of 200k records, which took 2 minutes to process. By that estimate, it should have taken a little under 5 hours to process the data, but it ran forever.
Do you know what the optimum batch size for the CASS tool would be if I use the batch macro, so I can improve the performance from 20 hours to something less?
Could the Tile tool not be used at the beginning, rather than the FLOOR formula?