This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
on 04-29-201602:51 PM - edited on 05-21-201902:34 PM by SydneyF
Sometimes large amounts of data can 'overwhelm' a tool or process in your workflow and make it appear like it is stuck or frozen. This was the case recently when a user attempted to pass 7 million rows of data to a CASS tool. Splitting these records into smaller chunks makes the process run much quicker.
In this example, we'll discuss how you can control the number of records that pass to a batch macro. Think of a batch macro as a process that runs multiple times until all records have been processed. The 'process' is simply a workflow you want to run with the addition of a control parameter.
In the attached example, we have an input file of 16 records in a single column called 'Address1'. In reality, this file could be much larger, containing millions of rows and many columns. We will group our data so each group contains 5 records (this should give us a total of 4 groups where the last group only has 1 record in it). From there, we will send 5 records at a time to a process that counts the number of times the process runs/iterates (if all goes well, the process will iterate for each group so we should see 4 iterations). The process itself is simple. It selects a specific field using a Select tool so we only get the information we need (Address1) and the Select tool connects to a Formula tool (If you're using the Formula tool in v11.0+, be sure to familiarize yourself with the tool's interface redesign!) that creates a field called 'IterationID'.
The first step is to bring in your input file. In our case, it is a text file with 16 records and a single column.
Connect a Record ID tool (If you're using the Record ID tool in v11.0+, be sure to familiarize yourself with the tool's interface redesign!) to the Text Input. We will give the first group a starting value of 0.
Connect a Formula tool to the Record ID tool.
Create a new field called 'GroupID' and make the data type numeric (Int32 in this example). For the expression, write FLOOR(TONUMBER([RecordID])/5). This changes the data type for 'RecordID' to numberic and the keyword FLOOR ensures the number will be an interger. Divide by 5 so each group will have no more than 5 records in a group. In your real-world example, you may need to play around with that number to determine which is quickest for your workflow. If you have millions of records in your file, a good place to start may be to replace 5 with 10000 or 5000.
Connect a Summarize tool to the Formula tool to get a list of unique GroupIDs.
Connect a Select tool to the Formula tool as shown here (both Summary and Select tools are connect to the Formula tool).
At this point, we no longer need RecordID, so unchecked that field.
We have created our 4 groups of no more than 5 records. Now let's discuss the process we want to use as a batch macros that will reach each of these 4 groups, one at a time, until all 4 have looped through the process. This process just selects a specific record using a Select tool.
We only want Address1. The GroupID will no longer be needed in after this point.
Connect a Formula tool to the Select tool. Create a new field called 'InterationID' and make the data type numeric. In the 'Constants' section below, notice there is something called 'Engine.InterationNumber'. This constant is the expression and what it does is create a running total for how many times our process iterates.
Enter 'GroupID' as the label for the Control Parameter.
For the Macro Input tool, you need to provide a template for that the data will look like as data gets passed to the Selection tool. Click on 'Edit Data' under 'Text Input'.
The data will have a field called 'Address1' and 'GroupID'. Enter your data so it looks like this:
Now save your batch macro, remembering where you saved it.
Go back to your workflow where you created the groups.
Connect your macro you created to the Summary and Select tools like this (you can add a browse after the macro):
The best way to add your macro to the work area is to right click --> Insert --> Macro and select the macro where you saved it.
Select your macro.
To configure your macro, there are two tabs in the Configuration window.
Configure the Group By tab like this:
And the Questions tab like this:
If you haven't already, you can add a Browse tool or an Output tool to your macro tool.
Looking at the output from the Browse tool, we see this:
Which shows 5 records were run through the first, second, and third iterations, and the fourth iteration processed a single record, which is what we expected to have happened. Use this approach and engineer it to your specific situation to improve the overall efficiency of your workflow whenever you have an input file that is very large. It may help some of your tools and processes run quicker in the long run.