Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Does batch macro read source file at each pass?

trevorwightman
8 - Asteroid

Hello,

 

I have two very large files (millions of records). One file has Name and Postal, and one file had Name, Postal and a few additional Attributes. The goal is to do a Join/Fuzzy Match between them (multiple passes), however, when I bump the two full files together the run time is entirely too long. In searching for ways to reduce this I am trying to do a Batch Macro. So, I am trying to segment my data by State such that I can compare each state to one another instead of the full files to one another.

 

Question: When I run a Batch Macro, do the entire source files need to be read to know which records to grab? In the example below (disregard text) is the best way to set this up to do a filter and update State? And if I do it that way will each source file need to be read in its entirety for each pass?

 

trevorwightman_0-1581104322252.png

2 REPLIES 2
CharlieS
17 - Castor
17 - Castor

Hi @trevorwightman 

 

I often use batch macros to manage processing resources. The batch macro configuration in the image below may be loading the data each time. This would be controlled by the input "GroupBy" settings on the outside of the batch macro. The input "GroupBy" function of batch macros doesn't require a Control Parameter tool. 

 

Build the macro like you're expecting only a subset of records to be present (you could go ahead and filter both of your inputs to a particular batch value and use those as example inputs to build your macro). This means you don't need a Control Parameter tool/filter tool because the inputs will already be filtered at this point. Save as a batch macro and that's it.

 

Now that you're ready to use the batch macro, The files will be read once in their entirety (so the necessary batches can be determined from the contents of the files), then each batch is processed one at a time inside the macro.

 

20200207-BatchJoin.png

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Check out the attached example and you'll note in the results window that "3 iterations (read:batches) were run". 

 

EDIT: I couldn't help but notice that you're matching names at the postal level and are considering batching on state. If you're in the US, postal codes/ZIPs can cross state lines. Because of that, and if you want to rely on only same postal matching, then I suggest batching on postal codes. 

trevorwightman
8 - Asteroid

Thanks so much for the response. I gave that a try but am still getting a few errors and do not know why. I hope it's okay that I am only posting a PDF of what my workflow looks like.

 

Start of workflow:

I am starting with the file Angie9269 as well as Email9269. The Angie file has Name and Postal and the Email file has Name, Postal and Email. They contain records with zipcode 92690 through 92699. My goal is to loop through the zipcodes in each list to do 10 loops. I first do a little bit of cleanup of these files to standardize them and then connect them to my macro. I also connected my zipcode list to the macro as my control parameter.

 

Batch Workflow:

I take my source files and connect them to the join tool and I connect the control parameter to the join intending to iterate on zipcode.(I choose the Update Value to be the zipcode field from my Name/Postal Source, but this seems like it's not the right thing to do...)

trevorwightman_0-1581115849037.png

 

Here is the error I get when I run the main workflow.

trevorwightman_1-1581115924619.png

 

EDIT: My main goal is to do this on a nationwide scale so hopefully iterations through thousands of zipcodes won't be a problem for the batch and is the most efficient way to run it. Additionally, you will notice that in my batch macro I am taking a few passes at matching. Pass 1 is an exact match and then I slowly loosen the criteria for fuzzy matching to allow in other matches to flow in. Pass 5 is matching on Address only. At the end I combine all of the matches with the merge tool.

 

EDIT2: Hot dog, I might have gotten it! I removed the control parameter altogether, and switched the connections on my two source files. I put the Angie file on the bottom connection of the macro and the Email file in the middle connection of the macro and it seems to have magically worked! However, I don't know exactly why it worked. Does it make sense to you why I would have gotten those errors without switching the connections?

Labels
Top Solution Authors