Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Is there any way to split data for a batch macro?

gorkemakinci
6 - Meteoroid

Hi there, I have a batch macro that I have some calculations that I do inside of it. It has lots of formulas and it is working pretty rough when I try it with a big data (more than 1 mil row). It sometimes works sometimes gives an internal deadlock error. I tried to optimize my macro to it's finest and it still is the same.

 

I'd like to use the same macro of mine but if I can I'd like to have first 50 or 100k rows to work within my macro and afterwards the next 50/100k rows and so on until the whole big data set works. How can I achieve it?

6 REPLIES 6
JamesCharnley
13 - Pulsar

Hi @gorkemakinci 

 

Splitting data up is, at the highest level, the point of a batch macro, so this is definitely possible. What you need to do is just create some kind of grouping field for every X rows of your data, and send that into your control parameter to group it. Let's say you just add a RecordID , and then a multi-row formula that says if the Record ID is a multiple of 50000, then +1 to this grouping otherwise leaving it as it is. And that should batch the desired number of records in every time the macro runs.

gorkemakinci
6 - Meteoroid

I totally got the logic and I added a record ID tool, however can you guide me with what would the formula look like for the grouping field? I'm assuming we want it to start from 1 and then after 50001st row it switches to 2 and so on, And that grouping field is going to be my control parameter for my calculator macro.

 

If I got that right I only need to solve how to formulate it in the multi row formula tool! Thanks for clarifying my inquiry about Alteryx Designer, wasn't completely sure if there was a specific way or tool to do this, and if you'd like to help more with the formula I'd be glad (whenever so I can compare what I've written for a better understanding on how other people work and code in Alteryx!) Have a nice one!

JamesCharnley
13 - Pulsar

@gorkemakinci 

 

No problem! I would use a Multi-row formula like the following (mine is RowCount because they're generated but yours might be Record ID): 

 

if mod([RowCount],50000) = 0 then [Row-1:New Field] + 1 else [Row-1:New Field] endif

 

image.png

 

This will check for the remainder when my value is divided by 50000, and if it's 0, then add 1 to this new grouping value. You can see that's not exactly 50000 in this case but that shouldn't be relevant, or you could alter it slightly if you wanted.

 

You could achieve a similar result with the tile tool also. In this case, I have a million rows, and can easily separate them into twenty groups of equal records of 50000:

 

image.png

gorkemakinci
6 - Meteoroid

Since my dataset is getting bigger and bigger by time goes on I think the first solution would work better for me. However I'd like to ask some more tricky (for me) questions so I prepare myself and my dataset better for these procedures.

 

Would it make any difference if I have 3 similar macros and I run all of them together with the grouping field? What would the queue be like inside the Designer, would it run the grouping=0 rows for all 3 macros first, or would it run the first macro from the first grouping to the last one and then move onto the 2nd macro, or would it run all 3 macros together simultaneously?

 

Would it be alright if I just connect an output data tool right after the macro or do I need to tweak something or put a union tool to get all the different grouping data altogether?

JamesCharnley
13 - Pulsar

@gorkemakinci 

 

If the size of the dataset is changing, then yeah I'd definitely go with the first solution.

 

For your next question, depending on which version of Alteryx you're running, it could be easy or more difficult to achieve either of these outcomes. I'm not 100% sure on anything particularly computational but let's say you had your inputs connected to three macros on the canvas, I think each macro will start running when data reaches them at run time, and then each time one of your groups has finished the entire process within, the next group would be batched in. If one process is more complex than the others, I'd expect that one to take longer etc. Trying to run these at the same time could definitely be slow depending on the size of your data set and the complexity of the process, but I truthfully don't know if it would be slower or faster than running them sequentially.

 

If you wanted the first macro to run all the groups first and then move on to the second macro, the easiest way to do that would be with control containers if you have version 2023.1+, allowing you to easily orchestrate within a workflow.

 

For your final question, each time the batch macro runs, I think of the macro output as essentially outputting and automatically unioning those rows with their calculations applied automatically back to my main workflow. I believe you could directly connect an output tool after the macro no problem.

gorkemakinci
6 - Meteoroid

Unfortunately the version of my Alteryx is 2022.3, and can't update it for some different reasons. If I don't encounter any problems with all 3 of them working simultaneously I'll just let it go like this, if not I'll try and make them run sequentially.

 

I'm pretty happy that batch macro is able to union all different batches altogether, didn't want to deal with that too!

 

And overall I'm thankful to you to for replying all my inquiries! It was truly helpful and I learnt a lot! Hope this topic will help others in need, have a nice one!

Labels