Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Running Batch Macro on Multiple files from a folder

StockMarket
8 - Asteroid

I have gone through many example batch macro threads here, but since I am new to Alteryx, therefor I am finding it a bit complicated to execute correctly in my case. Specially the part where we have to create the "Control Parameter Step" and update it with new file names, so that the process can be repeated on multiple files. I am having trouble trying to find the best way to create a batch macro which will work in my case, so any help or suggestions would be greatly appreciated.


I have to run this workflow on csv data files, which are quite large, having around 4 million rows per csv file and I have to run this flow onto hundreds of such files present within the same folder. The good news is that all these files have the exact same structure and format and a constant schema throughout. So this should reduce the complexity of the batch macro, otherwise some examples of batch macros were looking very complex and hard to understand to me.


I need to make sure that the complete workflow runs on the FIRST CSV FILE and generate the required output files, then the complete workflow runs on the SECOND CSV FILE and generate the required output files and this process continues, till the VERY LAST CSV FILE is processed in the given folder. All csv files are located in this same folder - "C:\Data\Batch Macro Sample Data".

 

I cannot first import all the csv files into a big table and then run the workflow. It has to be done, ONE FILE AT A TIME, separately and keep on repeating, one after the other, till all the files have been processed in that folder.

 

I have gone through a lot of different links, but have failed to create the desired output. I would really appreciate, if someone could take the sample data files that I have attached and create a workflow based on that, instead of just posting here the links to other threads.


To make it easy for you, I have attached three csv files having the sample data, on which this workflow needs to be run. Although the actual data files are quite big, having millions of rows in it, but I have made a very small sample out of it for sharing on the forum. All this data has a constant schema throughout.


I have also attached the Alteryx Worflow Package .yxzp file, so that you can easily open it up on your machine. I have used the latest version of Alteryx for creating this workflow. You may please have a look.


Thanks a lot

 

 

Workflow.png

11 REPLIES 11
AngelosPachis
16 - Nebula

Hi @StockMarket ,

 

The first thing you have to do is get a list of all the different file names that you want to feed in your batch macro. You can do that by using a directory tool and pointing it to the folder where all your input files are located. Then, keep only the filename column and parse it so you get a list of all the different filenames.

 

Screenshot 2020-12-05 112853.jpg

 

and your input before your batch macro should look like that:

 

AngelosPachis_0-1607167939577.png

 

Then in your batch macro, you want to use a control parameter to update the file name in your input data tool. You should configure the action tool as shown below

 

Screenshot 2020-12-05 112916.jpg

 

Then you should get an output that looks like that

 

AngelosPachis_1-1607168187590.png

 

Hope that helps, let me know if you have any further questions.

 

Regards,

 

Angelos

 

 

AkimasaKajitani
17 - Castor
17 - Castor

Hi @StockMarket 

 

I converted the provided workflow into a batch macro and created a workflow to use the batch macro.

 

The output path can be selected by you as an option of the batch macro.

 

Workflow:

AkimasaKajitani_0-1607168060474.png

Batch Macro:

AkimasaKajitani_1-1607168110570.png

 

StockMarket
8 - Asteroid

@AngelosPachisThank you so much for such a nice explanation with proper snapshots and for attaching the updated workflow package. Much appreciated.

 

You have understood my requirement perfectly and your solution is working great. When I run your package file, it gives the correct output for the first file. But unfortunately, I am not being able to figure out, how to run it automatically for the second and third files in the folder.

 

I can understand the logic that you have explained for the Directory Tool, so that it has a list of all the files in it, which would be used for repeating the workflow for each of those files. But I am not able to run it repeatedly on my machine. It just runs for the very first file only.

 

I am new to Alteryx and I am learning it on a daily basis and I can see that I am missing some very basic thing here, but not being able to figure that out. Can you please tell, how to make it run repeatedly.

 

Once again, thanks a lot.

 

I have attached the screenshot of the Workflow Messages, that I receive on my machine, in case it helps you to figure out, why I am not being able to run it repeatedly.

 

20201206 155506.png

StockMarket
8 - Asteroid

@AkimasaKajitani 

 

Thank you so much for your help. I am able to import the package attached by @AngelosPachis but when I try to import your package file, it throws up these error messages -

 

20201206 155810.png

 

 

 

 

20201206 155839.png

 

Could you please tell what should I do to import your package successfully into my machine and run it for testing purpose?

 

Also, looking at your attached snapshot, it seems you have used a different approach compared to AngelosPachis, who has used a directory tool to get the full path of the files and then use this FORMULA to reduce it to the csv files names only -

 

REGEX_Replace([FullPath], ".*\\", "")

//Replace everything that comes before your final back slash with nothing, so you only have a list of all the filenames in that directory

 

 

Whereas you do not seem to have used any such formulas in your snapshot. Does it mean that you are able to get the file names from the folder, without using the above formula or what?

 

And secondly, you seem to have add another Control Parameter Step at the very end, for specifying different output folder locations for the output "if needed". Although in my case, all output files will go into the same location, but it is always nice to learn new tricks.

 

Thanks and best regards.

AngelosPachis
16 - Nebula

Hi @StockMarket ,

 

To make it run for all your files, you should run the workflow file that contains the batch macro and not the batch macro itself. So instead of running the Batch_Macro.yxmc file, try running the Batch_Macro.yxmd.

 

When running the batch macro (as you have done in the screenshot you have attached) then the workflow will run for a single file and will help you structure your macro so you can get the desired output.

 

However, if you want to run it for multiple files, the workflow that you should run is the one containing the batch macro, the one with the directory tool, because then and then only your different file names will be fed in your batch macro.

 

So to solve this, try running the workflow that looks like the first image I have attached in my prior post and see if you get all desired outputs.

 

Let me know if that worked for you.

 

Cheers,

 

Angelos

AngelosPachis
16 - Nebula

Also, to answer on why I have used a formula tool with a regex expression ; as @AkimasaKajitani has done, you don't need to use this formula tool and you can have the whole filepath being fed into the batch macro.

 

It's a thing of preference, I thought that in this way I would give a cleaner view and it would be easier to explain what exactly is going on.

 

Both ways are valid in my opinion.

 

 

AkimasaKajitani
17 - Castor
17 - Castor

Hi @StockMarket 

 

YXZP file often output errors, but you can ignore them.

You may click "Yes" at Workflow Import Progress Window and click "OK" at Workflow Dependencies.

After that "workflow_AK" folder will be made and put your input files into the folder.

It will be work.

 

About File path, putting only the file name in the Input Data tool means that the folder is predetermined in the macro (in @AngelosPachis 's approach, he only rewrites the file name in the Action tool).
My approach is to rewrite the entire full path, and you can run it against the files in any folder you want.
There is a difference between making it generic and making the path deterministic.

But I think it's user's preference.

 

 

StockMarket
8 - Asteroid

@AngelosPachisthanks for the clarifications, now I am able to run the workflow properly.

 

@AkimasaKajitani I am able to run your workflow as well, after following your instructions. And I totally agree to your point that making a solution generic, will allow to use it with any other folders and files etc. in the future. So I would also personally prefer the generic approach, rather then making it fixed. Thank you so much for showing this new approach to me.

 

I would really like to learn more about this method of keeping the file paths etc. as generic in most of our workflows.

Can someone please suggest me some links where I can learn more about it? Any more examples of such workflows that have been posted into the community earlier? Or maybe some other links, where such TIPS are given, to make the workflow as efficient and generic as possible.

 

Thanks to both of you for the great help.

 

Best Regards

 

davidnolan
7 - Meteor

Hello @AkimasaKajitani ,

 

How're you?

 

Apologies to bring you back to this thread but I have a similar issue to the original post and I found your solution to be very helpful and easy to follow. I need to accomplish the same task where I need to run a workflow on multiple different files that are all in one folder separately but my issue is that my files are .XLSX rather than .CSV. When I change the inputs in the model to .XLSX and follow the steps in your solution rather than getting multiple different output files I get one output file with multiple different sheets. So if I had five .XLSX files what steps would I need to change to get five .XLSX outputs, rather than one output with five sheets?

 

Any help here would be massively appreciated! Thanks.

Labels