Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Repeat - Append CVS Files Workflow - Folder by Folder - on All Folders in Subdirectory

Aakash55
7 - Meteor

I have searched the community and have found the examples of how to append all the CSV files in a single folder, or even CSV files present inside all the subfolders.

 

But my requirement is different and I have not been able to find out any examples yet.


This workflow example is for - Appending all CSV files, which are present inside a folder, into a single csv file. And how to repeat this process for all the sub folders that are present inside the parent folder.

 

I have to run this workflow on csv data files, which are quite large in size.

and I have to run this flow onto hundreds of such csv files present within the same folder and then repeat the same for the next folder and so on.

 

The good news is that all these files have the exact same structure and format and a constant schema throughout.

All csv files are located in this same folder - "C:\Data\Batch Macro\Append CSV\Data Input Directory".
The sub folders are named according to the dates in "YYYYMMDD" format for "Year Month Date" like "20230122" for the date of 22 Jan 2023 and so on.

 

To make it easy for you, I have attached the sample csv files , on which this workflow needs to be run.

Although the actual data files are quite big, having millions of rows in it, but I have made a very small sample out of it for sharing here.
All this data has a constant schema throughout, having same number of fields and same type of data.

 

Please have a look at the attached sample csv data files first.

Then the final output for this Batch Macro, based on attached sample data would give the output in the form of 3 Big Size CSV Files, like this -

20210426.csv - which will have all the data from these 3 csv files - a.csv, b.csv, c.csv present within that folder.

20210427.csv - which will have all the data from these 3 csv files - x.csv, y.csv, z.csv present within that folder.

20210428.csv - which will have all the data from these 3 csv files - 1.csv, 2.csv, 3.csv present within that folder.

 

I have also attached the Alteryx Worflow Package .yxzp file, so that you can easily open it up on your machine. I have used the latest version of Alteryx for creating this workflow. You may please have a look.


Thanks a lot


PS: I am not being able to attach all the CSV files to this post. You may simply rename the attached CSV file as  x.csv, y.csv, z.csv and 1.csv, 2.csv, 3.csv to recreate the example at your end.

 

 

 

 

A.png

 

 

 

 

 

 

B.png

 

 

 

10 REPLIES 10
Aakash55
7 - Meteor

I am posting the remaining  CSV Files as well -

 

These x.csv, y.csv, z.csv  are to be kept into the folder -  20210427

Aakash55
7 - Meteor

And these 1.csv, 2.csv, 3.csv are to be kept into the folder - 20210428

 

 

 

binuacs
21 - Polaris

@Aakash55 have you checked the include subdirectories option, if you check that option all the files inside the subdirectories will be accessed in one go

 

binuacs_0-1682604162570.png

 

Aakash55
7 - Meteor

Thanks for your reply @binuacs 

 

As I already mentioned, I have tried this option, but what it does is that it appends all the CSV files from ALL the folders, whereas I have to append all the CSV files from ----------------------- EACH FOLDER SEPARATELY ---------------- and I am not being able to find a method by which I can create such a batch macro, which will run folder by folder and give the expected output.

 

If anyone has such an example of batch macro, which runs folder by folder and process the csv files present within those folders, then please share such examples, as that would be really helpful.

 

Thanks a lot.

binuacs
21 - Polaris

@Aakash55 So you need to 3 output files which are combined from each folders?

Aakash55
7 - Meteor

Yes, the 3 output CSV files will have the appended data from the respective folders and they will get their file name from the respective folder names itself.

 

I have attached the expected output files as well.

 

Thanks a lot.

 

 

 

 

binuacs
21 - Polaris

@Aakash55 I created two batch macros for this use case. The output files will be created in a new folder called output/filename.csv, If you want to create in a different location you can update the path in macro 'CSV Macro.ymc'. Let me know if you face any issues

 

binuacs_0-1682688786468.png

binuacs_1-1682688982417.pngbinuacs_2-1682689006685.png

 

Aakash55
7 - Meteor

Thank you so much @binuacs 

 

If possible, can you please explain a little bit about the LOGIC that you have used to achieve this result. Why you designed it to work this way, by using two different batch macros? Is it necessary to use 2 macros, or could it be done in any easier way as well. 

 

Thanks a lot.

 

Best Regards

binuacs
21 - Polaris

@Aakash55 I had to write two batch macros because your use case is complex, the requirement was to read each directly separate, and combine all the files into one csv, that's why I  have to create two batch macros.

 

Let me explain what I did to achieve this

 

Step 1-  The directory tool reads all the directories listed under the main folder "\Data Input Directory\"

binuacs_0-1683151651416.png

Since your requirement was to process each sub-directory separately I had to process one sub-directory at a time. For that, I created a batch macro named CSV_Macro_1. To extract only one sub-directory at a time I took the distinct sub-directories using the summarise tool 

 

I created filter tool because the directory tool picks some other files other than the one which I wanted to process. Since the output files also created under the main directory , if you run next time these output files will be also extracted the directory tool, thats why I used the filter tool

binuacs_1-1683151978626.png

 

 

The summarize tool only passes the unique sub-folder name to the batch macro

binuacs_2-1683152092281.png

 

 

The first bach macro reads all the files from the first directory and passing to the next batch macro, which combines all the records to once csv file

binuacs_3-1683152273093.png

The second macro just combine all the files from the first sub directory and giving the output back to the first macro, then in the first macro all the combined files saved into one csv.

 

This process will happen to all the three sub-directories and output 3 csv files.

 

Note that the first macro just giving each file names in the sub-folder to the second macro , and the second macro combines all the records

 

binuacs_4-1683152606815.png

also, the formula just takes the path from the input file and excludes the file names, and then appends to a folder '\output\' (which you have to create manually before running the workflow under each subfolders) , one of your requirement was the new file should be created with the dates , so the below second regex formula takes care of that part, it extract the sub-directory name from the file path and appending to the main file path

 

 

//'update your path here'
REGEX_Replace([FileName2], '(.*)\\.*', '$1')+'\output\'+REGEX_Replace([FileName2], '(.*)\\(.*)\\.*', '$2')+'.csv'

 

eg: REGEX_Replace([FileName2], '(.*)\\.*', '$1') - gives the result C:\Users\binua\Downloads\Data Input Directory\

REGEX_Replace([FileName2], '(.*)\\(.*)\\.*', '$2') - gives the result 20210427

the above whole formula gives the result 

 

C:\Users\binua\Downloads\Data Input Directory\20210427\output\20210427.csv

 

Please let me know if you need further explanation

 

Labels
Top Solution Authors