ACT NOW: The Alteryx team will be retiring support for Community account recovery and Community email-change requests Early 2026. Make sure to check your account preferences in my.alteryx.com to make sure you have filled out your security questions. Learn more here
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Batch macro to read 1000+ .xlsx files with varying schemas

LizPerry
7 - Meteor

I have a flow with a Directory input to read in all files from a network folder, I use a batch macro to find all sheet names across all of the files & then another batch macro to bring the data into the flow from each of those files, sheets.

Ideally I'd like to create a unioned view of the data BUT I'm struggling with the variances in format. All submissions should be the same format but various customers don't / won't comply. I'm happy to reject a number of submissions but some are simple fixes such as, the data doesn't start on row 1 so I need to look for row headings OR row headings are slightly different. 

These are examples of some of the variances I'd like to fix. The batch macro which picks up all data from all sheets is failing when I bring it into my flow.  The actual headers should be Distributor Account Number, Distributor Name, Date of Purchase, Customer Name, Material Reference Number & Quantity Sold but if this appears on rows 1, or even on rows 6,7 etc I want to start my data collation from there and if the headers say something like Dist Acc No, I want to accept that. 

4 REPLIES 4
EKasminsky
9 - Comet

I'm not sure how helpful this is but I will put in my two cents.

I had a similar issue like this with multiple customers. What I did was I imported all of the customers directory of invoices separate from the other customers. So say I had Company A and Company B. Company A would be pulled in with one directory tool and then a macro to find sheet names, then one to import data. A separate process would be done with Company B. The reason I did it like this is because these individual companies tended to always send their invoices in the same manner. I could usually expect what the tab names and column names were. 

I also use a Dynamic Rename for column names. This is just to remove any possible problem spaces in column names.

EKasminsky_0-1766000478598.png

All of this makes a bigger workflow... but might help reduce the number of issues you are seeing or help.

 

caltang
17 - Castor
17 - Castor

Here's how I would do it @LizPerry 

 

image.png

 

The concept is to use a batch macro to load the files in from a directory to get their sheet names. You can then filter for the sheets you want if they follow a pattern, for my case, I just used all your sheets.

 

The second step is to load the data from said sheets into the canvas using the same macro, but with a twist. You can see that I made the formula tools before the same macros to get the List of Sheet Names, and to replace said List of Sheet Names so that my macro can read the sheet info and the sheet data inside all in 1 go with a change of formula.

 

Then, we order the data and structure it, add a tile tool to identify each file called in, and then build a batch macro like so:

 

image.png

 

This batch macro basically standardizes the input, and orders it to become a proper table (removing null columns) and then positioning them to be dynamically renamed like what @EKasminsky said. 

 

But here comes the extra twist... I added another control on top of the original control parameter, which takes in this text input information from the main workflow here:

 

image.png

 

Now, you can expand this list by adding to the text input yourself in the configuration panel on the left, or you can replace it with an .xlsx or .csv file as a mapping file. The idea here is to tell Alteryx that the headers within the datasets contain these key words, and as long as they contain these key words, we want them to be taken in as the 1st row before a dynamic rename.

 

This prevents scenarios like:

image.png

 

Where the first row is clearly not the header, but if we use the same method without accounting for the header's keywords, then we will get an issue. 

 

Of course, I don't know how many keywords are there in your 1000 files, and I have controlled for them to get the first row that appears (in this case will be the main header). 

 

Let me know if this works for you.

 

 

Best regards,

Calvin Tang

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
caltang
17 - Castor
17 - Castor

Oh btw, the first directory tool you can make it like mine:

image.png

 

Store all your example files in a folder and load it from there. 

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
flying008
15 - Aurora

Hi, @LizPerry 

 

01- I assume you have already inputted all the data using batch macro [by field column position rather than by name].
02- Use keywords to identify the desired field name row and distinguish it from other data rows and rows to be deleted.
03- Filter out useless lines (such as the data rows above the actual field names).
04. Transpose by file name and identifier columns to facilitate unified field matching later.
05- Extract the first row based on the ID column, which is the row containing the actual field names.
06- Transpose.
07. Filter out empty lines, i.e., lines that are not common names.
08. Extract unique values, which serves as a preliminary step for manually handling irregular field names. (for below Step 09)
09. Manually match all ambiguous and partially matching field names; this step is crucial.    ★★★

ValueMap
COGSCOGS
Cust NameCustomer Name
Customer NameCustomer Name
Customer nameCustomer Name
Customer NoCustomer No
Distributor Account NumberDistributor Account Number
Distributor NameDistributor Name
Material Ref NumberMaterial Ref Number
Material Reference NumberMaterial Ref Number
ProductMaterial Ref Number
OnlineOnline
DatePurchase Date
Date of PurchasePurchase Date
QtySold Qty
QuantitySold Qty
Quantity SoldSold Qty


10. Match the Map with the original field column names and join them into the transposed data from Step 4.
11 - Filter out empty rows, i.e., non-common name rows.
12 - Filter out the original first row, which is the header row containing column names, to prepare for the next Cross Tab step.
13- Group the data rows for identification, preparing for the next Cross Tab step.
14- Use the Cross Tab tool to restore the data to its original structure.

 

录制_2025_12_18_15_20_46_341.gif

 

Input          
FileNameF1F2F3F4F5F6F7F8  
example file1.xlsx|||'Data$'Customer NoCustomer nameDateQuantityProduct     
example file1.xlsx|||'Data$'123Example Cust01/07/20251ABC123     
example file1.xlsx|||'Data$'123Example Cust07/07/20251ABD456     
example file1.xlsx|||'Data$'123Example Cust31/07/20251ABE789     
example file1.xlsx|||'Data$'123Example Cust29/07/20251DF455     
example file1.xlsx|||'Data$'123Example Cust23/07/20251DREW     
example file1.xlsx|||'Data$'123Example Cust23/07/20251LKOO     
example file1.xlsx|||'Data$'123Example Cust30/07/20251JHUS     
example file1.xlsx|||'Report$'Month:25-Jul testemail@test.com      
example file1.xlsx|||'Report$'          
example file1.xlsx|||'Report$'Distributor Account NumberDistributor NameDate of PurchaseCust NameMaterial Ref NumberQtyOnline   
example file1.xlsx|||'Report$'1234Example Dist01/07/2025Example CustABC1231No   
example file1.xlsx|||'Report$'1234Example Dist07/07/2025Example CustABD4561No   
example file1.xlsx|||'Report$'1234Example Dist31/07/2025Example CustABE7891No   
example file1.xlsx|||'Report$'1234Example Dist29/07/2025Example CustDF4551No   
example file1.xlsx|||'Report$'1234Example Dist23/07/2025Example CustDREW1No   
example file1.xlsx|||'Report$'1234Example Dist23/07/2025Example CustLKOO1No   
example file1.xlsx|||'Report$'1234Example Dist30/07/2025Example CustJHUS1No   
example file2.xlsx|||'Page 1$'Distributor NameDistributor Account NumberDate of PurchaseCustomer NameMaterial Reference NumberQuantity SoldCOGSOnline  
example file2.xlsx|||'Page 1$'Example dist 2124592025-11-3 twrweq433.99No  
example file2.xlsx|||'Page 1$'Example dist 3129872025-11-3 potott424.21No  
example file3.xlsx|||'Sheet1$'Distributor NameDistributor Account NumberDate of PurchaseCustomer NameMaterial Reference NumberQuantity Sold    
example file3.xlsx|||'Sheet1$'example dist in town A7778892025-12-1A customer78999881    
example file3.xlsx|||'Sheet1$'example dist in town B7778882025-12-2A customer6558881    
           
           
Output          
FileNameTile_SequenceNumCOGSCustomer_NameCustomer_NoDistributor_Account_NumberDistributor_NameMaterial_Ref_NumberOnlinePurchase_DateSold_Qty
example file1.xlsx|||'Data$'1 Example Cust123  ABC123 01/07/20251
example file1.xlsx|||'Data$'2 Example Cust123  ABD456 07/07/20251
example file1.xlsx|||'Data$'3 Example Cust123  ABE789 31/07/20251
example file1.xlsx|||'Data$'4 Example Cust123  DF455 29/07/20251
example file1.xlsx|||'Data$'5 Example Cust123  DREW 23/07/20251
example file1.xlsx|||'Data$'6 Example Cust123  LKOO 23/07/20251
example file1.xlsx|||'Data$'7 Example Cust123  JHUS 30/07/20251
example file1.xlsx|||'Report$'1 Example Cust 1234Example DistABC123No01/07/20251
example file1.xlsx|||'Report$'2 Example Cust 1234Example DistABD456No07/07/20251
example file1.xlsx|||'Report$'3 Example Cust 1234Example DistABE789No31/07/20251
example file1.xlsx|||'Report$'4 Example Cust 1234Example DistDF455No29/07/20251
example file1.xlsx|||'Report$'5 Example Cust 1234Example DistDREWNo23/07/20251
example file1.xlsx|||'Report$'6 Example Cust 1234Example DistLKOONo23/07/20251
example file1.xlsx|||'Report$'7 Example Cust 1234Example DistJHUSNo30/07/20251
example file2.xlsx|||'Page 1$'133.99  12459Example dist 2twrweqNo2025-11-34
example file2.xlsx|||'Page 1$'224.21  12987Example dist 3potottNo2025-11-34
example file3.xlsx|||'Sheet1$'1 A customer 777889example dist in town A7899988 2025-12-11
example file3.xlsx|||'Sheet1$'2 A customer 777888example dist in town B655888 2025-12-21

 

*********

BTW, Due to security policy restrictions, I am unable to upload any. yxmd files, so please do it yourself,

 

Labels
Top Solution Authors