I have a flow with a Directory input to read in all files from a network folder, I use a batch macro to find all sheet names across all of the files & then another batch macro to bring the data into the flow from each of those files, sheets.
Ideally I'd like to create a unioned view of the data BUT I'm struggling with the variances in format. All submissions should be the same format but various customers don't / won't comply. I'm happy to reject a number of submissions but some are simple fixes such as, the data doesn't start on row 1 so I need to look for row headings OR row headings are slightly different.
These are examples of some of the variances I'd like to fix. The batch macro which picks up all data from all sheets is failing when I bring it into my flow. The actual headers should be Distributor Account Number, Distributor Name, Date of Purchase, Customer Name, Material Reference Number & Quantity Sold but if this appears on rows 1, or even on rows 6,7 etc I want to start my data collation from there and if the headers say something like Dist Acc No, I want to accept that.
I'm not sure how helpful this is but I will put in my two cents.
I had a similar issue like this with multiple customers. What I did was I imported all of the customers directory of invoices separate from the other customers. So say I had Company A and Company B. Company A would be pulled in with one directory tool and then a macro to find sheet names, then one to import data. A separate process would be done with Company B. The reason I did it like this is because these individual companies tended to always send their invoices in the same manner. I could usually expect what the tab names and column names were.
I also use a Dynamic Rename for column names. This is just to remove any possible problem spaces in column names.
All of this makes a bigger workflow... but might help reduce the number of issues you are seeing or help.
Here's how I would do it @LizPerry
The concept is to use a batch macro to load the files in from a directory to get their sheet names. You can then filter for the sheets you want if they follow a pattern, for my case, I just used all your sheets.
The second step is to load the data from said sheets into the canvas using the same macro, but with a twist. You can see that I made the formula tools before the same macros to get the List of Sheet Names, and to replace said List of Sheet Names so that my macro can read the sheet info and the sheet data inside all in 1 go with a change of formula.
Then, we order the data and structure it, add a tile tool to identify each file called in, and then build a batch macro like so:
This batch macro basically standardizes the input, and orders it to become a proper table (removing null columns) and then positioning them to be dynamically renamed like what @EKasminsky said.
But here comes the extra twist... I added another control on top of the original control parameter, which takes in this text input information from the main workflow here:
Now, you can expand this list by adding to the text input yourself in the configuration panel on the left, or you can replace it with an .xlsx or .csv file as a mapping file. The idea here is to tell Alteryx that the headers within the datasets contain these key words, and as long as they contain these key words, we want them to be taken in as the 1st row before a dynamic rename.
This prevents scenarios like:
Where the first row is clearly not the header, but if we use the same method without accounting for the header's keywords, then we will get an issue.
Of course, I don't know how many keywords are there in your 1000 files, and I have controlled for them to get the first row that appears (in this case will be the main header).
Let me know if this works for you.
Best regards,
Calvin Tang
Oh btw, the first directory tool you can make it like mine:
Store all your example files in a folder and load it from there.
Hi, @LizPerry
01- I assume you have already inputted all the data using batch macro [by field column position rather than by name].
02- Use keywords to identify the desired field name row and distinguish it from other data rows and rows to be deleted.
03- Filter out useless lines (such as the data rows above the actual field names).
04. Transpose by file name and identifier columns to facilitate unified field matching later.
05- Extract the first row based on the ID column, which is the row containing the actual field names.
06- Transpose.
07. Filter out empty lines, i.e., lines that are not common names.
08. Extract unique values, which serves as a preliminary step for manually handling irregular field names. (for below Step 09)
09. Manually match all ambiguous and partially matching field names; this step is crucial. ★★★
| Value | Map |
| COGS | COGS |
| Cust Name | Customer Name |
| Customer Name | Customer Name |
| Customer name | Customer Name |
| Customer No | Customer No |
| Distributor Account Number | Distributor Account Number |
| Distributor Name | Distributor Name |
| Material Ref Number | Material Ref Number |
| Material Reference Number | Material Ref Number |
| Product | Material Ref Number |
| Online | Online |
| Date | Purchase Date |
| Date of Purchase | Purchase Date |
| Qty | Sold Qty |
| Quantity | Sold Qty |
| Quantity Sold | Sold Qty |
10. Match the Map with the original field column names and join them into the transposed data from Step 4.
11 - Filter out empty rows, i.e., non-common name rows.
12 - Filter out the original first row, which is the header row containing column names, to prepare for the next Cross Tab step.
13- Group the data rows for identification, preparing for the next Cross Tab step.
14- Use the Cross Tab tool to restore the data to its original structure.
| Input | ||||||||||
| FileName | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | ||
| example file1.xlsx|||'Data$' | Customer No | Customer name | Date | Quantity | Product | |||||
| example file1.xlsx|||'Data$' | 123 | Example Cust | 01/07/2025 | 1 | ABC123 | |||||
| example file1.xlsx|||'Data$' | 123 | Example Cust | 07/07/2025 | 1 | ABD456 | |||||
| example file1.xlsx|||'Data$' | 123 | Example Cust | 31/07/2025 | 1 | ABE789 | |||||
| example file1.xlsx|||'Data$' | 123 | Example Cust | 29/07/2025 | 1 | DF455 | |||||
| example file1.xlsx|||'Data$' | 123 | Example Cust | 23/07/2025 | 1 | DREW | |||||
| example file1.xlsx|||'Data$' | 123 | Example Cust | 23/07/2025 | 1 | LKOO | |||||
| example file1.xlsx|||'Data$' | 123 | Example Cust | 30/07/2025 | 1 | JHUS | |||||
| example file1.xlsx|||'Report$' | Month: | 25-Jul | testemail@test.com | |||||||
| example file1.xlsx|||'Report$' | ||||||||||
| example file1.xlsx|||'Report$' | Distributor Account Number | Distributor Name | Date of Purchase | Cust Name | Material Ref Number | Qty | Online | |||
| example file1.xlsx|||'Report$' | 1234 | Example Dist | 01/07/2025 | Example Cust | ABC123 | 1 | No | |||
| example file1.xlsx|||'Report$' | 1234 | Example Dist | 07/07/2025 | Example Cust | ABD456 | 1 | No | |||
| example file1.xlsx|||'Report$' | 1234 | Example Dist | 31/07/2025 | Example Cust | ABE789 | 1 | No | |||
| example file1.xlsx|||'Report$' | 1234 | Example Dist | 29/07/2025 | Example Cust | DF455 | 1 | No | |||
| example file1.xlsx|||'Report$' | 1234 | Example Dist | 23/07/2025 | Example Cust | DREW | 1 | No | |||
| example file1.xlsx|||'Report$' | 1234 | Example Dist | 23/07/2025 | Example Cust | LKOO | 1 | No | |||
| example file1.xlsx|||'Report$' | 1234 | Example Dist | 30/07/2025 | Example Cust | JHUS | 1 | No | |||
| example file2.xlsx|||'Page 1$' | Distributor Name | Distributor Account Number | Date of Purchase | Customer Name | Material Reference Number | Quantity Sold | COGS | Online | ||
| example file2.xlsx|||'Page 1$' | Example dist 2 | 12459 | 2025-11-3 | twrweq | 4 | 33.99 | No | |||
| example file2.xlsx|||'Page 1$' | Example dist 3 | 12987 | 2025-11-3 | potott | 4 | 24.21 | No | |||
| example file3.xlsx|||'Sheet1$' | Distributor Name | Distributor Account Number | Date of Purchase | Customer Name | Material Reference Number | Quantity Sold | ||||
| example file3.xlsx|||'Sheet1$' | example dist in town A | 777889 | 2025-12-1 | A customer | 7899988 | 1 | ||||
| example file3.xlsx|||'Sheet1$' | example dist in town B | 777888 | 2025-12-2 | A customer | 655888 | 1 | ||||
| Output | ||||||||||
| FileName | Tile_SequenceNum | COGS | Customer_Name | Customer_No | Distributor_Account_Number | Distributor_Name | Material_Ref_Number | Online | Purchase_Date | Sold_Qty |
| example file1.xlsx|||'Data$' | 1 | Example Cust | 123 | ABC123 | 01/07/2025 | 1 | ||||
| example file1.xlsx|||'Data$' | 2 | Example Cust | 123 | ABD456 | 07/07/2025 | 1 | ||||
| example file1.xlsx|||'Data$' | 3 | Example Cust | 123 | ABE789 | 31/07/2025 | 1 | ||||
| example file1.xlsx|||'Data$' | 4 | Example Cust | 123 | DF455 | 29/07/2025 | 1 | ||||
| example file1.xlsx|||'Data$' | 5 | Example Cust | 123 | DREW | 23/07/2025 | 1 | ||||
| example file1.xlsx|||'Data$' | 6 | Example Cust | 123 | LKOO | 23/07/2025 | 1 | ||||
| example file1.xlsx|||'Data$' | 7 | Example Cust | 123 | JHUS | 30/07/2025 | 1 | ||||
| example file1.xlsx|||'Report$' | 1 | Example Cust | 1234 | Example Dist | ABC123 | No | 01/07/2025 | 1 | ||
| example file1.xlsx|||'Report$' | 2 | Example Cust | 1234 | Example Dist | ABD456 | No | 07/07/2025 | 1 | ||
| example file1.xlsx|||'Report$' | 3 | Example Cust | 1234 | Example Dist | ABE789 | No | 31/07/2025 | 1 | ||
| example file1.xlsx|||'Report$' | 4 | Example Cust | 1234 | Example Dist | DF455 | No | 29/07/2025 | 1 | ||
| example file1.xlsx|||'Report$' | 5 | Example Cust | 1234 | Example Dist | DREW | No | 23/07/2025 | 1 | ||
| example file1.xlsx|||'Report$' | 6 | Example Cust | 1234 | Example Dist | LKOO | No | 23/07/2025 | 1 | ||
| example file1.xlsx|||'Report$' | 7 | Example Cust | 1234 | Example Dist | JHUS | No | 30/07/2025 | 1 | ||
| example file2.xlsx|||'Page 1$' | 1 | 33.99 | 12459 | Example dist 2 | twrweq | No | 2025-11-3 | 4 | ||
| example file2.xlsx|||'Page 1$' | 2 | 24.21 | 12987 | Example dist 3 | potott | No | 2025-11-3 | 4 | ||
| example file3.xlsx|||'Sheet1$' | 1 | A customer | 777889 | example dist in town A | 7899988 | 2025-12-1 | 1 | |||
| example file3.xlsx|||'Sheet1$' | 2 | A customer | 777888 | example dist in town B | 655888 | 2025-12-2 | 1 |
*********
BTW, Due to security policy restrictions, I am unable to upload any. yxmd files, so please do it yourself,