I have a folder of Excel sheets. Every month, everything in the folder is deleted and a new set of Excel sheets is dumped in. Whenever that happens, I need to condense all of these sheets into one single data file.
The Excel files do not follow a naming convention. The number of sheets that go into this folder each month changes. The columns are typically mostly the same throughout the sheets, but not always. Each sheet has only one tab (mercifully).
I want to read all the files in together at once, append matching columns wherever possible, and fill in nulls everywhere else as needed.
If I import each sheet individually (using one Input Data tool per sheet), and then use the Union tool, it does what I want perfectly. Problem is, I have to manually set up an Input Data tool for every single Excel file, one at a time (then feed those inputs all into the same Union).
If I use one Input Data tool for everything, but specify a path ending with a wildcard ("S:\PATH\TO\FOLDER\*.xlsx"), it grabs all Excel sheets in the folder. That's great, since it fixes my issue with the changing number and names of sheets. The problem is that this "one input tool for all files" approach doesn't handle the differing columns correctly (it looks like it only pulls in the sheets whose headers match whichever file it read first).
I've written a Python program that handles everything exactly as I want (below). I think I can use this script in a pinch, but I'd rather use native Alteryx if possible.
import pandas as pd
from os import walk

folder = "C:/Users/Desktop/test/"

# Collect the full paths of all .xlsx files in the folder
input_files = []
for (dirpath, dirnames, filenames) in walk(folder):
    for file in filenames:
        if file.endswith('.xlsx'):
            input_files.append(folder + file)

# Read each workbook as strings and stack them, aligning on column names
df = pd.DataFrame()
for i in input_files:
    d = pd.read_excel(i, dtype=str)
    df = pd.concat([df, d], axis=0, ignore_index=True, sort=False)
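For what it's worth, the script works because pd.concat aligns frames on column names: columns shared across sheets are stacked, and columns missing from a given sheet come through as NaN. A minimal in-memory sketch of that behavior (the column names A/B/C are made up for illustration; no Excel files needed):

```python
import pandas as pd

# Two frames standing in for two Excel sheets with partially overlapping columns
sheet1 = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
sheet2 = pd.DataFrame({"B": ["z"], "C": [9]})

# concat unions the columns; rows that lack a column get NaN there
combined = pd.concat([sheet1, sheet2], axis=0, ignore_index=True, sort=False)
print(combined.columns.tolist())  # columns appear in order first seen: ['A', 'B', 'C']
print(combined)
```

With sort=False the combined frame keeps columns in the order they were first seen, which matches the Union-tool behavior described above.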
In your case, you have Excel files, so I went ahead and changed the File Format of the Input Data tool within the batch macro.
Back on the canvas, I then have a Directory tool pointed at the folder with the Excel files. The workbooks originally had the same schema, but I added unique field names and content to three of them. I then inserted my batch macro and selected "FullPath" as the control parameter. As you can see, it brought in all the fields despite the differing schemas.
My approach would be to use a Directory tool reading all .xls files and pass that into an Input Data batch macro, the key being that you use the interface property setting "Auto Configure by Position (Wait Until All Iterations Run)". If you don't update the output mode, it will complain that the files don't have the same schema.
The attached workflow will stack/union all of the data in your desired output format regardless of the column count, with [Null] filled in for rows from files with fewer columns.
Batch macro with the only tool being the Input Data tool
Access the Macro Interface settings by opening the macro, going to View -> Interface Designer, then clicking the gear icon on the left.