I have a folder of Excel sheets. Every month, everything in the folder is deleted, and a new set of Excel sheets are dumped in. Whenever that happens, I need to condense all these sheets together into one single data file.
The Excel files do not follow a naming convention. The number of sheets that go into this folder each month changes. The columns are typically mostly the same throughout the sheets, but not always. Each sheet has only one tab (mercifully).
I want to read all the files in together at once, append on same columns where ever able, and fill in Nulls everywhere else as needed.
EXAMPLE
---HAVE---
~DATA 1~
ColA ColB ColC
10 9 3
4 4 10
7 9 4
1 4 3
5 8 5
9 4 6
2 7 5
10 8 2
6 6 7
6 6 2
~DATA2~
ColA ColB
20 19
17 18
18 16
18 16
15 11
13 18
16 16
20 17
15 11
11 15
---WANT---
ColA ColB ColC
10 9 3
4 4 10
7 9 4
1 4 3
5 8 5
9 4 6
2 7 5
10 8 2
6 6 7
6 6 2
20 19 [Null]
17 18 [Null]
18 16 [Null]
18 16 [Null]
15 11 [Null]
13 18 [Null]
16 16 [Null]
20 17 [Null]
15 11 [Null]
11 15 [Null]
I've tried a few different ways so far:
I've written a Python program that handles everything exactly as I want (below). I think I can use this script in a pinch, but I'd rather use native Alteryx if possible.
PYTHON SCRIPT
import pandas as pd
from os import walk
folder = "C:/Users/Desktop/test/"
input_files = []
for (dirpath, dirnames, filenames) in walk(folder):
for file in filenames:
if file.endswith('.xlsx'):
input_files.append(folder+file)
df = pd.DataFrame()
for i in input_files:
d = pd.read_excel(i,dtype=str)
df = pd.concat([df,d],axis=0, ignore_index=True, sort=False)
print(df)
Solved! Go to Solution.
Hi @DanWhalen,
Thanks for your question! You will need to use a batch macro to achieve this. Please take a look at the two links listed below which describe in detail how to achieve this:
Reading in Multiple files with different fields
Read in Multiple Excel files with multiple sheets
Hope this helps!
Amelia
Hi @DanWhalen
A batch macro should do the trick. Here's a great article with an example batch macro to input CSVs with different schemas.
In your case, you have excel files, so I went ahead and changed the File Format of the Input Data tool within the Batch macro.
Back to the canvas, I then have a Directory tool pointed to the folder with the excel files. They originally had the same schema but I added unique field names and content in three workbooks. I then inserted my batch macro and selected "Fullpath" in the control parameter. As you can see, it brought in all the fields despite having different schemas.
Hope that helps!
My approach would be to use a directory tool reading all .xls files, pass that into an Input Data batch macro with the key being that you use the interface property setting of "Auto Configure by Position (Wait Until All Iterations Run)". If you don't update the output mode, it will complain of the files not having the same schema.
The attached workflow will stack/union all of the data regardless of count of columns in your desired output format with [null] for data rows with fewer columns.
Batch macro with the only tool being the Input Data tool
Access the Macro Interface settings by opening the macro, View -> Interface Designer then clicking the Gear Icon on the left
Hope this is what you were looking for!!