Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Reading entire folder of Excel files at once, columns not always the same

DanWhalen
7 - Meteor

I have a folder of Excel sheets. Every month, everything in the folder is deleted, and a new set of Excel sheets are dumped in.  Whenever that happens, I need to condense all these sheets together into one single data file.

 

The Excel files do not follow a naming convention. The number of sheets that go into this folder each month changes. The columns are typically mostly the same throughout the sheets, but not always.  Each sheet has only one tab (mercifully).

 

I want to read all the files in together at once, append on same columns where ever able, and fill in Nulls everywhere else as needed.

 

EXAMPLE

 

---HAVE---

 

~DATA 1~

ColA ColB ColC
10 9 3
4 4 10
7 9 4
1 4 3
5 8 5
9 4 6
2 7 5
10 8 2
6 6 7
6 6 2


~DATA2~

ColA ColB
20 19
17 18
18 16
18 16
15 11
13 18
16 16
20 17
15 11
11 15

 


---WANT---

 

ColA ColB ColC
10 9 3
4 4 10
7 9 4
1 4 3
5 8 5
9 4 6
2 7 5
10 8 2
6 6 7
6 6 2
20 19 [Null]
17 18 [Null]
18 16 [Null]
18 16 [Null]
15 11 [Null]
13 18 [Null]
16 16 [Null]
20 17 [Null]
15 11 [Null]
11 15 [Null]

 

 

I've tried a few different ways so far:

  1. If I import each sheet individually (using one Input Data tool per sheet), and then use the Union tool, it does what I want perfectly. Problem is, I have to manually set up an Input Data tool for every single Excel file, one at a time (then feed those inputs all into the same Union).
  2. If I use one Input Data tool for everything, but specify a path ending with a wildcard, it grabs all Excel sheets in the folder ("S:\PATH\TO\FOLDER\*.xlsx").  Which is great, it fixes my dynamic number of sheets/names of sheets issue. However, problem is, that "one input tool for all files" approach doesn't handle the differing number of columns correctly (looks like it will only pull in the sheets with headers matching whichever was the first file it read in).


I've written a Python program that handles everything exactly as I want (below). I think I can use this script in a pinch, but I'd rather use native Alteryx if possible.

  

 

PYTHON SCRIPT

import pandas as pd
from os import walk

 

folder = "C:/Users/Desktop/test/"

 

input_files = []
for (dirpath, dirnames, filenames) in walk(folder):
   for file in filenames:
      if file.endswith('.xlsx'):
         input_files.append(folder+file)

 

df = pd.DataFrame()
   for i in input_files:
      d = pd.read_excel(i,dtype=str)
      df = pd.concat([df,d],axis=0, ignore_index=True, sort=False)

 

print(df)

3 REPLIES 3
AmeliaG
Alteryx
Alteryx

Hi @DanWhalen,

 

Thanks for your question! You will need to use a batch macro to achieve this. Please take a look at the two links listed below which describe in detail how to achieve this:

 

Reading in Multiple files with different fields

Read in Multiple Excel files with multiple sheets

 

Hope this helps!

 

Amelia

CarlDi
Alteryx
Alteryx

Hi @DanWhalen

 

A batch macro should do the trick. Here's a great article with an example batch macro to input CSVs with different schemas.

https://community.alteryx.com/t5/Alteryx-Knowledge-Base/CS-Macro-Dev-Reading-in-Multiple-Files-with-...

 

In your case, you have excel files, so I went ahead and changed the File Format of the Input Data tool within the Batch macro.

 

Batch_macro.PNG

 

Back to the canvas, I then have a Directory tool pointed to the folder with the excel files. They originally had the same schema but I added unique field names and content in three workbooks. I then inserted my batch macro and selected "Fullpath" in the control parameter. As you can see, it brought in all the fields despite having different schemas.

 

workflow.PNG

Hope that helps! 

 

chpoland
7 - Meteor

@DanWhalen

 

My approach would be to use a directory tool reading all .xls files, pass that into an Input Data batch macro with the key being that you use the interface property setting of "Auto Configure by Position (Wait Until All Iterations Run)". If you don't update the output mode, it will complain of the files not having the same schema.

 

The attached workflow will stack/union all of the data regardless of count of columns in your desired output format with [null] for data rows with fewer columns.

 

Capture1.PNG

 

Batch macro with the only tool being the Input Data tool

Capture2.PNG

 

Access the Macro Interface settings by opening the macro, View -> Interface Designer then clicking the Gear Icon on the left

Capture3.PNG

 

Hope this is what you were looking for!!

 

Labels