Suppose you have a folder containing multiple Excel workbooks, each containing multiple sheets or tabs that you would would like to read into a workflow all at once. What are your options?
Brute Force: Use one input tool per sheet, followed by a Union tool to bring all of the data together.
This can work alright if you have a couple workbooks with a couple sheets, but, even then, is a messy solution that I wouldn't recommend.
Dynamic Input: Outlined here.
This works well if you have a single workbook with a known list of sheets all with identical schemas. However, despite using a tool with "dynamic" in its name, this isn't a fully dynamic process as it will not handle schemas changing and requires you to know the list of sheet names.
Let's outline what we want in a solution:
- Capable of reading all sheets/tabs from a workbook, regardless of schema
- Capable of reading all workbooks from a folder
What else do we need to consider?
There are a few different types of Excel workbooks to consider, I'll designate them by extension:
XLS: This extension is getting dated as Excel 2003 was the last version of Excel to save in this format. It's a binary format and requires Alteryx to use the Microsoft Jet driver to read.
XLSX: This is the extension used primarily by Excel 2007+. It's actually a zipped up set of XML documents.
Alteryx has two ways of reading these files:
XLSX Legacy - Relies on the Microsoft Access 2010 drivers
XLSX - We built a native reader for xlsx for our 9.5 release. This is the recommended method for reading XLSX files going forward
XLSB/XLSM: These are additional Excel 2007+ output options that are sometimes used. They are not officially supported filetypes at this time - but the XLSX Legacy driver is able to read them.
There are actually numerous blog posts and community threads about how to read multiple Excel files and sheets into Alteryx. As an 8 year veteran user of Alteryx, its awesome to see the community and online resources start to hit the critical mass where a quick search will yield some answers.
It should be possible to design a process that will give the user the ability to read every sheet in every excel workbook (regardless of extension). Let's build it!
We'll need to construct 3 primary macros for the process.
File Finder: The first step is to find all of the files that need to be opened and will be the "parent" macro and face of the final product.
Sheet Finder: The second step is to find all of the sheets in each file.
Sheet Reader: The third and final step is to extract the data found on each sheet in each workbook and bring them all together.
In part 2, I will go in depth into each of these macros as well as the steps for wrapping them all together into a single, parent macro interface. You can download the latest version of macro from the gallery. Here's a sneak peak of its features: