In part 1, I outlined the high level steps needed for an Alteryx macro that will read any number of Excel worksheets from any number of workbooks.
File Finder: The first step is to find all of the files that need to be opened and will be the "parent" macro and face of the final product.
Sheet Finder: The second step is to find all of the sheets in each file.
Sheet Reader: The third and final step is to extract the data found on each sheet in each workbook and bring them all together.
Here, I'll go in depth into each of those steps, including how to wrap them all together into a single, parent macro to streamline the interface.
File Finder:
Let's start from the top and work down.
Sheet Finder:
This will be a batch macro where each batch will be tied to a record produced by the Directory tool in the File Finder parent macro. We'll use a control parameter interface tool for this, labeled as "FullPath", and will map the FullPath field output by the directory tool to this parameter. This will allow us to reference the file path for the workbook relevant to each batch.
The guts of the Sheet Finder macro will need to produce a list of sheets for the file a given batch is working with. This will end up being the most complicated piece of this entire process because of the multiple types of Excel files we are supporting.
Now that we have the core logic of the SheetFinder batch macro figured out, we need to construct the macro interface and actions. Remember that, because this is a batch macro, we will be working with a single Excel workbook at a time. We want each batch of the macro to run the sheet extraction logic appropriate to that batch's Excel workbook file type. To do this, we've placed each section in a tool container so that they can be disabled and enabled as needed.
Action tools will allow us to update the enable/disable status for a container tool based on the extension of the Excel workbook passed through the control parameter for each batch:
Similarly, conditions and actions can be used to update the file path referenced by the appropriate Input Data Tool or Run Command Tool:
This set up is repeated for all four Excel workbook extensions (XLSX, XLSB, XLSM, and XLS)
The results can then be brought together using a series of Detour End tools followed by appending the filepath of the Excel workbook associated with the current batch before being passed back to the parent workflow via a Macro Output Tool.
After saving the Sheet Finder macro we can now insert it into the File Finder parent macro where we can see that it is producing a record for each sheet found in each workbook.
Sheet Reader
Now that we have our list of all sheet names for all excel files in the directory, we need to read them. If we knew that all sheets would have identical schemas to one another, we could use the dynamic input tool for this. However, that is often not the case and we would prefer the macro work with any combination of schemas. Another couple of blog posts by Ned outline how to do this for supported file types so I won't go over it in detail. However, a key change was needed to ensure that the process will work with the unsupported file types .xlsm and .xlsb.
Ned's macro uses the FileFormat value of "-1" for the input data tool in his helper macro - this works great for his process because it instructs the input data tool to look at the file extension to determine appropriate driver to use for reading each file. This would work for the xls and xlsx files, but would not for xlsm or xlsb because there is no preset way for the input data tool to read them. Instead, we need to instruct the Input Data Tool to read xlsm and xlsb files using the Legacy Excel Driver - which is designated by the FileFormat value of "50". A switch function in an action tool lets us easily specify a default FileFormat value of "-1" that is over ridden when certain file extensions are found.
File Finder:
Now we can put everything together in the File Finder parent macro to see how it works.
Notice the file outputs listing out each sheet from each workbook as they are read.
Now that we are satisfied that the workflow is performing as desired, we can wrap it up by constructing the parent level macro interface that our users will interact with.
There are a few key things we'll want to expose to the end user.
Now we can use the macro as a dynamic way to import any number of Excel worksheets contained in any number of workbooks within a given directory.
The output is the union of all the sheets from all of the workbooks in the corresponding folder. Testing on the Alteryx sample directory produces something like this where you can see the filename for every record - where the schemas are mismatched but still read.
So there you have it - a macro to import the data from any number of Excel Worksheets in any number of Excel Workbooks regardless of type. Hopefully - this outline of the my thought process in constructing it will help you with your own macro building endeavors. Maybe you even have an idea for additional features for this macro that you want to incorporate. If you do - please share the results in the comments!
Note: This macro has been heavily tested and is expected to work in the cases mentioned above. We'd love to hear about your experience in working with it, but please keep in mind it is not an official Alteryx product macro.
Edit 3/29/2016:
I've added a few additional options to the version of the macro I'm currently using - updating the attached version to include these changes.
1. Option to specify how the directory tool searches for excel workbooks - primarily to allow for only including files with specific prefixes or suffixes.
2. Option to specify a specific list of sheet names to include.
Edit 3/30/2017
I've published the latest version of the macro to the gallery. Including the following changes:
1. Option to use the xls vb reading macro for all Excel file types (fix for some users reporting issues with reading xlsm files)
2. Option to return the list of fields for each sheet without reading all of the data (I find this useful to gain a quick understanding of how well the data will match up)
3. Changed the directory tool to read all file types as some users reported issues with certain types of file systems. A filter is applied to limit to only excel files.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.