community
cancel
Showing results for 
Search instead for 
Did you mean: 

Engine Works Blog

Under the hood of Alteryx: tips, tricks and how-tos.
Alteryx Alumni (Retired)

Excel_File_Opener.png

Suppose you have a folder containing multiple Excel workbooks, each containing multiple sheets or tabs that you would would like to read into a workflow all at once. What are your options?

 

 

Brute Force:  Use one input tool per sheet, followed by a Union tool to bring all of the data together. 

This can work alright if you have a couple workbooks with a couple sheets, but, even then, is a messy solution that I wouldn't recommend.

 

Dynamic Input: Outlined here.  

This works well if you have a single workbook with a known list of sheets all with identical schemas.  However, despite using a tool with "dynamic" in its name, this isn't a fully dynamic process as it will not handle schemas changing and requires you to know the list of sheet names.

 

Let's outline what we want in a solution:

 

  1. Capable of reading all sheets/tabs from a workbook, regardless of schema
  2. Capable of reading all workbooks from a folder

What else do we need to consider?

 

There are a few different types of Excel workbooks to consider, I'll designate them by extension:

 

XLS: This extension is getting dated as Excel 2003 was the last version of Excel to save in this format.  It's a binary format and requires Alteryx to use the Microsoft Jet driver to read. 

 

XLSX: This is the extension used primarily by Excel 2007+.  It's actually a zipped up set of XML documents.  

Alteryx has two ways of reading these files:

        XLSX Legacy - Relies on the Microsoft Access 2010 drivers

 

        XLSX - We built a native reader for xlsx for our 9.5 release.  This is the recommended method for reading XLSX files going forward

 

XLSB/XLSM: These are additional Excel 2007+ output options that are sometimes used.  They are not officially supported filetypes at this time - but the XLSX Legacy driver is able to read them. 

 

There are actually numerous blog posts and community threads about how to read multiple Excel files and sheets into Alteryx.  As an 8 year veteran user of Alteryx, its awesome to see the community and online resources start to hit the critical mass where a quick search will yield some answers.  

 

It should be possible to design a process that will give the user the ability to read every sheet in every excel workbook (regardless of extension).  Let's build it!

 

We'll need to construct 3 primary macros for the process.

 

File Finder: The first step is to find all of the files that need to be opened and will be the "parent" macro and face of the final product.

Sheet Finder: The second step is to find all of the sheets in each file.  

Sheet Reader: The third and final step is to extract the data found on each sheet in each workbook and bring them all together.  

 

In part 2, I will go in depth into each of these macros as well as the steps for wrapping them all together into a single, parent macro interface. You can download the latest version of macro from the gallery. Here's a sneak peak of its features:

Read_All_Excel_Files_screenshot.png

Comments
Meteoroid

Excellent - Will check it out. In particular, been looking for pulling in multiple XLSX and XLSB extracts that have data starting in different locations (row 40, next row 20) etc. Using combos of dynamic & sample tools. Happy Holidays.

Bolide

Nice!  Just tested it and worked wonderfully.  Will make my task, combining files from 10 volunteers that have the same file setup, much easier.  Thanks!

Bolide

Thank you for making this, Cameron!

 

One thing I noticed - it doesnt seem to work with network file paths such as those that start with \\. When running Alteryx in a VM, often even your local drives are represented to windows as "network" drives. A future enhancement to properly utilize these paths would be fantastic!

 

 

Best regards,

Ryan

Alteryx Partner

Thanks Cameron! This is an awesome macro!

 

Just wanted to check if it's possible to/how do I tweak this macro such that it only extracts data from row 6 onwards from each excel input file (i.e. to ignore any data from rows 1 to 5 because they contain other main headers and details which I don't really need in the consolidated output file). In terms of format, each excel input file has been formatted similarly (i.e. data needed only begins from row 6 onwards) 

 

Many thanks!

 

 

Alteryx
Alteryx

Hello,

 

You will need to open the macro called 'Sheet Reader' which is within the Excel File Opener Macro.  It is a simple batch macro that will just open up excel workbooks based on the paths that are being fed into it.  You will need to add a Sample tool to skip 5 records after the input data tool then follow it up with a dynamic rename to ensure that the first row of data contains the column headers.

community batch fix.png

 

 

I am not able to attach files to this post so have responded to your DM with the workflow itself.

 

Thanks,

Nick 

Alteryx Partner

Thank you for the vibrant Alteryx community for all the help.

Also, could this and other such tools/packages and macros be included in Alteryx designer releases?

How else can we share such workflows between teams if the other members have not downloaded and installed some of these tools on their desktop versions? Is there an easy way?

Atom

Great Macro and definitely helpful. 

I am however facing some issues trying to read .xlsb files as I am getting the error:

"Error: Excel File Opener (5): Tool #3: Record #2: Tool #26: The external program "..\SupportingFiles\unzip.exe" returned an error code: 9"
and can not figure my way around this. 

Anybody got any fixes for that error? 

Thanks, 

 

** Solved **

 

Issue is from Excel not the Macro, I was getting an error from Excel (pop-up window) everytime I ran the Macro. 
Think it is a driver thing, not to sure though. 

Hello, I just used it, it works properly. Thank you.

 

I need one more functionality. My dataset contains data in its first raw. Can I set it up as "First raw contains data" in any way?

 

Thank you.

What a great gift! It is really helpful for my task. However i am facing a issue like below. When the sheet name include Chinese characters, they will be identified as invalid characters then output nothing. Only when i edit the sheet name without Chinese, it can work smoothly. Could any expert feel free to take a look on this? Waiting for your good news. Btw, my file format is .xls. Picture1.png

Labels