community
cancel
Showing results for 
Search instead for 
Did you mean: 

Engine Works Blog

Under the hood of Alteryx: tips, tricks and how-tos.
Alteryx Alumni (Retired)

In part 1, I outlined the high level steps needed for an Alteryx macro that will read any number of Excel worksheets from any number of workbooks.

 

File Finder: The first step is to find all of the files that need to be opened and will be the "parent" macro and face of the final product.

Sheet Finder: The second step is to find all of the sheets in each file.  

Sheet Reader: The third and final step is to extract the data found on each sheet in each workbook and bring them all together.  

 

Here, I'll go in depth into each of those steps, including how to wrap them all together into a single, parent macro to streamline the interface.

 

File Finder:

Let's start from the top and work down.  

  1. First, we'll need to gather the list of files we'll be working with.  The directory tool is great for this.  You can see in the output the four example files I've created so that I have an example of each file type to work with.  The "*.xls" file specification will read all four file types because they all begin with "xls". 
  2. We'll need to come back to this later to add the Sheet Finder and Sheet Opener macros as well as to build an interface, but the directory tool does everything we need from a data standpoint to get us started.

 Step 1.PNG

 

Sheet Finder:

This will be a batch macro where each batch will be tied to a record produced by the Directory tool in the File Finder parent macro.  We'll use a control parameter interface tool for this, labeled as "FullPath", and will map the FullPath field output by the directory tool to this parameter.  This will allow us to reference the file path for the workbook relevant to each batch.  

 

The guts of the Sheet Finder macro will need to produce a list of sheets for the file a given batch is working with.  This will end up being the most complicated piece of this entire process because of the multiple types of Excel files we are supporting.

 

  • XLSX: These are easy, the new xlsx native reader has an option to output the list of sheet names.  

XLSX Sheet Names.PNG

 

 

  •  XLSB/XLSM: These are a little more difficult as the new driver does not support either file type.  However, borrowing some tricks (plus a few modifications) from our CTO, Ned Harding, outlined here, we can unzip each file type and find the list of sheets in the resulting output.

XLSM XLSB CommandPrompt.PNG

  • XLS: Excel 2003 files proved to be the most difficult for extracting the list of sheet names.  Primarily because they are old, binary files that cannot be easily be unpacked to find a list of sheet names sitting in an XML file.  However, it is possible to call some VB script that will write the list of sheet names in a given .xls file out to a txt file that can then be read directly by Alteryx.  I borrowed the script from a colleague (@RichardS) who already solved this particular step (a quick google search should also turn up some example scripts - but you can find the particular script I used in the supporting files folder of the final package).

XLS CommandPrompt.PNG

Now that we have the core logic of the SheetFinder batch macro figured out, we need to construct the macro interface and actions.  Remember that, because this is a batch macro, we will be working with a single Excel workbook at a time.  We want each batch of the macro to run the sheet extraction logic appropriate to that batch's Excel workbook file type. To do this, we've placed each section in a tool container so that they can be disabled and enabled as needed.

 

Action tools will allow us to update the enable/disable status for a container tool based on the extension of the Excel workbook passed through the control parameter for each batch:

 

Similarly, conditions and actions can be used to update the file path referenced by the appropriate Input Data Tool or Run Command Tool:

 

This set up is repeated for all four Excel workbook extensions (XLSX, XLSB, XLSM, and XLS)

The results can then be brought together using a series of Detour End tools followed by appending the filepath of the Excel workbook associated with the current batch before being passed back to the parent workflow via a Macro Output Tool.

 

After saving the Sheet Finder macro we can now insert it into the File Finder parent macro where we can see that it is producing a record for each sheet found in each workbook.

 

 

Sheet Reader

Now that we have our list of all sheet names for all excel files in the directory, we need to read them.  If we knew that all sheets would have identical schemas to one another, we could use the dynamic input tool for this.  However, that is often not the case and we would prefer the macro work with any combination of schemas.  Another couple of blog posts by Ned outline how to do this for supported file types so I won't go over it in detail.  However, a key change was needed to ensure that the process will work with the unsupported file types .xlsm and .xlsb.

 

Ned's macro uses the FileFormat value of "-1" for the input data tool in his helper macro - this works great for his process because it instructs the input data tool to look at the file extension to determine appropriate driver to use for reading each file.  This would work for the xls and xlsx files, but would not for xlsm or xlsb because there is no preset way for the input data tool to read them.  Instead, we need to instruct the Input Data Tool to read xlsm and xlsb files using the Legacy Excel Driver - which is designated by the FileFormat value of "50".  A switch function in an action tool lets us easily specify a default FileFormat value of "-1" that is over ridden when certain file extensions are found.

 

File Finder:

Now we can put everything together in the File Finder parent macro to see how it works.RunResults.PNG

Notice the file outputs listing out each sheet from each workbook as they are read. 

 

Now that we are satisfied that the workflow is performing as desired, we can wrap it up by constructing the parent level macro interface that our users will interact with.

 

There are a few key things we'll want to expose to the end user.

 

  1. Most importantly, we need them to specify the folder path for the macro to look for Excel files.  Remember that this function is performed by the Directory Tool.  There is a similar Folder Browse Interface Tool that allows us to ask the user for a folder path input.
  2. The Directory Tool has an option that allows us to either include or exclude SubDirectories of the folder browse.  We should probably give the end user this option as well.  This can be done with a Check Box Interface Tool.
  3. We can also give the user the ability to choose which Excel file types are consumed with a List Box Interface Tool and Filter.
  4. Finally, we can copy the Output File Path Drop Down selection from the Sheet Reader macro to enable the end user to specify this value as well.  We will also need a Macro Output tool to provide all of the records read from all of the Excel files outside of the macro.

Now we can use the macro as a dynamic way to import any number of Excel worksheets contained in any number of workbooks within a given directory.

Final.PNG

The output is the union of all the sheets from all of the workbooks in the corresponding folder.  Testing on the Alteryx sample directory produces something like this where you can see the filename for every record - where the schemas are mismatched but still read. 

ResultsOutput.PNG

 

 So there you have it - a macro to import the data from any number of Excel Worksheets in any number of Excel Workbooks regardless of type.  Hopefully - this outline of the my thought process in constructing it will help you with your own macro building endeavors. Maybe you even have an idea for additional features for this macro that you want to incorporate.  If you do - please share the results in the comments!

 

 

Note: This macro has been heavily tested and is expected to work in the cases mentioned above. We'd love to hear about your experience in working with it, but please keep in mind it is not an official Alteryx product macro. 

 

Edit 3/29/2016:

 

I've added a few additional options to the version of the macro I'm currently using - updating the attached version to include these changes.

 

1. Option to specify how the directory tool searches for excel workbooks - primarily to allow for only including files with specific prefixes or suffixes.

2. Option to specify a specific list of sheet names to include. 

 

Edit 3/30/2017

 

I've published the latest version of the macro to the gallery.  Including the following changes:

 

1. Option to use the xls vb reading macro for all Excel file types (fix for some users reporting issues with reading xlsm files)

2. Option to return the list of fields for each sheet without reading all of the data (I find this useful to gain a quick understanding of how well the data will match up)

3. Changed the directory tool to read all file types as some users reported issues with certain types of file systems.  A filter is applied to limit to only excel files.

Comments
Meteoroid

Sheet Finder and Sheet Reader macros extremely useful.

Much obliged.

 

Alteryx Alumni (Retired)

Just awesome!

Alteryx
Alteryx

Wow! You are right about a great holiday gift. Thanks!

Bolide

Thank you, Cameron, for such an awesome tool!

 

Here's one enhancement idea I have made use of: allow the users to choose whether to combine the Excel files by field name (which this process does already) or by field position. To allow users to combine by field position, I simply created a copy of the Sheet Reader macro and changed it's properties in the Interface Designer to "Auto Configure by Position (Wait Until All Iterations Run)". I then replaced the parent level macro with an analytic app that has all the same interface questions as the parent macro, but also has a radio button which will allow users to select the appropriate Sheet Reader macro to run (the user's selection changes the direction of a detour tool).

 

Today I tried using this tool to combine .XLS files. The process works great on my computer's Desktop Designer! It also works great when I remote-login to my company's Alteryx Server and run it from the Server's Desktop Designer. However, it fails to work when I run my analytic app on my company's private Gallery, giving a "File Not Found" error. The Run command tool in the Sheet Finder macro for the .XLS extention says it cannot find the XLStabs#.txt file. Playing aroud with the language in the ExcelTabsScript.vbs file, I discovered that an error occurs on the "Set objWorkbook = objExcel.Workbooks.Open (WScript.Arguments(0))" line. This error causes the rest of the script to be skipped and by extention no text to be written. If I change the script, manually providing the sheet name to write to the text file and removing all code relating to the opening/looking through Excel, the Gallery app works just fine! So here's my question: why would the .XLS Sheet Finder Run Command work great the Server's Desktop Designer but error on the Gallery (which is in the same "box")?

Alteryx Alumni (Retired)

Hi @DultonM,

 

I just tested this on my personal gallery and it ran the without any issues - including opening a few XLS files that would have needed to call the VBA.  So - I'm not sure what's different with your gallery set up, but issues where running through the designer on the server is successful but running through the Alteryx service (gallery or scheduler) is not successful usually turn out to be security or access issues so I would start there. Unfortunately - since that VBA script is nothing Alteryx officially produces or endorses, I can't make any guarantees it will work in every environment.  

 

 

 

 

Bolide

Thank you for the reply, Cameron. My team looked into the access and security settings but didn't turn up a solution to our problem. But then, a colleague of mine, @patrick_digan, discovered a bizarre fix: add an empty folder called "Desktop" to the system profile directories on the Server. For some reason, the mere existence of the new directory allowed the VBS script to function correctly when executed by Alteryx on the Gallery.

 

This blog was where the fix was found (scroll down about a third of the way): https://social.msdn.microsoft.com/Forums/office/en-US/ac2549f5-fc1e-494d-9015-70cb31d3aeb2/opening-e...

Atom

I'm getting the following error when trying to import any .xlsm files:

 

Excel File Opener (5) Tool #3: Record #2: Tool #10: The external program "..\SupportingFiles\unzip.exe" returned an error code: 9

 

When I open the macro and update the directory to my local directory and specify .xlsm, the SheetFinder macro returns the same error above.

 

I tested it with .xlsx and that worked. Anyone have the same issue?

Meteor
Hi, I have exaclty the same problem. I'm trying to use the macro on hundreds of xlsm files but the Excel file opener returns the following error : "Error: Excel File Opener (1): Tool #3: Record #1: Tool #10: The external program "...\SupportingFiles\unzip.exe" returned an error code: 9" Am I supposed to change the parameters and options of the "Run unzip.exe" tool embarked in the Sheet finder macro ? It looks like the "Run command" and "Read results" options should be changed. Thanks.
Bolide

Hi @mschleif and @gmerce,

 

Awhile ago, I took this great process and modified it for my company. I believe I had troubles with the unzip.exe as well so I used the Run Command tool for the XLS option (the one that executes ExcelTabsScript.vbs) as the way to read XLSM and XLSB  files. Actually, the Run Command tool that executes ExcelTabsScript.vbs works for all 4 Excel file types. Here's a picture of my SheetFinder macro:

 

 SheetFinder.PNG

 

Basically I...

A) deleted all the detour tools and the tool containers/interface tools for XLSM, XLSX, and XLSB

B) changed the Condition tool to allow for all 4 types

C) Added a summarize at the end (one of my users was getting workbook/sheet combinations appearing more than once)

 

For XLSX files this approach is slightly slower so maybe you could have the XLSX tool container separate and then use the XLS method for the remaining file types.

 

Let me know if you have any questions! I'm working on getting access to the public gallery so I can post apps instead of just showing pictures.

 

Please star this post if it solves your problem!

Meteor
Did you change the options of your Run command tool ?
Bolide

Nope. Worked right out of the box for me. If you use it right out of the box, make sure you have the same folder structure as @CameronS. The ExcelTabsScribt.vbs should be in a folder called "Supporting Files". That folder should be at the same level as the "Supporting Macros" folder where the SheetFinder macro should be saved. I believe the "..\" in the Command Arguments option  incates 'same folder level'.

 

Capture(2).PNG

 

Otherwise you'll have to explicity specify the path.

Atom
I currently have the most recent version of Alteryx 10.6.6 and this macro is not working when trying to select one sheet out of many in excel workbooks.  It is pulling in all sheets, regardless of what I am requesting.
 
Is there an option I need to select????  Should I revent back to 10.5?
 
 
Atom

Actually I think I am okay, my Bad

The package you developed is extremely useful and has saved me a significant amount of time. Awesome work CameronS! Thanks.
Atom

Can anyone provide an updated version of the SheetFinder macro that @DultonM provided a screenshot of?

Bolide

@kenpei1, here you go! You'll need to be logged into the Alteryx Public Gallery. Download it onto your computer to see/use.

Quasar
Quasar

@CameronS - Any plans to refresh the macro for this year, adding in the great work by @DultonM and properly posting it to the Public Gallery so it's more discoverable?

Alteryx Alumni (Retired)

Hi Everyone - I'm happy to hear this example has found some use!  

 

I've seen several asks for an update.  There are some things I'm looking at and hope to put out an update early next year - but please keep in mind that this is really meant to be an example of how to think through designing this kind of solution for yourself.  I work for Alteryx but not in a product development capacity so any examples that I post are for example purposes, not official Alteryx tools.  I do some testing before publishing, but nothing like the extensive testing our QA team does when a tool is built in to an Alteryx release.  

 

To those of you struggling with xlsm files, the xlsm file type is now supported by the input data tool so you can use the same method I used for reading the tabs in xlsx files.   

Alteryx Alumni (Retired)

@DultonM - Kenpei and I are unable to access your macro through the gallery.  If it is in a collection, you'd need to add us to your collection.  Best place is in the public gallery so anyone can access it.  Could you just put the export package in this community post for now?

Bolide

Sorry everyone! I forgot to change the settings to share in the Public Gallery. Anyone should now be able to find it via the link in my post above or by typing "SheetFinder" in the search box in the Public Gallery.

Meteor

Hi Cameron,

 

First, please allow me to thank you for creating such wonderful macro.

 

I have a problem: I have many XLS files in which each file contains a sheet with a name that contains space and period, e.g. "Invoice_Log_Report V1.0". I can't rename the sheet name. I turn off the option "Specify Sheet" and set the Filter File Name option to "*_Invoice.xls". When I run my workflow, I keep getting "Error: Excel File Opener (5): Tool #5: Record #2: Tool #1: Error opening table: Microsoft JET Database Engine: The Microsoft Jet database engine could not find the object '0$'. Make sure the object exists and that you spell its name and the path name correctly.\3011 = -543884569"

 

If I remove the period, the macro Excel File Opener works.

 

Do you have any advice or workaround? 

 

Thanks.

 

Regards,

=Hamdy

Alteryx Alumni (Retired)

Hi Everyone,

 

I appreciate the continued interest in this macro.  I've taken into account some of the feedback as well as my own use of the tool and posted an updated version of the macro that will now be hosted in the gallery here

 

Changes:

 

1. Option to use the xls vb reading macro for all Excel file types (fix for some users reporting issues with reading xlsm files).  I believe this will incorporate @DultonM's change as an option into the macro.

2. Option to return the list of fields for each sheet without reading all of the data (I find this useful to gain a quick understanding of how well the data will match up)

3. Changed the directory tool to read all file types as some users reported issues with certain types of file systems.  A filter is applied to limit to only excel files.

 

 

@hamdylie I think you may have an issue with your wildcard.  I'm able to read in sheets from xls files that have the same naming structure.  Instead of "*_Invoice.xls" you may need something like "Invoice_*.xls".

Meteor

Hi Cameron,

 

Thanks for the reply.

 

I am sorry I wasn't clear on my problem. I have a folder that contains many invoices plus some other files. The files I am interested in have a file name convention "YYYYMMDD-YYMMDD_Invoice.xls" (therefore I filter the files using "*_Invoice.xls". Inside each invoice, there is a tab with name "Invoice_Log_Report V1.0". I believe it is a period inside the tab name that causes the problem. 

 

2017-04-04 17_26_15-20170101_20170131_Invoice.xls  [Compatibility Mode] - Excel.png

 

I hope this explains the problem better.

 

Thanks.

Meteoroid

Just saw this post and it is really fantastic. Thank you for this great tool @Cameron

 

I have multiple excels of pricing ladder for different products. Each excel follows the same structure (same number of rows, different number of columns). So after importing all the files, my database looks like something shown on the left below. I am wondering if there is way to help me sort and transpose the data so that I can do some analysis for a specific product? Ideally, I am hoping to get the either of the two formats shown on the right below.

 

I was thinking to modify the macro to transpose each individual excel before consolidating into one database, but as I am pretty much a rookie, I really appreciate if someone can give me some directions.

 

Thank you.

 

Capture.JPGCapture2.JPG

 

Asteroid

SWEET! Thank you.

Alteryx Partner

Hi Cameron,

 

I really like this macro.  I have just reset my computer and reinstalled all the apps again, including Alteryx.  Now I am trying to reload this app into my Macro directory.  However when I go to your download link, this link to your macro comes up as a Workflow and I cannot figure out how to take the macro out of the workflow and install it into my Macro directory. 

 

Can you provide steps on how to take the Read All Excel Macro out of the workflow and save it into a file directory.

 

Thanks

 

Alteryx Partner

Hi Cameron,

 

It was a late night while trying to install macro.  Managed to work out how to save Macro from the workflow this morning in a couple of minutes.  Love this Macro, our whole team uses it all the time.

 

Thanks

 

Alteryx Certified Partner

Hi all, 

I'm getting an error in this tool when trying to read in an .xlsb file. 

Seems to work fine (ignoring the warnings) for .xlsx-Files. Any idea why that would occur?

xlsb error.PNG

Cheers, 

Benjamin

Meteor

Any chance that a version of this workflow package is available for 10.6? I am unable to import the existing workflow package.

 

Thanks!

Asteroid

Hi Cameron,

Its a great tool. Your SheetFinder macro is really amazing.

I used your macro and I have an observation

--> SheetFinder macro can find all the sheets of xlsb files but alteryx can't  read that xlsb sheets.

Is there any suggestion for that?

Meteor

Is there an easy way to update this so that you can specify what row your data starts on? As is I believe the macro is set up to start on row 1, whereas my data starts on row 4.

 

I drilled down to the "SheetReader" macro and tried to update the Input Data tool found there, but can't figure out the right tweaks that need to be made, if that's even the correct spot. 

 

Thanks!

Alteryx Alumni (Retired)
Yes, the sheet reader macro is where you would make that change. I don't think that feature is available for reading .xls files so you will only be able to add it for .xlsx files.
Asteroid

Hi,

@CameronS

is there any possibility of sorting the issue below with this macro? The main issue is that the DataInput doesn't output all the data in the column.

https://community.alteryx.com/t5/Data-Preparation-Blending/Alteryx-Input-Data-quot-eating-quot-XLS-c...

I've used your macro and i got the same result, some data in the column is not read from the .xls. There's a sample XLSDataInput.xls.

Haven't yet figured a way to convert this file/tab to another format automatically, without reading the .xls first.

Regards, 

Alteryx Alumni (Retired)

Hi @jcardoso,

 

This example won't do anything different as it uses the JET engine driver from microsoft with an input tool the same as you would otherwise read an xls file. However, if you have a solution that works, such as converting the xls file to csv, you could certainly expand this macro (or build another for that specific purpose) to do that conversion in a similar manner to how I run VBA code to get the list of sheets in an xls document here. 

Meteoroid

I almost pulled the following after seemed obsolete, but maybe we have a new worthy use case?

 

Howbout this converter I whipped up last year to convert xls to xlsx or csv.

 

https://community.alteryx.com/t5/Data-Sources/Error-opening-an-XLS-not-xlsx-file/td-p/47085/highligh...

 

Page thru till ya see my secondish entry which gives the 2 output format options. Maybe that'll help. Haven't retested so dunno if needs maintenance.

 

Grab it soon as I may still take it down. Alpha version. Use with caution, and joy 🙂

 

If it works, UNUU!!!

(Just rolls off the tongue.)

 

Dan Jones

CTO

Unuu

Your New Favorites Toolbar, Free at Inspire2018

 

 

 

 

Meteoroid

I just tried the Macro Read in All Excel and it crashed Alteryx.  I have version 11.8.4.4...

attached is pic of crash.alteryx crash using macro read in all excel.jpg

Asteroid
I just used this macro downloaded from Gallery. Yet again this macro has been 'The Gift' which saved the day. Thanks, Cameron, again and again. Please keep this up to date out on Gallery. Brenda
Meteor

Hello All!

 

I was wondering if any of you had run into this type of error:

 

Excel File Opener (1) Tool #5: Record #15: Tool #1: Error opening connect string: Microsoft Access Database Engine: External table is not in the expected format.\3274 = -328602519

 

I am connecting to a sharepoint drive folder which contains a whole bunch of different excel files that I am trying to pull in...this macro was working for me until a couple weeks ago. Also this error shows up once all of the files have been read into the macro...i wonder if it has something to do with being at the end of the list?

 

Any help would be great!

 

Regards,

 

Dom

Atom

Great Macro and definitely helpful. 

I am however facing some issues trying to read .xlsb files as I am getting the error:

"Error: Excel File Opener (5): Tool #3: Record #2: Tool #26: The external program "..\SupportingFiles\unzip.exe" returned an error code: 9"
and can not figure my way around this. 

I have tried to use the macro provided by @DultonM too but I am getting errors as well. 

Anybody got any fixes for that error? 

Thanks, 

 

** Solved **

 

Issue is from Excel not the Macro, I was getting an error from Excel (pop-up window) everytime I ran the Macro. 
Think it is a driver thing, not to sure though. 

Meteor

Hello again!

 

I am having a problem with this macro where the output is not showing the column names that are in the excel file but "ColumnA" to "ColumnZ" with some F6 column names thrown in. When I run the app it populates the next tool i have in my flow with the correct column names but then it if i add a new tool to the canvas then it goes back to the ColumnA...this was not happening for quite some time and i have tried uninstalling the macro and reinstalling it but nothing has changed...Help!

 

Regards,

 

Dom

 

Capture.JPG

Labels