Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

How to Input Multiple Sheets from Multiple Excel Files

TonyM
Alteryx Alumni (Retired)
Created

In this article we’ll take a look at how to leverage a batch macro to read in multiple sheets from multiple Excel files using Designer version 10.0. The two Excel files I’ll be using in the example are called Fruits and Vegetables with sheets named Apples, Oranges, Broccoli and Spinach, respectively.

 

The first step will be to read in the list of sheet names from one of the Excel files using an Input tool. This feature is new to 10.0 and is a great addition when used in conjunction with the Dynamic Input tool.

1.png

 

We can then connect our Input tool from the previous step to a Dynamic Input tool. In our configuration, use the same Excel file as our Input Data Source Template. We can use the Sheet Names field from our Input tool as our List of Data Sources and our action is going to be to Change File/Table Name.

2.png

 

Now we’re ready to add our Interface tools to the canvas. Bring in a Control Parameter tool, 2 Action tools and a Macro Output. Connect the Control Parameter to the 2 Action tools, and then connect on to the Input and Dynamic Input. The Macro output will be connected to the output of the Dynamic Input.

3.png

 

In our Action tool configuration our action type will be Update Value. We want to update the File – value that is being passed on to the Input and Dynamic Input. We will, however, only need to update the portion of the string without the sheet name, which is why we’ll also select Replace a specific string: This will allow us to batch our process with multiple files.

4.png

 

Lastly, if our Excel files have different schemas, in the interface designer we can set the macro to Auto Configure by Name or Position so that our workflow does not error out. Note: sheets within the same file will have to be the same schema.

5.png

We’re now ready to save the macro and put it to use!

 

Now, on a new canvas, we can place a Directory tool (If all your Excel files are in one directory), or, we can use a Text Input to manually enter each individual file path location. After the Directory or Text Input tool, insert your macro (right click on the canvas->insert->macro), then choose the full file path for your question in the macro configuration. Add a browse, hit run and look at the results!  

6.png

 

We now are able to read in multiple Excel sheets from multiple Excel files!

 

The attached workflow package includes a workflow, batch macro and excel files created in Alteryx v.10.0

 

Tony Moses

Client Service Representative

 

Attachments
Comments
danielkresina
9 - Comet
Thanks Tony, I had setup a macro like this a while back, however one challenge I face is that apparently all the metadata is lost, so if you try to configure tools down-stream, they don't see what fields are available till you run the workflow. Even then, it seems to intermittently lose the metadata and default back to the generic fields you used when setting up the macro. With a traditional input tool, if you drop a select statement after it, you see a list of available fields which matches those in your input file. With a macro input like this, that often doesn't work. Any ideas on how to prevent that? Thanks, Daniel Kresina
TonyM
Alteryx Alumni (Retired)

Hi Daniel, 

 

It's difficult to know what exactly may be causing the issue, but at first glance, I believe it may be that the Disable Auto Configure option in the user settings is checked.

 

user settings.png

 

It not, we can take this offline, and I'd be happy to troubleshoot your issue.

 

Best,

 

 

nikhilprakash
5 - Atom

Hi... this only works with the .xslx format as the List of Sheet Names table doesn't seem to be present in the .xls format... any idea how something similar can be done for the older excel format?

MikeB
Alteryx
Alteryx

@nikhilprakash take a look at this blog post that Ned wrote a few years ago: Reading Multiple XLS Tables

I believe his example does something like what you're asking and it should work with the XLS format.

nikhilprakash
5 - Atom

Thanks a ton for the awesome response speed!... let me take a look ... 

oliver_sayers
6 - Meteoroid

Hi Tony,

 

Is there a way of just pulling the data of only the first sheet within each of the Excel documents within the chosen Directory? I have two sheets in all of the Excels. I am only interested in retrieving the information from the first.

 

Is there also a way of mapping which Excel has contributed the data for each row of the output? Within an input tool, you can 'Output File Name as Field'. I selected this within the input of the Macro, however the column did not come through to the output.

 

 

Thanks,

 

Oliver

MikeB
Alteryx
Alteryx

Hi Oliver,

 

This thread started off with the XLS format (not xlsx), so I will answer first for that format and then for xlsx.

 

For the XLS format, we use the Microsoft Jet Driver within Alteryx. That driver has no option to pull sheets other than by name, so you have to supply the sheet name.

 

For XLSX, the same is true unless the file contains exactly one sheet. In that case we will read the sheet without having to name it.

We wanted to pull the first sheet for files with multiple sheets, but the order the sheets appear in the file is not defined (there is no way I can find to know which one is *first*).

 

The gory details.....

You can see this for yourself by unzipping the xlsx file (it is just a zip file) and looking at the XML.

In the extracted hierarchy, navigate into the /xl folder and take a look at the file called workbook.xml (you'll want an editor that can format XML so it's easier to read).

You'll see an element in there called <sheets> that contains child elements of type <sheet>, one for each sheet in the file.
The order of the sheets is not always the order in which they are displayed in the Excel application.

As you modify the sheets in the file, you'll see that the <sheet> items are re-ordered.

You can also re-order the sheets in the Application by dragging them around, which makes finding the *first* one even more challenging.

There is no way I have found in the XML to tell which sheet is the first sheet as it is shown in the Excel Application.
If there is a way to reliably tell which sheet is *first*, then we could revisit this issue.

 

 

danilodrobac
7 - Meteor

Hi Tony,

 

This article has been hugely helpful for a project that I'm working on, I just have one further issue that I'm coming across and can't find a way to fix. I'd like to append the name of the file to each record (as I have the date in the file name). I've added the option to "Output File Name as Field" within the Dynamic Input Tool but that doesn't seem to work. Could you offer a solution maybe?

 

Thanks

Dan

TonyM
Alteryx Alumni (Retired)

Hi Dan,

 

 

The Dynamic Input tool will allow you to do this in the Input Data Source Template 

 

Add FileName 1.png

Enjoy!

 

 

 

danilodrobac
7 - Meteor

Thanks for the reply Tony. Just an update as to why I was struggling before:

 

it was because I hadn't saved the Macro before running it again in the second workflow.

 

I hope that helps others from running in to the same issue.

 

Thanks again

Dan

ClaudiaV
7 - Meteor

Awesome!

aimeegoins
6 - Meteoroid

Macro Screenshot.png

Hello!

I added a filter to my macro to only pull sheets beginning with "CC 6". This is working fine in my regular workflow and seems to work fine in my macro. The problem occurs when I use my macro with a directory. It's showing that it sees both files correctly, but the browse at the end is pulling the two tabs from the first file twice and ignoring the tab from the second file. Is there something obvious I'm missing?

anamikakatiyar
5 - Atom

Hi Folks, 

 

This article was very useful for me in terms of bringing data in from multiple xlsx files together into one combined spreadsheet however, each time I execute my workflow it is appending the records to my last run and hence the output keeps growing. I am unable to reset the macro output. Is their something very basic that I am missing? Any pointers will greatly help.

 

Thanks in advance.

 

opheliawen
5 - Atom

Hi, I'm having an issue where the filename will repeat even though the data from the other files is pulling (so the filename is inaccurate). Any way I can fix this?