community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

Read in Multiple Excel Files, with Multiple Tabs that have Different Schemas

Alteryx
Alteryx
Created on

One of the best things about Alteryx is the ability to read in multiple files very easily and automatically combine them into a single dataset. This becomes a bit trickier when dealing with files that have different schemas or Excel files with multiple tabs. Adding both multiple excel files with multiple tabs, and having the schema change within each tab takes it to another level.

 

If your tabs have the same Schema, the article you want to read is here.

 

The way to accomplish the task if the tabs have (or may have) different schemas (field names change depending on sheet) is to use nested batch macros. I've attached a sample workflow built in 11.0 that demonstrates the process.

 main workflow.png

 

In the main workflow pictured above, the Directory Tool input pulls in the file paths of all of the XLSX files in the directory you're pointing to (note you may need to redirect this tool in the sample to a directory on your machine).

 Macro1.png

 

Most of the magic happens in the macro pictured above. This macro takes the FullPath field and updates the main Input Data Tool to read the first file in the list from the Directory Input. It is configured to read the list of sheet names within that first file, but also to output the Full Path from the Input Data Tool. A new field is formatted in the formula for an acceptable full file path for an Excel file, including the desired sheet name. This final file path is passed into the 2nd macro as the control parameter.

 

Macro2.png

The second macro is very simple. It takes the file path received from the first macro, updates the Input Data Tool, reads in that file and then passes it back to the first macro. It repeats this process once for each sheet in each of the files being passed from the Directory Input Tool.

 

Each batch macro holds the data until each batch is completed and combines it all into one large data set.

 

Note: the sample was created in 11.0 - it will not open in earlier versions.

Attachments
Comments
Meteoroid

This is great, thank you Andrew. Is there a way to select the specific sheets you want with this workflow?

Meteor

This is a great macro 

 

Meteoroid

Hi AndrewL,

 

Thank you for this macro. It's awesome! Do you have a solution for mweiser's question?  Also, is it possible to add the full file directory to the final report? This would help tremendously because I deal with a lot of files that are only differentiated by the file name and the numerous tabs.  Thanks so much.

Meteor

I can get this macro to work if I populate my file path and one file from the directory in the input tools in test and test2 macros.  How would you build the macros so that the input name is driven by the workflow input fullpath instead of having to update the macros each time? 

Asteroid

@AndrewL finally find something I could use efficiently.  Thank you Andrew.  This is very valuable.

Thanks For this Macro!

Is it any way, to keep, the File name Only in additionel field. I mean all combine+field with thne file name.

The goal is that i have 300 Bills to combine on one file, each bill has a number on filename.

Thanks.

All excel files have the same structure.

I find the solution on the new menu on altryx ( CreWMacros)

ExcelfileCombine.PNG

Meteoroid

Hello Andrew,

I've found a very similar workflow here on the community and was using it to consolidate 830 excel files with very different data schemas. The problem is that the data starts in different lines for each file, I need to customize the macro to find a Keyword to determine the header row and consolidate the data from that row down. How can I manage that?

Asteroid

'm trying to open multiple excel files with multiple sheets of possibly different schema but I only wanted one sheet (that sheet name is always the same) pulled from each of those files.  So I added in a filter to the above workflow to say [Sheet Names] = "Name."  In the directory tool it takes the ten different files I have and creates a list.  This list is handed off to the first macro.  In that first macro it takes and runs the first file ten times and skips the rest.  Why would it do that?  The action is to update the value, and the Replace a specific string is checked.  Not sure what is going wrong? 

 

Brad

Meteor

Hello brad_j_crep

 

If you can post a screenshot (filtering out confidential info) of the warning messages that'll be great. I can think of two possible scenarios:

a) The macro is looking for XLSX files, but may be some of those are actually XLS or have a different sheet name than the one given.

b) If you see that all the files are actually being sent to the macro, the macro might be skipping those with a different schema (you'll see it as a warning or error), or because the file path changed (XSLX vs XLS).

 

Cheers!

Thank you for your solution! It is very convenient and more than helpful.

Asteroid

@evilla  Got it to work.  It had to do with a filter I added to the original.  They were all .xlsx files.

Hi Andrew - is it possible to also add in a custom range as well? I have about 18 tables in one excel sheet in multiple files/tabs that all need to be combined. 

 

 

Meteor

Hi holmesmeyer00 yes you can. First, in the Exel spreadsheet create a table and give it a name (select the data -> right click -> "Define Name" ). Then go to Alteryx and call those tables one by one or by creating a Dynamic input.

 

Hope this helps.

 

Good luck!

 

@evilla

 

Named_Range.pngNamed Range

 

Atom

I am trying to use this workflow but am having a hard time getting it to work. The input files are not available in my download so I can't see the formats, etc in them. Can you please share?

Asteroid

I ran into an error in which the macro would work fine when run manually, but not as part of the workflow. It returned an error that the files could not be found, and the file names in the error did not have .xslx included. 

 

What ultimately worked for me was to go in the test.yxmc macro and update the select tool so the string size was smaller. I don't know why it worked, but it did. 

 

Is there any reason the string size was set to 1073741823?

Meteoroid

Hello! I'm trying to pull in any tab with "CC " in the name from multiple files. I added a filter to the macro to achieve this and it may be my problem, but i'm not sure. My macro runs fine, pulling both tabs as expected. When I put the macro after the directory you can see it reading both files, however it is pulling the two tabs from the one twice, and not pulling the single tab from the file at all. 2018-04-27 10_37_57-Alteryx Designer x64 - New Workflow2_.pngMacro Screenshot.png

 

 

Asteroid

@alondergan

What value is selected in the update value actions?

Meteoroid

@Probley

The specific file is selected (the one with 2 tabs). Which would make sense as the problem. What option would be better?

 

Value Action.png

Asteroid

@alondergan

That makes sense. I tried recreating your solution with the Dynamic Input and could not get it to work. However, was able to get it to work by adding a Filter to the workflow recommended in the original post (the one in the second image - I put the Filter between the Input and Formula tools).

Andrew your Solution is Amazing, Simple and Elegant, as it made the following request possible!

 

The following are the actual requirements sent to me from my Boss, in a text message. and I said yes, because Alteryx has giving me the confidence that anything is possible and then I found your solution and completed this assignment in one 24-hour period.

 

If we have a folder that contains multiple folders which each hold specific reports, can Alteryx read the Excel files (the most recent date in each folder) and strip out the Excel file name and the header row from the report?

 

  • In the "test" Macro you can add any logic to limit the Excel Sheets.
    In the "test2" Macro you can add any other information you want.

I made one change to the "test2" Macro to only bring back the 1st row of data from the Excel file since that row contained the headers. In the Input Data tool #4 in its Configuration under Record Limit.

 

I ran it overnight and it processed 49,698 files in 7:44 hours and found 99 files that it couldn’t read, thus altering us that some of the old copies of our reports might be corrupt.

@AndrewL

 

Thanks for posting the macro. I'm just starting out with Alteryx so I tried to reengineer your solution so I could better understand how a batch macro works, but I'm running into an issue.

 

The folder I'm pointing the directory input tool to has 8 files of different schemas. The input tool in the macro workflow is pointing to one of the folders in that file, which contains counties by state with each state being a separate tab for a total of 51 tabs. When I run the workflow, I'm expecting the output to be approx 60-65 records representing all the tabs within all the Excel files in the folder. The actual result is 408 records, which is the 51 state tabs multiplied by the number files in the data directory. 

 

Have you seen this issue and can you point me in the direction I'm going wrong?

 

Cheers!

For my project I only wanted the 1st row in an Excel sheet to get the Header fields and not the data.  So, I made a change to the "Record Limit" in the "Test2.yxmc" for the "Input Data" tool configuration of entering "1" so only the 1st Row of the Excel sheet would be returned.

 

Record Limit 1.png

 

If you wanted to exclude the files that contains the 51 tabs of States, you can add a filter on the Filename, or the Excel tab name in the "Test.yxmc" macro to exclude that Excel filename or Excel Tab name to not continue. Or you can add a series of "Filter" Tools to separate Excel files with different sets of data by creating a copy of "Test2.yxmc" called "Test3.yxmc", "Test4.yxmc" etc. macro for each of those different sets of data.

 

Or in the "Workflow.yxmd" workflow you can tighten up the "File Specification" on the configuration window for the "Directory" Tool to a wildcard like: *GoodData*.xlsx

 

File Specification.png

 

I hope that the above answers your questions!

 

 

Apologies as I don't think I was as clear as I could have been in my original post. I'm focusing on getting a complete list of tabs in each file, which would then be fed into the second macro to pull in the underlying data.

 

Assuming that there are 8 sheets with differing numbers of tabs in each file, the macro doesn't identify the complete list of tabs/sheets. Rather, it gives the tabs of one of the sheets 8 times. 

Quasar

@AndrewL, many thanks for this post! It was exactly what I needed to get the job done!

 

 

 

@AndrewL

 

Thanks for posting the amazing macro.  My requirement is that I have to upload all the tabs from each sheet to the sql-server. The solution unions all the tabs in each excel but in my case all the tabs do not need to be union rather the table should be uploaded as 'filename_sheetname'

 

Can anyone help me with ideas of how I can achieve this? 

 

 

 

https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Read-in-Multiple-Excel-Files-with-Multiple-T...

Untitled.png 

Cheers!

@AndrewL 

 

Hi Andrew,

 

Does the input tool read sheets over a million with a continuation on a 2nd sheet?

 

 

I tried this but its only reading the 2nd sheet which is the continuation of the first sheet.

 

Thanks,

Rio

Asteroid

@AndrewL thanks for the awesome macro! Saved my bacon today. 🙂 

 

Bitmoji Image

Asteroid

Hi Andrew, 

this workflow looks v helpful-- i am working on similar problem and using the same workflow but somehow i am not getting the coorect result 😞

i should be making some silly mistake--

i have 4 excels with different schema out of which only 1 excel has 6 tabs .. rest all has 1 tab .

 

Result i am getting is only the consolidated excel with 6 tabs and with much more data -- all these tab has only 196 rows but in output i i am getting 784 rows -- please help what mistake i am making 

 

i only made 3 changes --

1 -- set the working directory to the folder where i have all the 4 excels (4 with 1 tab, 1 with 6 tabs)

2 -- test  -- browse to the excel with 6 tabs and  input list of sheet names

3 -- test 2 --  browse to the excel with 6 tabs and selected first sheet.

 

Please help 

 

 I have 12 months of data each with one tab. When I aggregate them, they all move to one sheet but their are hundreds of rows between each month for some reason. For example, January data is rows 1-1000. Then Feb is 2500-3000 and 1000-2500 is blank. Any idea why this may be happening. Otherwise, this works great!

Alteryx Partner

It probably has to do with the way Alteryx is reading your excel tabs.  It thinks there is data in the tabs that are not populated.  You can just filter those out. 

Thanks @bb213. One more question - I am trying to build this workflow from scratch and am having trouble with the formula tool. I cannot get my formula to recognize the trimright([FileName]. "Filename" does not show up in the dropdown menu as an option to select from I only see [enginetempfilepath]. Do you know what could be causing this? 

 

 

Alteryx Partner

@lmalone011 Sorry - I meant that Alteryx thinks there are rows (not tabs) that are populated when they aren't.  For your last question, you just need to edit the macro, updating Option 5 to "Full Path" and then re-saving.  That should work for you. 

 

Capture.PNG

Thanks! Both macros run now, but when i run the final workflow it is giving me an error message "The control parameter must be mapped to a field." I am not sure what the mistake could be, my workflow and macros look identical to the example which runs perfectly. 

 

Alteryx Partner

@lmalone011  you need to choose "FileName" as the Control Parameter (click on the macro). 

Also, you can use Regex to strip the tab name off the file name. See picture. 

 

Capture.PNG

Meteoroid

@bb213 Where should the RegEx tool be placed to make sure the sheet names are captured in the final output? I've tried various places in each of the macros and the final workflow and it only returns the file name (not the sheet name). Thanks for your help!

Alteryx Partner

@Elise_Prime You would not place it in the macro, you would place it outside of the macro - it could be the first tool after the macro output.  The macro output will output the entire filepath as the file name - using this regex will get you just the sheet name, i.e, everything after the triple pipe. 

Meteoroid

@bb213 When I run the test2 macro the entire filepath (including sheet name) is displayed in the FileName field. When I run the test macro, only the file name (no folder address or sheet name) is displayed in the FileName field and the same thing happens if I run workflow. I'm sure I'm missing something simple, just not sure where... 

 

Thanks for your quick response! Any other ideas you might have here would be greatly appreciated!

Alteryx Partner

Can you post your workflow?

Meteoroid

@bb213 It makes sense, but just isn't working... I guess what I was trying to say earlier is if there is no full path string captured in the test macro, how could it be included in the output of workflow? When I put the RegEx tool after the test macro in workflow it just returns the file name in the field called RegExOut1.

Meteoroid

I'm not sure how to post my workflow; I will try to figure that out tomorrow! Thanks again for trying to help me!

Alteryx Partner

You need to update Option # 5 (see above) in BOTH the macros.  Right-click on test2.macro -> open macro: test2.yxmc -> select input tool and update Box 5 to "Full Path". 

Meteoroid

Hi again, I have both of the macros set to "Full Path" in option 5 of the Input Data tools. And I'm not sure how to post my workflow... Sorry, new to the community posting.

Alteryx Partner

The name of the field will be "FileName".  It will contain the full path.  Please see picture below. post to alteryx.PNG

Meteoroid

I figured out what was not working in my macros... Turns out I didn't know that in the action tools in the macros are very particular with what is clicked on in the "Value or Attribute to Update:" section (which in hind sight seems rather obvious...) Sorry for all the time you took to help me figure this out, but thank you again for all your efforts!

Hi, I am running the Macro but have run into an issue of the Input Data Tool used within the "Test" Macro being the only file operated on. My workflow is creating the same output on that one File times the number of files in my directory. How would I go about making the macro act on the files in the directory?

Atom

Hello, I am trying to run this macro to input data from multiple excel files with with multiple Tabs (some have a single tab and some multiple) but it is not working. I have replaced the file path in the in macros to the one I want to use and it is only outputting data from all tabs from one excel file only.

thanks a lot for your sharing .
Atom

@AndrewL

 

Thanks for this, it's helpful! I am just getting started with Alteryx, so I was wondering if you could share what is in the "Q4 2016 Change Log.xlsx". Especially what sheet "known issues" is for or what it should say.

 

Thanks so much!! 🙂

Atom
Thanks Andrew! very helpful!! I am trying to add file name to my final output and not working. I changed the settings in input for test2 macro " Output File Name as Field" to "File Name". the Filename column is showing now in "test2" and "test" macro but not showing in my final Workflow which uses test macro. I saved both macros and ran the final workflow. the "filename" column is not showing. can you please help?