Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Repeated Analysis Across a Large Number of Excel Files

maxwellqcollins
5 - Atom

Hello!

 

I am looking to perform an analysis across a large number of excel spreadsheets. I have about 250 uniquely named individual excel files, and most of these excel files have 3 tabs that I need to pull information from. These three tabs are identically named and structured across all workbooks. The analysis is different for the each of the three tabs, but will be the same across all excel files.

 

I have been able to easily come up with the Alteryx workflows I need each of the three tabs to to be put through for each excel file, but I'm not sure how to go about analyzing the workbooks collectively without creating a new input data tool 750 times, which would not be practical.

 

My end goal is to create a table with the file name as a primary key in a join in the first column and the information from those three tabs in the row, for all 250 files.

 

To phrase this differently, I have three analyses on my Alteryx workflow. The first is to analyse tab 1, second for tab 2, and third for tab 3. I need each of my 250 excel files analyzed for tab 1 on analysis 1, all 250 files analyzed for tab 2 on analysis 2, and all 250 files analyzed for tab 3 on analysis 3.

 

Is there a user-friendly way to do this in Alteryx? I've passed the core certification but I am a relatively new user.

 

Thanks for the help! Let me know if I need to clarify.

2 REPLIES 2
NicholasM
Alteryx Alumni (Retired)

Hey @maxwellqcollins 

 

I believe you are looking to achieve something similar than what was discussed in a previous post. A macro was created to do essentially what you're looking for. Link is here: https://community.alteryx.com/t5/Engine-Works-Blog/The-Ultimate-Alteryx-Holiday-gift-of-2015-Read-Al...

 

This macro is capable of the following: 

  1. Capable of reading all sheets/tabs from a workbook, regardless of schema
  2. Capable of reading all workbooks from a folder

The macro can be downloaded on our public Gallery located here: https://gallery.alteryx.com/#!app/Read_All_Excel_Files/58dd51b3a18e9e18fca64172

 

From there you can then do your analysis on the individual sheets in the workbooks. I hope this answers your question!

 

Best,

Nick Michels

 

 

 

danilang
19 - Altair
19 - Altair

Hi @maxwellqcollins 

 

Since you already have the 3 workflows written, what you're looking for is a way to run these workflows for all the files in your set.  There are 2 generic ways to do that.

 

1.  Import all data at once and add grouping to your current workflows

 

To accomplish this, change the input in your 3 current workflows, so they accept wild card inputs and ensure that the file name is included in the input

 

Input.png  

 

Once you have that, you'll need to modify your workflows so that they apply the filename group properly.  You'll need to add the group to all tools that support grouping and also add the filename as new join criteria where necessary.  If you're using multi row formulas, you may need to modify the logic to handle the extra level of grouping

 

This can be a tricky process, especially if you perform cross tabs and transposes since you now have new columns to consider.

 

 

2. Convert your workflows to batch macros and control them from a main process.

 

 

Here's the basics of a strategy that does that. 

 

Workflow.png

 

The Input just contains a path to the directory where your files are located.  The formulas at the start of each of the branches adds  "*.xlsx|||" and the sheet name you're interested in.  The Dynamic Input reads the data from that sheet in all of the workbooks.  The Unique and Select that follow build a list of the unique file names that the dynamic input has read.  This is passed the Control input of the batch macro and the data is passed to the data Input.  The batch macro runs one iteration for the each of the file names passed in to the control.  The results are grouped by filename and come out the output of the macro. 

 

Each of the branches adds a different sheet name and runs a different macro.  

 

Here's the outline of one of these

 

 

Macro.png

The data that enters the macro is combined list of all the data grouped by file name.  The 1st filter tool just selects the records for book one.  The formula tool is a stand-in for your workflow,  and the results come out the output.  Because this is a batch macro, this basic process is repeated for every file name that's passed into control parameter.  In my simple example it runs once for book1, again for book2 and finally once more for book3.  The results of each of the iterations are unioned together and sent out the output as one dataset.  

 

The Update Value action tool is what modifies the filename filter to change it on every iteration

 

Action.png

 

For every iteration, it replaces the operand value "Book1" with the current parameter value.  First Book1, then Book2 and then book3.  The attached package demonstrates this second approach

 

The tricky part for you is going to be modifying your existing workflows to change them into batch macros.  You basically need to replace the input that you have in your current workflow with a macro input and route the output to an output tool with the filename included

 

 

 

In either case, it's not a simple process.  Each of the methods has benefits and drawbacks and the one you decide to use will depend on the complexity of your current workflows.

 

Dan

Labels