Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Consolidation of Excel Files and Summary

Kaish
8 - Asteroid

Hi all, I have a very challenging project for my organization and this task is for those who love challenges, therefore I need help from my community members.

I want to consolidate many excel files into one, there are multiple excel files with different names I receive in a directory between15-20 files. Each file consists of several tabs, some have 1 some have 2 some have 3, but the catch is there is one tab which has common data in all the files, like the example the data consists of name, ID, Order, Date. The tabs also have different names, they are not similar, each file has different names of those tabs, Sometimes there are few files which don't have any data in it (Please Note this point) and Sometimes the data columns are not adjusted like other file data, the order sometimes change like Name, Order, ID, Date. Therefore I need to consolidate the data of these files into one in the output with the data aligned properly and One separate summary file for the points mentioned below. There are few important outcomes I want from Alteryx in this case.

 

1) Consolidation of files as per the order which is present in all the files and if some file have different order I want Alteryx to send a summary of how many files were not in order (this  data to be sent in Summary file)

2) Summary of how many files have the data and how many files don't have the data (Names of the files).

3) Total Count of Names in the summary file.

4) The file should consist of the Date in the output (Consolidated File and Summary file), the date which is provided in the data.

 

Therefore there should be two files one for consolidation and one for Summary as mentioned above. The Summary file data should be in Table form for visibility. There has to different tables for different requirements (Individual Tables). Kindly help me in this, I know this is challenging but I want to get help from my fellow community members. Below I am attaching sample data which resembles my data. Hope this attached helps, ping me for additional requirements.

2 REPLIES 2
the_data_guy
5 - Atom

Hi. I am working a similar exercise where I was required to consolidate multiple files (tables) into one consolidated file (table).  

To put this into a Alteryx workflow is straight forward. 

  • The basic workflow is follows:  Input node, Select Node, Cleansing Node, Filter node (if applicable for each input stream), and formula node (to create identifiers).  
  • Then Union all your input data streams (you might need to manually configure to ensure all matching columns are aligned. 
  • Then utilize Summarize node to group your data by the attributes to create the summaries you require.   You can probably utilized several Summarize nodes for the specific summaries you need.  
  • You can then utilize the Output Data node to create your final output file(s).  

Hope this helps.  

 

See image snapshot of my workflow.  

Alteryx consolidated workflowAlteryx consolidated workflow

KGT
9 - Comet

There's a couple of items in this that direct me to batch macros.

 

The files could change name. That means that you need to get the name each time. A folder browse is the easiest way to do this.

The tabs are inconsistent. This means that you need to get the tab names. This can be done either with a dynamic Input or a batch macro.

Columns can be out of order. This will be handled automatically to re-align them in most solutions. If you NEED to know the order they were in, then an extra process will be needed.

Multiple outputs for Summary and Data. This would be my recommendation anyhow.

 

So, most of the above can be achieved with the CReW XLSX macro available here. However, you lose customisability. If you don't want to build it yourself, then that is your best option.

 

Basically, the way that I would set this up to build it myself is:

  • Folder Browse, followed by any filtering etc to only select new files or xlsx etc.
  • Batch Macro or dynamic Input to get a list of all sheet names.
    • I would go with Batch macro as it allows more customisability such as longer field names etc.
    • Batch macro would be: Control Parameter (linked to input); Input (Import only a list of sheet names, and include fullpath as another field); Select (to make that field a size 50); Macro Output. Option on the batch macro interface designer will be "Auto Configure by Name".
  • Formula to construct the full path with sheet name
  • Possible filter to select only the sheets you want. This depends whether you can write the filter based off the sheet names. Not possible if you don't know yet which ones you want.
  • Dynamic Input or Batch Macro to read in all sheets
    • Again, I'll go with Batch Macro so that I have full control.
    • Batch Macro would read in the sheet, and then you can put any reporting etc on and output both summary and data. You can slowly add more to the batch macro to do more reporting on the fly as well.
  • Whatever else you need to do with it.

 

After getting used to Batch Macros, they become a bit of a preference due to their versatility. The CReW XLSX Macro contains a couple of batch macros that do what I've said above for you. It sounds like you have little confidence in the metadata of the files coming through and so with a Batch Macro, you can even set the input to not assume headers and manually sort that out if you want or include a "Field List" to get the order of columns and add to your summary... That second batch macro is your key to designing the solution in the way that fits your situation.

Labels