Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Combine multiple excels with inconsistent column structure and sheet names

rohanshroff1
5 - Atom

I have multiple excel files in a folder. I want to import through directory and combine all files after making columns consistent.

Issue:

-Each file has multiple sheet names - some of which are required to be combine and some of them are irrelevant. 

-The sheet names of each file are not the same. So I have to import all files and all sheets and then manually select which are the sheets relevant for me.

-All files and sheet names do no have same column structure i.e. column names and number of columns are not same. So i cannot combine based on column names or position. Neither can I have a template file in dynamic input because of inconsistent structure.

- Also some of the sheets have logo on 1st line or 2nd line which needs to be skipped

- Since column names are not same (eg. client vs customer) - I would like to rename the inconsistent ones manually (but there is no logic or standard formula that would work to replace. I would have to see the data and then make column names consistent)

 

Think of a sales database 2021,2022, 2023 and each file has monthly sheet data eg. Jan-21, Jan-22, and so on... and the

 

Can someone please solve this problem with a dummy dataset (with directory rather than manually importing individual files). I cannot upload the file due to upload controls. Thanks in advance

3 REPLIES 3
rohanshroff1
5 - Atom

I tried but it doesnt work. Each of my files have multiple sheets (only 1 is relevant) and sheet names are inconsistent. Also, the column structure is inconsistent (name and number). 

KGT
11 - Bolide

The flowchart posted by @nagakavyasri is going to direct you along the best path. That is a great resource.

 

Your options are basically:

  1. Determine the files to bring in together beforehand and then bring in groups of files that can be handled together. (determining by filename/extension is the most common reason to do this),
  2. Bring in all files and then handle the details.

Option 2 is going to be much easier and you will be dealing with a batch macro. You mention that they are excel files. Hopefully they are xlsx (if they are xls, then it will take more work and I would definitely separate the xls vs xlsx...).

 

Break this up into sections.

 

  • You need the filenames: Directory Tool
  • You need the sheetnames: Batch Macro to get <List of Sheet Names> for each excel file. (attached. See info below)
  • You need to bring in each file either in groups now or all at once. You will have a list of all files with sheet names to filter before passing to the batch macro that brings them in.

Next up, you want to build a workflow that deals with one of the files. See how it works for another file and whether you need to write different rules depending on what the data looks like. Key is that you want to deal with the inconsistencies, as once the data is in a referenceable table, you can deal with it all together.

 

Turn that workflow into a batch macro. The batch macro will have a control parameter that updates the input tool by replacing the whole filename reference. Remember to bring the filename in on that input as well. Check out this 10 min interactive lesson on Batch Macros: https://community.alteryx.com/t5/Interactive-Lessons/Creating-a-Batch-Macro/ta-p/657923 and the attached one with the below info.

 

From here you may have to work some stuff out..

You now have a process... and you can filter the files that go into the batch macro to test and slowly add more rules inside the macro. I advise looking to sort out the inconsistencies first, fieldnames, dates etc. Completing columns etc can be done in bulk after the macro.

 

Batch.ListOfSheetNames.yxmc

This is a great simple batch macro that you can look at to see how it works.

  • This will read your [FullPath] field from the directory tool (after selecting it as the control parameter).
  • It will get the list of sheetnames for each and start stacking them in a list. 
  • On the input tool, Output File Name as Field is set to FullPath (this is so that you can use it with the sheetnames).
  • On the interface Designer settings, the output fields are set to "Auto configure by Name". This is to alleviate issues with metadata.
  • The Select tool overrides the size of SheetNames to 200. If the first file has a length of 4 and then the next has a length of 6, that can cause issues.

 

 

Labels
Top Solution Authors