I have multiple excel files in a folder. I want to import through directory and combine all files after making columns consistent.
Issue:
-Each file has multiple sheet names - some of which are required to be combine and some of them are irrelevant.
-The sheet names of each file are not the same. So I have to import all files and all sheets and then manually select which are the sheets relevant for me.
-All files and sheet names do no have same column structure i.e. column names and number of columns are not same. So i cannot combine based on column names or position. Neither can I have a template file in dynamic input because of inconsistent structure.
- Also some of the sheets have logo on 1st line or 2nd line which needs to be skipped
- Since column names are not same (eg. client vs customer) - I would like to rename the inconsistent ones manually (but there is no logic or standard formula that would work to replace. I would have to see the data and then make column names consistent)
Think of a sales database 2021,2022, 2023 and each file has monthly sheet data eg. Jan-21, Jan-22, and so on... and the
Can someone please solve this problem with a dummy dataset (with directory rather than manually importing individual files). I cannot upload the file due to upload controls. Thanks in advance
I tried but it doesnt work. Each of my files have multiple sheets (only 1 is relevant) and sheet names are inconsistent. Also, the column structure is inconsistent (name and number).
The flowchart posted by @nagakavyasri is going to direct you along the best path. That is a great resource.
Your options are basically:
Option 2 is going to be much easier and you will be dealing with a batch macro. You mention that they are excel files. Hopefully they are xlsx (if they are xls, then it will take more work and I would definitely separate the xls vs xlsx...).
Break this up into sections.
Next up, you want to build a workflow that deals with one of the files. See how it works for another file and whether you need to write different rules depending on what the data looks like. Key is that you want to deal with the inconsistencies, as once the data is in a referenceable table, you can deal with it all together.
Turn that workflow into a batch macro. The batch macro will have a control parameter that updates the input tool by replacing the whole filename reference. Remember to bring the filename in on that input as well. Check out this 10 min interactive lesson on Batch Macros: https://community.alteryx.com/t5/Interactive-Lessons/Creating-a-Batch-Macro/ta-p/657923 and the attached one with the below info.
From here you may have to work some stuff out..
You now have a process... and you can filter the files that go into the batch macro to test and slowly add more rules inside the macro. I advise looking to sort out the inconsistencies first, fieldnames, dates etc. Completing columns etc can be done in bulk after the macro.
Batch.ListOfSheetNames.yxmc
This is a great simple batch macro that you can look at to see how it works.