This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Hoping you can help me with the following as most of what I have read leads me to believe data cleaning has to be done manually before reading in files.
I have about 150 excel workbooks, each have a tab called "details". I am trying to get a workflow which allows me to read in all the information from the "details" tab and then allows me to select which columns and corresponding date should be output, based on the output template.
I already have a macro which takes the details tab from every workbook and combines them into one single workbook. The issue is the details tabs' schema differ across roughly 20 - 25 formats. I have the output template and the columns in the output are present in all the inputs but the positioning varies.
Hoping I can read in the singular workbook and only the columns in the singular workbook which headers match the template columns are then consolidated and output in the template format.
Does anyone have suggestions or a workflow already made for this process?
You can try something similar to this macro based solution, which will allow you to read only the columns you want from the 150 individual files, without the need for an Excel macro to combine them before hand.
The main program just gets list of files from the directory (Control parameter) and also the list of columns (C input) to read from those files.
The macro is a modification of a standard Read-a-Sheet-From-Multiple-Excel-files batch macro. The filenames are passed into the control parameter. The column names enter the C input and have a new field with the name "New_" prepended to it. The Dynamic Rename uses these 2 columns to rename the required columns from the current workbook. The Dynamic Select selects the FileName column from the Input tool and also all the columns that start with "New_". The second Dynamic Rename Changes the column names back to originals by removing the "New_". The batch macro is configured with a differing output schema in case some of the columns only appear in some of the files.
The results look like this
A, C and D are the column names that I entered into the list in the main program. The 3 Excel files that I used as input(attached) have different combinations of columns from A to E in them. They were initially in a subdirectory called "C:\Temp\Alteryx\Combine Different Schema based on output tempalte"
Sorry about the mix up there. When I created the .yxzp package, alteryx changed the file paths in the macro from "c:\temp\Alteryx..." to "_externals\1\...". However the Action tool tried to replace the specific string "c:\temp\alteryx..." which wasn't there any more.
Here's a new version which should be fixed. The results with your files look like this
Wow Danil this is helpful I feel like you have been doing this a long time haha please excuse my lack of knowledge so far. What adjustments would I need to make to account for multiple workbooks with multiple sheets. I know the name of the sheet from which I want the data to come "Details" but there are also other sheets within the workbooks which I do not want to access.