on 02-17-2016 03:42 PM - edited on 08-23-2023 03:45 AM by LloydMendonca
Reading in multiple files with different field schemas
This article is part oftheCSMacroDevelopment Series.Thegoal of this series is to communicate tips, tricks, andthethought process that goes into developing good, dynamicmacros.
The Directory tool can be very powerful when reading in multiple files from a folder, either on your local drive, or network location. Once you have the list of file paths you can use the dynamictool or create a batch macro to read all of thesefiles in.
The Dynamic Input tool can be used if all your files have the same field schema.
However, if the field schema's differyou will receive the warnings below when using the Dynamic Input tool to read in the full file paths.
As a result, you can create a simple batch macro which can be used repeatably to bring in multiple files at one time.
Inside the Macro -The first step is to create your workflow.
1)Bring in an input tool and select one of the files you wish to read in. In this example I use .xlsx files, but you can choose whatever file format youwish to read in, however, it will need to be the same file format for all files you are reading in.
2) Add a control parameter above your input tool. This allows you to pass one file path at a time from outside your macro. There is no configuration needed on this tool.
3) Drag from the 'Q' of the control parameter down into the lightning bolt of the input tool. This will add in an action tool. In the Action tool highlight the file path in the configuration window.
This will now add it into the 'replace a specific string' option at the bottom of the configuration window. All this means is that the value you pass through your control parameter will replace this in the input tool.
4) Add a macro output to the Input tool. This will allow the data to flow out of the macro back into your original workflow.
Now got to View>>Interface Designer (ID) and click on thecog icon on the left hand side of the ID window.
In the output mode change the option to:
- Auto configure by Name (Wait until all iterations Run) -Auto Configure by Name will union the fields with the same names
- Auto Configure by Position (Wait Until All Iterations Run) - Auto configure by position will union field 1 etc. in the same position for every file read in.
This will now remove the warnings you were getting with the dynamic input.
Inserting the macro into your workflow
1) File>>Save As - Save the macro in a location where you can access it. Alteryx will know this is a macro workflow because you have added a control parameter, as well as a macro output.
2) In a new canvas you can then right click on the canvas>>insert>>>Macro).
3) If you are reading in a list of files you can then use the directory tool to access the folder with the files in it.
4) As this is an xlsx example I will need to add the sheet name for each of the files. If you are reading in a different file type you will not have to do this step.
5) You can then configure the control parameter and select 'Full path'.
Attached is an example workflow (Version 11.0) which you can use as a template to build off.
If you would like to build a batch macro there is an excellent video on our demand training web page: http://www.alteryx.com/on-demand-training
*Although this macro has been tested it may not work in all scenarios. Please comment on the thread and I would be happy to assist.
Best,
Jordan Barker
Client ServiceRepresentative
HI @jusromero
I am not sure on the frequency of the input files you receive, however would the following work?
Could you just process the latest file (or files you haven't processed yet) and then use the append option in the output tool to the excel file?
This way the workflow will not take 6/7 hours each time, but just the first time (It seems you have already done this as well)?
Best,
Jordan
Hi @JordanB,
Thanks so much for the reply! That's an option. But the problem is, the criteria will also change every run so I was thinking if there's a way to get straight to running the criteria workflow and leave the workflows for the batch input not running. Thanks!
Regards,
Justin
Hi @jusromero
How does the criteria change each run? Does this mean the business logic in the workflow also changes?
Is the criteria influenced by whats in the file? If so, does the file name give anything away about the business logic?
Best,
Jordan
Hi Jordan,
I have another mapping table (which I use the join against the batch input) which changes every run. I was thinking of putting the workflow of the batch input into a container but when I join it in my mapping table, I am receiving an error of "Both Left and Right must have connections." I was thinking if I missed something of doing an output or anything.
Hi @jusromero
I assume the error is because the name you are joining on is changing with the new mapping file. Is this correct?
Would you be able to mock up a few files which I could use to look at a solution?
Best,
Jordan
Hi @JordanB,
I am new to alteryx. I have same requirement where I have multiple xls files with different schema and output is multiple csv files. I used this macro, but the macro is running for multiple iteration. For example if I have 4 xls files, macro is running for 4 iterations so the output rows are duplicating. Also I am getting warning of different schema and different number of fields than 1st file and file will be skipped.
How can I configure macro so that it should run for 1 iteration??
And also suggest about different schema and different fields.
I am using batch macro just as above.
Hi all. I have been trying to use a batch macro to compile CSV files with the same schema. I created a macro and workflow which mirrors the example in the first post exactly. However, once I run the workflow it only compiles the file I list in the macro 30 times one on top of each other instead of compiling all the files from the Directory tool I direct it to. I have pouring through the posts but can't find a fix anywhere.
You will have to update the macro in this workflow as it is configured for .xlsx not for .xls
To do so, please open the macro (Blue tool icon) by right clicking>>open macro
Inside, update the input tool to your .xls input file and select the sheet you want
Once you have done this, click on the action tool above the input tool. In here you need to make sure that your file path is selected and lands in the 'replace a specific string box'
If the 'replace a specific string box' is already checked, uncheck the box, click on your file path and then check the box. You should now see your file path in the box.
Once your file path is in this box you can save the macro by file>>save As>> give macro a name
Back in the main workflow (where the directory and formula tool are), you can now insert the macro by right clicking>>Insert macro
Attach the macro to the formula tool and in the dropdown in the interface select 'full path'
Before running, make sure you have updated the formula tool if you sheets are not named 'Sheet1'
Best,
Jordan
Please follow my suggestion above about ensuring the input file inside the macro is updated for a csv. & you have updated the action tool
Best,
Jordan
Hello! Is there a way to compile the multiple spreadsheets using the headers? Each time I run my macro, I now have headers as F1, F2, F3, etc.
Hi @amberdimp1
Inside of the batch macro you can develop logic to rename your field headers which will then be applied to each file.
Within the Input tool inside of the batch macro you can use the start Data Import on line:
Another option is to use filter or sampling logic in combination with the dynamic rename to get the correct header format
You can then save the macro and repeat that process for each file.
Best,
Jordan
You are a rockstar!!
Is there a way to maintain the file name as a column in my main workflow? When I use the macro to input the files I am losing the file name field.
Dear Community:
I applied this macro and workflow to a directory with some .csv-files. However, if I run the workflow I still get the message that several files differ in the number of fields compared to the first file and those files are skipped. As the macro should circumvent this, I wonder what is going wrong?
Best regards,
Florian
----
After I've re-installed the macro and the workflow and configurated both again, it works now.
Hi @fmb
Can you confirm this is working after you re-installed. Happy to help if not!
Please make sure you updated the macro (configure input tool to .csv format) and re-saved the macro. You can insert the macro into your master workflow with the directory tool
Best,
Jordan
This is awesome! It worked really well for me. Thank you so much!
Dear frnds,
I am new to alteryx , much help will be appreciated.
Problem : I am attaching sample of my input files .. I have 100 Plus such files ..
format remain same for each file but headers differs. We can skip first 5 rows and make the 6 th line as header .
Desired output : I am attaching the sample output .
Since the header in all the files remains between jan to Dec , I would like to combine all the files into one by populating the columns for respective headers .
I have 100 Plus files so I would like to automatically read the files from the folder .
Please help me to provide the solution for this .
Thank a lot
@JordanBNeed to study this. Just hit with a schema error during a batch import using dynamic input.
Is Alteryx working on a tool that would automatically fix this issue?
The dynamic input is set to error if the files do not have the same field schema. This article describes how to build your own equivalent of the dynamic input which makes it dynamic and not error. Please refer to the above article and let me know if your error persists.
Best,
Jordan
Hello Jordan. I had the directory tool combined with dynamic input in my model working just fine until discovered the schema error.
Connected parameter to the Directory. Or should it be connected to Dynamic Input? Then added the macro output.
Can you take a look at my workflow. By adding the macro, it has changed the app interface and I can no longer launch the app.
Appreciate the assistance. Amazing how small items can pose large road blocks. Good thing is I retain newly learned tasks.
@JordanB OK. Finally managed to create and insert macro. Now getting an error in Dynamic Input. Previously did not have a Dynamic Input Error. Any recommendations?
Finally figured out my errors. No further assistance needed.
hi Jordon
Thanks for your post! It is very helpful and I built one WF similar to yours to read multiple excel files with difference schema.
I got no errors message but warnings that some files are being skipped. I don't understand why as I have set the setting like yours to deal with different schema.
Would be appreciated if you can help me out on this one.
Regards
Emma
@EmmaPeng within the interface designer and final option on the left hand side in the window that pops out have you selected 'Auto configure by name'?
The image is at the later stages of my article above
Best,
Jordan
@JordanB yes i have tried select this option as well as configure by position, both options will only process part of the files not all.
I'm using two instances of a batch union macro due to different needs based on inputs. In some runs, there is nothing to open for one of the batch uploaders. Is there a way to prevent that from erroring out? It's a predictable and acceptable outcome. Any ideas?
Dear Jordan,
thank you very much for this article. I had an issue with
"Reading in Multiple Files with Different Field Schema"
Your article was very helpful to build a batch macro and then integrate it into my original workflow.
Hello Jordan, amazing Macro you have here.
Currently, I am trying to convert a directory of reports through a dynamic input to then upload to Tableau. The issue with it is I tried the macro and the error "A record was created with no fields."
I have the sheet name attached to the file and the batch hasn't run any errors, however they are .xls files so I am not sure if that has anything to do with it. The formula is [FullPath] + '|||DailyUtilizationReport$' so that it has the full file path name as well. Also, does the reference file need to come from the same exact directory in order for it to work?
What data comes out of the third blue tool you have in the workflow above? This should actually be the data from your files.
If you don't see any data that's because the macro is made for xlsx. To adjust please right click on the blue macro>open macro. Update the green input tool for one of your .xls files and then update the action tool to reflect your file path in the replace a specific string at the bottom of the configuration window when selected on the action tool. Then go file>save and this will update the workflow above to read the .xls file.
Alternatively, check out @CameronS article where his macro will do this automatically for you.
Best,
Jordan
This is awesome and exactly what I was looking for. Thanks so much!
Inside the macro and in the input tool configuration you can select to read data from a certain line in the .xlsx format.
You can then save the macro and this will read data from line 3 every time
Best,
Jordan
Fantastic example, this has helped me a lot. I've used batch macros before, but I don't understand what type of macro this is. Normally there's a dropdown box that lets you select standard, batch, iterative etc. But in this case it's missing (it should appear in the red circle):
Does anyone know what kind of macro Jordan's one is? I've adapted it to batch input a bunch of shapefiles which, although they had identical field names, had different field lengths so it was throwing a schema error.
If I could understand what kind of macro this is, and where the unioning of input files takes place (in the calling workflow, or in the macro?) that would help a lot.
Hi JordanB
Can we use this macro for google sheets too by changing the input tool to google sheets input tool. I have multiple google sheets with names like Sprint 1 Deployment tracker,Sprint 2 Deployment tracker. for now i have a workflow which is working but i am inputting each sheet separately with a different node. instead i want to use batch macro to do this dynamically. is this possible, as each sheet will have a different workbook id and worksheet id. Once i read the sheets from my google drive i need merge them and add to a new google sheet.
@SandeepRatakonda yes this is definitely possible. Just replace the input tool for the GS connector. In the action tool use the '+' buttons to find the name of the sheet and then update that in the 'update a specific string'.
Best,
Jordan
Thanks for this Jordan!
I have a somewhat similar situation but am quite new to Altryex and trying to sort it out and think this is a good start but think a few pieces may be missing. I have multiple files. The files have different schema and varying column names. I am trying to combine all the files together though and wondering with varying schema and varying column names how I would go about doing this? I do not want to get rid of any columns just want to consolidate all columns with same column titles but again they all have a different Schema
@kspence703 you might find this useful, I asked the same question a few years ago:
As you can see it took a little bit of jiggling before I got it right. I adapted the macro on the page above for csv files instead of xlsx. I attach the macro I used, along with a screenshot of how to set it up. Good luck!
Correction - I can't find the button to attach the workflow or macro. I can only attach photos or videos to this message. Not sure what's gone wrong here?!?
Hi
Is there anyway to use this Macro where the xls files have different Sheet names?
Essentially I have a folder of approx 230 xls files where they have different schema, no of sheets and different sheet names
A number share the same schema
So what I would like to do is read them all into one file eventually which has the same 'default' schema
I have a separate file that has then file names and a code applied to identify a similar group of files with the same schema that I might be able to use in a workflow once I have read in the files
So will your batch process work if amended to allow variant worksheet names?
Thank you
I'm sure this can be done, because the input tool gives you the option to give you a list of sheet names instead of the data. Perhaps it would be better to post on the forum (this isn't actually the forum) and attach a couple of anonymised example files showing the state your tabs and headers are in. Someone is bound to pick it up there.
Hi @JordanB ,
First off, thanks for sharing this macro. However, I'm facing an issue when I try to use it. I tested it by reading a directory with 3 text files, and what this macro is doing for me is that it is duplicating the output by reading and outputting the 3 files 3 times each, meaning the output rows are being duplicated 2 times more. To illusttrate, the output looks something like the below. Do you know how to resolve this so that each file in the directory is only read once?
Filename | Record |
file1 | A |
file2 | B |
file3 | C |
file1 | A |
file2 | B |
file3 | C |
file1 | A |
file2 | B |
file3 | C |
Hi @timothyyeo,
Have you updated the action tool inside of macro to reflect one of your txt file inputs?
Please make sure only three files names are passed into the macro.
Best,
Jordan
Hi @JordanB , is there any way for the dynamic input tool to pick up file names for file types with .pdf?
A problem I'm having with this is that the connection after the macro is starting from the first row, which in my Excel files are descriptive and don't have the actual header names. In fact, in my case, I have some data in rows 4-7 column 2, then 4-5 column 5 and the rest of the data - the main dataset - is from 13-80+ with the headers being on row 13 for that data and the headers for the horizontal data in rows 4-7 being in columns 1 and 4. I've figured out a way to get around that first issue of the horizontal data by using the Append Fields tool and using Create New Column with Filter and Select tools to merge the data from those horizontal fields with the rest coming in via a second input of the same file, but when using this macro, there's no apparent way to start the import on a certain row or specify that the first row contains data, so I'm stuck with what Alteryx picks up as the headers.
This macro contains and Input tool where you can specify which row to start the headers on. If you open the macro (right click: open macro) you can adjust the Input tool.
DanM
Hi Jordan,
Thanks a mill for this - it's really helpful. I've got a batch macro configured the same way as your example, but when I run it for some reason it starts creating a lot of null values in the first sheet it's bringing in, and doesn't output the other files in the output at all, so I end up with mostly null values in the output. Do you have any idea why this would be happening and how to fix?
Is there a way to disable the in-browse default in the batch macro? I only see the first row of data from the first file when inserting the macro into my standard workflow. Once I add a browse, I can then sequence tools after the macro to perform additional analysis, but I'd rather not include the browse tool. Is this possible?
@JordanB Is there a way to incorporate "List of Sheet Names" into this macro? I have files where the tabs are named according to the date they are created. If I can produce a list based on the range of dates these files were created, would I be able to use it in the macro?
Is there a more simple way to do this? this increases my compute time by 10x doing this. I have 2000 files with random schemas but have common columns. tableau seems to handle this a lot easier as in there's no setup just tell it pull in all files in a folder and it matches them up. Can I run SQL on that folder or something?