This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
on 05-09-2016 08:27 AM - edited 4 weeks ago by kleinpm7
"I have a bajillion files that I need to bring into my workflow... wayyyyyy too many for me to use individual Input Data tools! What's the best way to do this?"
- every Alteryx user ever
Bringing in multiple files at once can sometimes seem like a daunting task, but is often a necessary piece in developing those wonderful time-saving workflows. Here's a handy flowchart to help you get those tricky files in without a hitch!
1. Input Data with wildcard
Most of the time you just need an Input Data tool to get your data in, which is great, because it's by far the easiest and fastest method here.
Requirements:
How to do it:
So in our flowchart above, this connection string will look in the directory C:\Users\akoszycki\Desktop\Input Data\Data to Input and then pick up every file that starts with "CO Store File - " and ends with ".xlsx".
Results:
2. Dynamic Input with file paths
Sometimes the universe conspires against you and you can't use a wildcard. Why? Well maybe it's impossible to have all of your files located in the same directory, or maybe there are a ton of files in the directory that you don't want a wildcard to pick up. In this case you'll need to use a list of filepaths to feed a Dynamic Input.
Requirements:
How to do it:
Some file types require a table or sheet specification, such as Microsoft Excel or Access database files. If the sheet or table name is always going to be the same, you won't have a problem with the Dynamic Input tool. However, if your files have different tables or sheets that you're pulling from, you may get an error like...
If this happens, you'll have to add the table/sheet to the end of the file path in the form ‹file path›|||‹sheet/table›. For example, in the flowchart above I'm specifying "Sheet1" as the worksheet to import from the Excel files by using a simple formula:[FullPath]+"|||Sheet1$".
3. Batch Macro with file paths
No matter what you do, it seems you always get these pesky messages that keep your files from coming in. This is when we bring out the big guns - Batch Macros. Though this last method is the most work-intensive initially, let's remember to keep in mind what my good buddy Confucius always says...
The batch macro will run a workflow one time for each file, updating the Input Data tool each time. It would generally look like this:
The Batch Macro will automatically read in files which have the same field schema without errors. But what happens if your field schema is different from one file to another?
This is where the batch macro really comes into its own! Navigate to the Interface Designer (View > Interface Designer) output mode , and either use ‘Auto Configure by Name (Wait until All iterations Run)’ or ‘Auto Configure by Position (Wait Until All Iterations Run)’. 'Auto Configure by Name' will union the fields with the same names, and 'Auto Configure by Position' will union field 1 from file 1 on top of field 1 from file 2 and so on for every file.
Once you have configured the Interface Designer you can save the macro and insert it into your workflow (Right click on the canvas Insert > Macro...). There is an example workflow attached which includes dummy files and the Batch Macro to see how the process works.
That should do it for inputting your data! Comment below or reach out to Customer Support with any questions!
This is definitely one of the more common questions. Thank you for laying out the options in this very clear flowchart - I'm definitely bookmarking it!
I've been looking at all sorts of posts on this process, and your's definitely has been most helpful. However, I just posted my own question as I'm just not able to get to where I'm trying to go.
I have 5 files that are all different in structure (different columns in each with some ID fields that connect some of the files), so I'm thinking based on your writeup that I need to use the Macro option, however, when I open the Interface Designer, I get no actionable objects, just a greyed out area. I can't seem to find how to activate this area to do anything with, so I'm not sure what step I'm missing.
My end goal is to import a set of 5 files and run a workflow against them. I would need to repeat this same workflow for the same set of 5 files from each customer, but of course, each customer will send the files slightly different - different file names, different formats (xls, txt, csv, etc), different headers (or no headers), different column orders, some include more than the bare minimum we require. I'm looking for the most streamlined way of running the same data analysis workflow against the set of files from each customer.
UPDATE
Not sure what combination I went through, but I got the interface designer panel to operate and created the macro workflow, so I think I'm making progress on that front.
Hi @serendipitytech! It sound's like you're off to a good start. If you have trouble with this and would like some help, let us know over in Customer Support (support@alteryx.com) and we'll be happy to help.
this was pretty helpful..one question...if i have multiple sheets like Data1, Data2,Data3 and i need to import only Data2 from all files...could be some shortcut i am missing ?
Hey @jishnusyaman - yeah so you could just use option #1 for this.
Use the wildcard to read in all the files you need, and just point at the specific sheet in 'Table or Query'
Hello,
Am trying to use this workflow to solve a problem where I need to union all text (tab delimited) files that has different schema. As am very new to this product am not able to understand how to configure Place holder file input in the batch macro setup.
info:
total files - 11 txt files
total records - 490
format 1 - 4 files - 57 columns
format 2 - 7 files - 56 columns
difference - same schema but one additional column.
Problem - At any given point we never know when the extra columns will append to which file.
Solution - we need only identical columns as output.
For example,
The above is the macro i recreated. While using the macro in another WF the loop fails to pick the other files specified in the directory as in the below
.
The total records in the 11 file are 490 but the WF gets out only 176 records post processing. I look for some checkpoints that I need to follow to get all the records as output.
Thanks in advance. looking forward.
Any update on on this friends?
Yes I did, not luck.
Thanks @andre347.
Yes, I didn't get any errors as such but warning stating that the files was not readable.
I was not sure of which action type to pick to update the file path as there was only limited examples in the Tool Mastery | Action section that is available in the forum. But after seeing you question I was able to play around and set the action type as - Update Input Action Tool and now its able to read all the files as expected. am getting to know the various usage of the options within each tool.
Thanks mate!
Hi, thank you. I have 20 csv files with different names. How can i create one file out of these 20? can you please help with the workflow? thanks
@sriniprad08 put them in the same folder. Load in 1 file. Remove the filename and replace it with a *. This should load an union all the files. Unless they have different column headers.
Hi All,
I have set of text files in a folder with SQL codes in them which are updated whenever there is a need. These text files generate values that needs to be captured in the output tool. New text files get added very often to this folder.
Could someone please advise on how to incrementally access each of these files, run the query in dynamic input tool and capture the output and move on to the next file and repeat?
Regards,
Praj
UPDATE - I retried the below process from beginning and it worked. Not sure what I was doing wrong the first time
Hi,
I have got the batch macro idea to work with my own data, however, I want to make the following changes:
Start Data Import on Line: 5 (rather than line 1)
I would like these steps to occur before the data is outputted to my workflow.
For #1, I made the change within Batch Correct.yxmc in the Input Data tool (the placeholder file)
For #2, I added a Select tool after the input data tool but before the Macro output tool.
See below screenshot.
I then saved the Macro and inserted it into my workflow, however neither change worked.
Does anyone have any ideas?
Can anyone guide me to a solution for my Input data connection issue?
Seems I can not get the program to connect to a .tab file on a common drive.
Strangely, It connects fine in access as an excel linked file.
Error:
Input Data (2) N:\TechOps\Component Mtc\Operational Support & Planning\Production Control\Complete OSR.tab could not be opened as a MapInfo dataset.
Thank you!
Hello,
First off I am new to Alteryx so there is a lot of room for user error here.
I am having problems with a very similar workflow that this thread describes. I migrated about 100 .xlsx files to one folder to simplify the process. They have varying number of tabs but they all have the same one tab of interest with the same name. I am attempting to use the batch macro to combine the one tab from all files in the same folder. Unfortunately the wildcard option does not work seemingly bc I cannot effectively identify the tab name for each file, again it is the same name for all.
Using the batch import option identified above I receive an error, "tab name" does not match a sheet or named range... the Excel file is corrupt, or similar errors depending on configuration. As for configuration I have tried auto configure by position and name (either should be sufficient for the data of interest but not for all data in the tabs), I selected the file path with a wild card for the place holder as I didn't see any other viable option, I have tried update data tool and update value for action config., directory and file path options for the macro itself, and I selected the folder path for the directory input.
I am not certain where each of the unique elements should be defined in Alteryx so any and all help is appreciated. I have done many searches on this topic which has gotten me to this point.
Cheers and thanks for your help.
I'll be off and running with edits once I can access all data in one venue.
Follow up on my previous issue posted above. I found the link below to connect the dots for me and am off and running with the batch macro. Overall it was very useful noting where setting are and are not required for all tools used. Ultimately clarifying the action tool settings in the macro and the filter tool settings using the macro is what brought everything home for me.
Cheers
This doesn't want to work for me, hitting the following snags:
Can't figure out what I'm overlooking, but any insight would be greatly appreciated.
Thanks all for sharing. It really helped but I'm not yet there.
My objective is to import multiple (different) .csv files into tables in Microsoft SQL Server. Since it concerns different files; all files have different schemas. With your help I can succesfully import each file in a seperate table in SQL server. The issue is however that for each imported table it considers the appended schema of all files. Refer to below example:
Files
File A
ID | Text |
1 | 3 |
2 | 4 |
File B
Customer | Name |
ABC | ABCDEF |
123 | 123456 |
Output
Table A in SQL
ID | Text | Customer | Name |
1 | 3 |
|
|
2 | 4 |
|
|
Table B in SQL
ID | Text | Customer | Name |
|
| ABC | ABCDEF |
|
| 123 | 123456 |
Desired Output
Table A in SQL
ID | Text |
1 | 3 |
2 | 4 |
Table B in SQL
Customer | Name |
ABC | ABCDEF |
123 | 123456 |
Hi All,
First time to use macro on a workflow. Just want to check if we're going to publish the workflow on a Alteryx Co-lab gallery, do I also need to publish the macro file? or I just need to run it on my Alteryx Desktop once and just publish the workflow?
thanks,
Rob
I have tried all these methods and worked fine if all files contains only one sheet. Is there any way to handle the the files with multiple sheets?
Thanks in Advance
Can I use a question mark instead of a wildcard, ie "Directory\?.xlsx" instead of "Directory\*.xlsx" to read multiple files?
I believe I read that somewhere but can't find out where.
Actually the asterisk and question mark are old DOS based wild cards so you could google on "DOS wildcards" to get a full explanation. The question mark indicates a wildcard for a single character.
So "Directory\201?.xlsx" will find...
2019.xlsx
2018.xlsx
2017.xlsx
but not
2009.xlsx
I was having the same issue as @Bobe, where the batch macro I added to a workflow was only displaying results from one file and in my case only one record. Once I added a tool (e.g. Browse Tool) to the output anchor of the macro, it ran multiple iteration of the macro and displayed all of the results.
Thank you, this is great!
Great stuff, thank you for the content, very helpful!
Does this imports xlsb file ?
real time and interesting pieces of Q&A. helpful. A must read pieces of information.
Thanks!
Great. This is helpful
Hi, I'm having similar issues. My files are not in my Computer, they are in Azure Data Lake in a Folder and updated hourly as JSON. How can I access this folder in Alteryx instead of reading the files individuallly?
I have downloaded COVID -19 data which has JSON and CSV files And i am trying to import the file suing the Macro input but i am not able to do so . Pls let me know if i am doing correct if not ,pls suggest me the solutions of how to import the different files in Alteryx.
I suspect you'd figured this out some time ago, but for the convenience of those who might have the same question, what you need to do is to include the step writing the table inside the macro. In that way, you'll have the table match the schema of the specific data file used in that iteration of the macro.
Awesome guide!
I am having problems with this too and receiving an error message based on the picture below. The error message keeps popping up concerning row 296 of the data.
Without the workflow or data, it's difficult to say what the problem is here, but I would certainly try it without the Data Cleansing tool. You might also try the Virtual Solution Center if you don't get the help you seek here.
Thank you - will be useful!
Simply....excellent! as simple as that, love this kind of contributions, makes my feel pride about this discipline. Thank you.
such a life saver! thanks!!
Makes sense!
This will be a useful took moving forward.
Thank you!! This was very helpful!
How do i combine multiple excel files with different schema and each excel file has multiple sheets?
@AlexKo thanks for sharing!
Wonderfull explanation I understand everything let me try this process to my end thank @AlexKo for simple explanation.
Thanks,
Kaustubh
Hi,
Great macro thanks for building this. One issue i am facing. when i share my workflow with others there is an error in the path of the macro. Every time we have to reconnect it with the downloaded macro. Any reason for this.
@amitagarwal2185 Depending on how many tabs, I would just do the applicable steps form above and then union the results together. Otherwise you would probably have to do a "macro within a macro" so to say to merge the sheets within the macro and then join the files in the original workflow...hopefully this helps!
@sriniprad08 When you share your workflow with others, are you packaging it first?
@LisaL : Thank you for the reply. Sorry i didn't understand your question.
I generally send the workflow which has the macro to my colleague. But when he opens it he finds the question mark.