community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Knowledge Base

Definitive answers from Designer experts.

The Ultimate Input Data Flowchart

Alteryx
Alteryx
Created on

"I have a bajillion files that I need to bring into my workflow... wayyyyyy too many for me to use individual Input Data tools! What's the best way to do this?"

                           - every Alteryx user ever

 

Bringing in multiple files at once can sometimes seem like a daunting task, but is often a necessary piece in developing those wonderful time saving workflows. Here's a handy flowchart to help you get those tricky files in without a hitch!

 

flowchart.PNG

 

1. Input Data with wildcard

 

Most of the time you just need an Input Data tool to get your data in, which is great, because it's by far the easiest and fastest method here.

 

kiss.PNG

 

Requirements:

  • You need all of your files to be in the same directory.
  • All files must be the same type (ex. csv, xlsx, yxdb)

How to do it:

  1. Configure an Input Data tool to bring in one of the files in your directory.
  2. In the connection string, replace the filename (or a portion of the filename) with an asterisk ( ). The asterisk is a wildcard that essentially takes the place of anything.

input1.JPG

 

So in our flowchart above, this connection string will look in the directory C:\Users\akoszycki\Desktop\Input Data\Data to Input and then pick up every file that starts with "CO Store File - " and ends with ".xlsx".

 

Results:

  • All of the data from these files is stacked on top of each other, essentially the same way as the Union tool stacks data when it's using the "Auto Config by Name" option.
  • If you want to see the file from which the data comes from, use the "Output File Name as Field" option in your Input Data tool configuration.

file name.PNG

 

macros2.jpg

 

2. Dynamic Input with file paths

 

Sometimes the universe conspires against you and you can't use a wildcard. Why? Well maybe it's impossible to have all of your files located in the same directory, or maybe there are a ton of files in the directory that you don't want a wildcard to pick up. In this case you'll need to use a list of filepaths to feed a Dynamic Input.

 

Requirements:

 

  • A list of filepaths to import.     fullpths.PNG
  • A file to populate the "Input Data Source Template" configuration of the Dynamic Input tool. This file will always have to be available for the tool to base it's field schema on, and can be any of the files you wish to input. It *will not* be input if it is missing from the aforementioned file list.

How to do it:

  1. Configure the Input Data Source Template in the Dynamic Input just as if it were a regular Input Data tool.
  2. This tool includes an option to "Read a List of Data Sources" that will update the connection string based on your list of filenames.

 

macros3.jpg

 

 

 

Working with tables:

Some file types require a table or sheet specification, such as Microsoft Excel or Access database files. If the sheet or table name is always going to be the same, you won't have a problem with the Dynamic Input tool. However, if your files have different tables or sheets that you're pulling from, you may get an error like...

sheetname error.PNG

 

If this happens, you'll have to add the table/sheet to the end of the file path in the form <file path>|||<sheet/table>. For example, in the flowchart above I'm specifying "Sheet1" as the worksheet to import in the Excel files by using a simple formula: [FullPath]+"|||Sheet1$".

 

3. Batch Macro with file paths

 

No matter what you do, it seems you always get these pesky messages that keep your files from coming in. This is when we bring out the big guns - Batch Macros. Though this last method is technically the most work-intensive, let's remember to keep in mind what my good buddy Confucius always says...

 

1000steps.jpg

 

The batch macro will run a workflow one time for each file, updating the Input Data tool each time. It would generally look like this:

 

batch.PNG

 

The Batch Macro will automatically read in files which have the same field schema without errors. But what happens if your field schema is different from one file to another?

 

This is where the batch macro really comes into its own! Navigate to the Interface Designer (View>>Interface Designer) output mode PIc 2.png, and either use ‘Auto Configure by Name (Wait until All iterations Run)’ or ‘Auto Configure by Position (Wait Until All Iterations Run)’. 'Auto Configure by Name' will union the fields with the same names, and 'Auto Configure by Position' will union field 1 from file 1 on top of field 1 from file 2 for every file.

 

macros4.jpg

 

Once you have configured the Interface Designer you can save the macro and insert it into your workflow (Right click on the canvas>>insert>>Macros). There is an example workflow attached which includes dummy files and the Batch Macro to see how the process works. 

***This workflow was built in Alteryx Designer 10.5.

 

 

That should do it for inputting your data! Comment below or reach out to Customer Support with any questions!

Comments
Quasar
Quasar

This is definitely one of the more common questions.  Thank you for laying out the options in this very clear flowchart - I'm definitely bookmarking it!

I've been looking at all sorts of posts on this process, and your's definitely has been most helpful. However, I just posted my own question as I'm just not able to get to where I'm trying to go. 

 

I have 5 files that are all different in structure (different columns in each with some ID fields that connect some of the files), so I'm thinking based on your writeup that I need to use the Macro option, however, when I open the Interface Designer, I get no actionable objects, just a greyed out area. I can't seem to find how to activate this area to do anything with, so I'm not sure what step I'm missing. 

 

My end goal is to import a set of 5 files and run a workflow against them. I would need to repeat this same workflow for the same set of 5 files from each customer, but of course, each customer will send the files slightly different - different file names, different formats (xls, txt, csv, etc), different headers (or no headers), different column orders, some include more than the bare minimum we require. I'm looking for the most streamlined way of running the same data analysis workflow against the set of files from each customer. 

 

UPDATE

Not sure what combination I went through, but I got the interface designer panel to operate and created the macro workflow, so I think I'm making progress on that front. 

Alteryx
Alteryx

Hi @serendipitytech! It sound's like you're off to a good start. If you have trouble with this and would like some help, let us know over in Customer Support (support@alteryx.com) and we'll be happy to help.

this was pretty helpful..one question...if i have multiple sheets like Data1, Data2,Data3 and i need to import only Data2 from all files...could be some shortcut i am missing ?

Alteryx
Alteryx

Hey @jishnusyaman - yeah so you could just use option #1 for this.

 

wildcard.PNG

 

 Use the wildcard to read in all the files you need, and just point at the specific sheet in 'Table or Query'

Asteroid

 Hello, 

 

Am trying to use this workflow to solve a problem where I need to union all text (tab delimited)  files that has different schema. As am very new to this product am not able to understand how to configure Place holder file input in the batch macro setup. 

 

info: 

total files - 11 txt files 

total records - 490 

format 1 - 4 files - 57 columns 

format 2 - 7 files - 56 columns

difference - same schema but one additional column. 

Problem - At any given point we never know when the extra columns will append to which file. 

Solution - we need only identical columns as output. 

 

For example, 

macro 1.PNG

 

The above is the macro i recreated. While using the macro in another WF the loop fails to pick the other files specified in the directory as in the below 

 

macro 2.PNG

 

The total records in the 11 file are 490 but the WF gets out only 176 records post processing. I look for some checkpoints that I need to follow to get all the records as output. 

 

Thanks in advance. looking forward. 

Asteroid

Any update on on this friends? 

Alteryx Certified Partner
Did you change the Interface Designer properties in the macro? You'd need to set that to 'Auto Configure by Name' - this will put nulls in the columns where the files don't have that column.
Asteroid

Yes I did, not luck. 

Alteryx Certified Partner
Can you be a bit more specific regarding what's the error message and what is going wrong? From the screenshots you posted above I can see that in the Action tool (2) you haven't told it what to update. So if you click on 'File' (and it's highlighted in blue) then you are going to do your action on the file. At the moment it's only selected input.
Asteroid

Thanks @andre347

 

Yes, I didn't get any errors as such but warning stating that the files was not readable.

 

I was not sure of which action type to pick to update the file path as there was only limited examples in the Tool Mastery | Action  section that is available in the forum. But after seeing you question I was able to play around and set the action type as - Update Input Action Tool and now its able to read all the files as expected. am getting to know the various usage of the options within each tool. 

 

Thanks mate! 

 

 

 

 

Meteor

Hi, thank you. I have 20 csv files with different names. How can i create one file out of these 20? can you please help with the workflow? thanks

Alteryx Certified Partner

@sriniprad08 put them in the same folder. Load in 1 file. Remove the filename and replace it with a *. This should load an union all the files. Unless they have different column headers. 

Atom

 Hi All,

 

I have set of text files in a folder with SQL codes in them which are updated whenever there is a need. These text files generate values that needs to be captured in the output tool. New text files get added very often to this folder.

 

Could someone please advise on how to incrementally access each of these files, run the query in dynamic input tool and capture the output and move on to the next file and repeat? 

 

Regards,

Praj

Asteroid

UPDATE - I retried the below process from beginning and it worked. Not sure what I was doing wrong the first time

 

 

 

Hi, 

 

I have got the batch macro idea to work with my own data, however, I want to make the following changes:

  1. Start Data Import on Line: 5  (rather than line 1)

  2. Import only a selection of columns. 

I would like these steps to occur before the data is outputted to my workflow. 

 

For #1, I made the change within Batch Correct.yxmc in the Input Data tool (the placeholder file)

For #2, I added a Select tool after the input data tool but before the Macro output tool. 

 

See below screenshot. 

 

I then saved the Macro and inserted it into my workflow, however neither change worked. 

 

Does anyone have any ideas? Alteryx_batch macro.JPG

 

Meteor

Can anyone guide me to a solution for my Input data connection issue?

Seems I can not get the program to connect to a .tab file on a common drive.

 

Strangely, It connects fine in access as an excel linked file.

 

Error:

Input Data (2) N:\TechOps\Component Mtc\Operational Support & Planning\Production Control\Complete OSR.tab could not be opened as a MapInfo dataset.

 

Thank you!

Meteoroid

Hello, 

 

First off I am new to Alteryx so there is a lot of room for user error here.

 

I am having problems with a very similar workflow that this thread describes. I migrated about 100 .xlsx files to one folder to simplify the process. They have varying number of tabs but they all have the same one tab of interest with the same name. I am attempting to use the batch macro to combine the one tab from all files in the same folder. Unfortunately the wildcard option does not work seemingly bc I cannot effectively identify the tab name for each file, again it is the same name for all. 

 

Using the batch import option identified above I receive an error, "tab name" does not match a sheet or named range... the Excel file is corrupt, or similar errors depending on configuration. As for configuration I have tried auto configure by position and name (either should be sufficient for the data of interest but not for all data in the tabs), I selected the file path with a wild card for the place holder as I didn't see any other viable option, I have tried update data tool and update value for action config., directory and file path options for the macro itself, and I selected the folder path for the directory input. 

 

I am not certain where each of the unique elements should be defined in Alteryx so any and all help is appreciated. I have done many searches on this topic which has gotten me to this point. Capture.PNG

 

Cheers and thanks for your help. 

 

I'll be off and running with edits once I can access all data in one venue. 

Meteoroid

Follow up on my previous issue posted above. I found the link below to connect the dots for me and am off and running with the batch macro. Overall it was very useful noting where setting are and are not required for all tools used. Ultimately clarifying the action tool settings in the macro and the filter tool settings using the macro is what brought everything home for me.  

 

https://community.alteryx.com/t5/Alteryx-Knowledge-Base/CS-Macro-Dev-Reading-in-multiple-files-with-...

 

 

Cheers

Meteor

This doesn't want to work for me, hitting the following snags: 

 

  • If I use the macro as described and set the "Full path" as parameter for the macro input, I get an error message stating that I need to specify a sheet
    • I've specified a sheet name in the input tool within the macro, so this error doesn't make sense 

 

  • If I work around the problem above by using a function to append the sheet name to each file path string, the batch macro only iterates one single time and only returns the results from a single file, stating the macro was "(ended by a downstream tool)"
    • I've set the Output Mode of the macro to "Auto Configure by Name (Wait Until All Iterations Run)" and I can see there are >1 records showing up in the input anchor of the macro  

Can't figure out what I'm overlooking, but any insight would be greatly appreciated. 

Atom

Thanks all for sharing. It really helped but I'm not yet there.

 

My objective is to import multiple (different) .csv files into tables in Microsoft SQL Server. Since it concerns different files; all files have different schemas. With your help I can succesfully import each file in a seperate table in SQL server. The issue is however that for each imported table it considers the appended schema of all files. Refer to below example:

 

Files

File A

ID

Text

1

3

2

4


File B

Customer

Name

ABC

ABCDEF

123

123456

 

Output

Table A in SQL

ID

Text

Customer

Name

1

3

 

 

2

4

 

 

 

Table B in SQL

ID

Text

Customer

Name

 

 

ABC

ABCDEF

 

 

123

123456

 

Desired Output

Table A in SQL

ID

Text

1

3

2

4

 

Table B in SQL

Customer

Name

ABC

ABCDEF

123

123456

Meteor

Hi All,

 

First time to use macro on a workflow. Just want to check if we're going to publish the workflow on a Alteryx Co-lab gallery, do I also need to publish the macro file? or I just need to run it on my Alteryx Desktop once and just publish the workflow?

 

 

thanks,
Rob

I have tried all these methods and worked fine if all files contains only one sheet. Is there any way to handle the the files with multiple sheets? 

 

Thanks in Advance

Atom

Can I use a question mark instead of a wildcard, ie "Directory\?.xlsx" instead of "Directory\*.xlsx" to read multiple files?
I believe I read that somewhere but can't find out where.

Alteryx Certified Partner

@jdemeaux 

Actually the asterisk and question mark are old DOS based wild cards so you could google on "DOS wildcards" to get a full explanation. The question mark indicates a wildcard for a single character. 

So "Directory\201?.xlsx" will find...

2019.xlsx

2018.xlsx

2017.xlsx

 

but not

2009.xlsx