Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

CS Macro Dev: Reading in Multiple Files with Different Field Schema - Batch Macro

JordanB
Alteryx
Alteryx
Created

Reading in multiple files with different field schemas

This article is part oftheCSMacroDevelopment Series.Thegoal of this series is to communicate tips, tricks, andthethought process that goes into developing good, dynamicmacros.

The Directory tool can be very powerful when reading in multiple files from a folder, either on your local drive, or network location. Once you have the list of file paths you can use the dynamictool or create a batch macro to read all of thesefiles in.

The Dynamic Input tool can be used if all your files have the same field schema.

However, if the field schema's differyou will receive the warnings below when using the Dynamic Input tool to read in the full file paths.

As a result, you can create a simple batch macro which can be used repeatably to bring in multiple files at one time.

Reading Field schema 1.png

Inside the Macro -The first step is to create your workflow.

1)Bring in an input tool and select one of the files you wish to read in. In this example I use .xlsx files, but you can choose whatever file format youwish to read in, however, it will need to be the same file format for all files you are reading in.

2) Add a control parameter above your input tool. This allows you to pass one file path at a time from outside your macro. There is no configuration needed on this tool.

3) Drag from the 'Q' of the control parameter down into the lightning bolt of the input tool. This will add in an action tool. In the Action tool highlight the file path in the configuration window.

This will now add it into the 'replace a specific string' option at the bottom of the configuration window. All this means is that the value you pass through your control parameter will replace this in the input tool.

4) Add a macro output to the Input tool. This will allow the data to flow out of the macro back into your original workflow.

2018-11-14_12-54-26.png

Now got to View>>Interface Designer (ID) and click on thecog icon on the left hand side of the ID window.

2018-11-14_12-56-42.png

In the output mode change the option to:

- Auto configure by Name (Wait until all iterations Run) -Auto Configure by Name will union the fields with the same names

- Auto Configure by Position (Wait Until All Iterations Run) - Auto configure by position will union field 1 etc. in the same position for every file read in.

2018-11-14_12-57-55.png

This will now remove the warnings you were getting with the dynamic input.

Inserting the macro into your workflow

1) File>>Save As - Save the macro in a location where you can access it. Alteryx will know this is a macro workflow because you have added a control parameter, as well as a macro output.

2) In a new canvas you can then right click on the canvas>>insert>>>Macro).

3) If you are reading in a list of files you can then use the directory tool to access the folder with the files in it.

4) As this is an xlsx example I will need to add the sheet name for each of the files. If you are reading in a different file type you will not have to do this step.

5) You can then configure the control parameter and select 'Full path'.

2018-11-14_13-02-01.png

Attached is an example workflow (Version 11.0) which you can use as a template to build off.

If you would like to build a batch macro there is an excellent video on our demand training web page: http://www.alteryx.com/on-demand-training

*Although this macro has been tested it may not work in all scenarios. Please comment on the thread and I would be happy to assist.

Best,

Jordan Barker

Client ServiceRepresentative

Attachments
Comments
tviz21
8 - Asteroid

Hi Jordan, thanks for the great macro, I've been using successfully for the past couple of weeks but, suddenly this week I'm getting an error message.

 

"No sheet specified, you must specify a sheet"

 

I've checked the macro, and see that I have in fact pointed it to a sheet and have not changed anything since it last worked. All of my settings match what you've indicated.

 

Any ideas as to why this would start happening randomly?

 

Thanks!

 

Toni

 

 

JordanB
Alteryx
Alteryx

Hi Toni, @tviz21

 

From the full paths you are feeding into the batch macro do they have a sheet name appended to the end? Example: 'c\users\desktop\samplefiles.xlsx|Sheet1'

 

If the full paths do not have a sheet name attached to them and the type of file is expecting them i.e. xlsx there will need to be the sheet name added to the end of the full path. If your file types are CSV then a sheet name should not be needed. 

 

I suspect a different file type may have been placed in that folder? 

 

Can you verify and let me know. 

 

Best,

 

Jordan Barker 

tviz21
8 - Asteroid

Here is what the path in my Directory Tool is set up as - is this what you are looking for?

 

DirectoryTool.jpg

Batch Macro set up

BatchMacroSetUp.png

 

Thanks so much!

JordanB
Alteryx
Alteryx

@tviz21 do you have a screenshot of the results from your directory tool? I am looking to see if you have a sheet name such as 'sheet1$' at the end of your full path?

 

Best,

 

Jordan Barker

Solutions Consultant 

tviz21
8 - Asteroid

Here's the result of the directory tool output - when I set up the macro, I'm selecting Sheet1$ but don't see an option to do this in the Directory Tool

 

DirectoryTool_Results.png

 

 

tviz21
8 - Asteroid

Hi,

 

Still can't figure this out, If I use a wildcard *.xlsx* in the Directory Tool the macro errors out with the "need sheet name: error, if I enter a specific filename, TEST.xlsx, it works. I've done everything I can think of to do, even downloaded your macro again to start fresh but no luck. 

 

Do I have the macro configured correctly? I've attached updated screenshots and have the control parameter set to FullPath. 

 

Thank you for your help with this, appreciate it.

 

#1 Macro Setup

#2 Directory Tool setup

#3 Error msg

Batch Macro input.pngDirectory Tool setup.pngBatch Macro error msg.png

 

 

JordanB
Alteryx
Alteryx

Hi @tviz21

 

You will need to use a formula tool and add '|sheet1$' to your full paths. 

 

7-5-2016 10-06-23 AM.png

 

Alteryx needs the sheets names to be read in with the full paths. It cannot dynamically update the 'Table or Query' section in the above screenshot of the input data tool. 

 

So your full paths will need to be altered after your directory tool and the full path field needs to look like....'FILEFOLDER\FILENAME\EXAMPLEFILE.xlsx|sheet1$' 

 

Best,

 

Jordan Barker

Solutions Consultant 

JordanB
Alteryx
Alteryx

Hi @tviz21

 

You will need to use a formula tool and add '|sheet1$' to your full paths. 

 

7-5-2016 10-06-23 AM.png

 

Alteryx needs the sheets names to be read in with the full paths, so it can dynamically update the 'Table or Query' section in the above screenshot of the input data tool. 

 

So your full paths will need to be altered after your directory tool and the full path field needs to look like....'FILEFOLDER\FILENAME\EXAMPLEFILE.xlsx|sheet1$' 

 

Best,

 

Jordan Barker

Solutions Consultant 

jbednarz
6 - Meteoroid

Thanks for the macro, Jordan!  This is exactly what I was looking for, as it enabled me to read a directory of XML files with different schema's and bring them together into 1 single XML stream.

suli
9 - Comet

Hi,

 

I am using Alteryx 10.6 to run this macro. I have 6 similar structure xlsx files to read (they differ with few columns, sheet name is the same). Once I run workflow inside the macro (with 1 file), I get in macro output all records.

Once I place macro in the workflow, only one iteration is run, I get the message:

Info: 1 iterations were run (ended by a downstream tool)

and in output I have oly 1st record of the 1st file.

I do not get any errors or warnings. Do you know what could be the reason?

 

Thanks!

Cameront
5 - Atom

Thanks for the macro.

Worked very well for me. 

JordanB
Alteryx
Alteryx

Hi @suli

 

Please add a browse tool and you will be able to see all the data. 

 

This is an in-browse default when creaing a batch macro.

 

Best,

 

Jordan 

tviz21
8 - Asteroid

batchmacroresult.jpgbatchmacro_gtm.pngHi Jordan,

 

I've been using this macro for the past few months based on what you helped me put together but, for some reason, today I'm getting an error message within the batch macro "Tool#6: No valid fields were selected"

 

Nothing has changed except my template I've referring to which is in the exact same format as previously. The macro runs fine on it's own but not when I run the entire workflow

 

My formula tool reads - [FullPath] + '|Sheet1$'

My select tool is pulling FullPath and FileName

 

Any ideas as to why this would just start happening? Let me know what other info you might need. Thanks!

JordanB
Alteryx
Alteryx

@tviz21

 

I have a feeling it is the crosstab tool in the macro. Can you take a look and see if it is configured correctly.

 

The error is coming from tool 6 so please investigate that tool.

 

Best,

 

Jordan Barker

Solutions Consultant 

serendipitytech
8 - Asteroid

So I'm working with a directory of XML files. I thought I had this working for one set, but now I'm trying to use it again and I'm getting an error:

*file* has a different schema than the 1st file in the set

 

I've got the macro built, changed the option in interface designer to "Auto configure by Name"

 

the error happens when I run the macro as it's own workflow as well as when I run it as part of the larger workflows. I'm kinda stuck as to how to proceed with this?

 

Basically, I'm trying to run a set of XML parsing tools against a long list of XML files, so maybe there is a different method that can be used?

JordanB
Alteryx
Alteryx

Hi @serendipitytech

 

Once you made the change to the macro can you delete it and re-insert the macro into the workflow to make sure all new dependencies are there. 

 

Can you also screenshot the behavior and config in the interface designer to help troubleshoot.

 

Thanks,

 

Jordan Barker

serendipitytech
8 - Asteroid

Thanks @JordanB, 

 

Deleting and reading it seems to have gotten it working again!

PhillP
5 - Atom

Hi Jordan

Paul Noirel and Craig Higgins suggested I get in touch.

I've downloaded the macro but I'm struggling to work out how to configure it.

I'm a bit confused as the workflow in your post above is very different to the workflow within the file I downloaded.

 

I have a problem importing .XLS files from a folder into Alteryx. There are 350 files all on the same folder

I originally used the Import icon and set the 1st file in the folder as the template.

I then used *.xls to import all the files together

Most of the files imported OK. However several files were not imported because the schemas didn’t match the 1st file in the folder (even though as far as I can tell they are exactly the same)

We converted the xls files to xlsx and these imported successfully.

We also checked the Union tool and this worked fine to join two of the xls files, i.e. files that the Input Tool 'said' had different schemas

 

Could you help please?

thanks Phill

PhillP
5 - Atom

Thanks for your help Jordan

Alteryx cannot read XLS files as easy as XLSX files.

The download in your post above contains the components of the batch macro which is why it doesn't look the same as your flow diagram. The circle contains the components.

The control parameter drip feeds the files one at a time through the import tool.

In the action tool we updated the action type to "Update Value (Default)" and selected the "file - value" option in Value or Attribute to update

In the input tool we selected an example file and the worksheet from the import folder. All files in the folder (~350) had the same worksheet name ("Custom Table")

We then opened the interface designer, clicked the spanner and set the output mode to auto configure by name (as per your notes) - this ensures that columns with the same header name in successive files are joined to each other. Any columns with new names are added to the end of the table.

We then create the macro item (Blue circle)

We then created a new workflow - 1st icon was the directory tool and we selected the folder and file type (.XLS) where we wanted to get the files from.

We added a formula tool to append the sheet name onto the end of the full file path (formula: [FullPath]+"|Custom Table$") and loaded this into the batch macro

evilla
7 - Meteor

Hello All,

 

First of all thanks to for posting this excellent solution. I have been using it for a while without any issues. However, one of these days I had to update file locations (IT migration yey!!), so that is exactly what I did. The problem is that for some reason the macro started throwing the "different schema" warning no matter what selection I made in the interface Designer ("Auto configure by Name" or "Auto Configure by Position").

 

So after a few kicks, coffees and prayers it finally occurred to me to select the other option "All Iterations will have the same output schema", run the macro, get the error, go back to the interface Designer and select desired my option again.

 

1st Step.png

 

 

 

 

 

 

 

 

I run the macro again and voilà!! The "different schema" warning went away. 

 

2nd Step.png

 

 

 

 

 

 

 

I know it doesn't make any sense but it worked for me. 

 

Cheers!

JordanB
Alteryx
Alteryx

Hi @evilla

 

Strange one but I suspect this may have been a metadata issue with the change in file location. Good troubleshooting and I'm glad it's back up and running!!

Raghu_s
8 - Asteroid

Hey! 

 

Am new to this community and this is my first post. :)  would like to get some help in using this macro for a slightly different problem. 

 

So, I get  a .zip file that contains 15-20 tab delimited .txt files that has different field schema. Most of the files has similar columns of which 3 to 4 files has additional columns appended. 

 

Requirement: 

1. Ignore the additional columns 

2. at any give point the WF should pick the zip file and process it 

3. Append all the files for further processing. 

 

Problem faced: Am not able to set the zip file as a place holder using the above macro tool or try building myself or find a solution available in the community. 

Any sort of help would be greatly appreciated. 

 

Thanks!! 

JordanB
Alteryx
Alteryx

Hi @Raghu_s

 

I would check out @RodL macro he built to unzip and get a list of the files and then read all of the files into Alteryx.

 

As Rod alluded to this assumes you are using 7zip but it should give you a good indication of how to do this dynamically if not.

 

Best,

 

Jordan Barker

Solutions Consultant

Raghu_s
8 - Asteroid

Thank you @jordanB 

 

I shall look into the resources and get back if I have any questions. 

 

 

-RS

Eduardo_Pavan
6 - Meteoroid

Hello Jordan,

 

I was able to use the macro to merge different files with different number of fields, however one of the steps I need is to save the file name as a column so I can extract the date for each row and when I use the macro as is the column is filled with only one file name. How can I make the macro save each filename for the corresponding rows of each file?

 

Thank you!

evilla
7 - Meteor

Hello Eduardo, you should be able to see the file name as a column field if you make this selection in the Macro input tool.

 

Good luck!

 

macro file.jpg

gucher
5 - Atom

Hi Jordan-

 

I am only getting output files for the Excel sheet i used in the first Macro. The output is working correctly, but only for the single file I used as the input on the first macro. I have tried using other files and only get an output of sheets from the single excel I used as an input.

 

Any ideas?

 

Thanks!

Grant.

JordanB
Alteryx
Alteryx

Hi @gucher

 

Can you send me a picture of your workflow and the inside of the macro? 

 

If you can add a browse to the right hand side of your macro to double check it is showing all the data.

 

Best,

 

Jordan

MRioux
7 - Meteor

2nd workflow2nd workflowMacro workflowMacro workflowHi Jordan, I have built this batch macro per your instructions above but it's only bringing back/reading the first file in my directory and I don't know if I'm missing a step. I want to read 10 xlsx files and in the macro flow I've pointed it to the first file. In the 2nd workflow, do I still need to tell it what sheet name to bring in if I've left the sheet name as Sheet1 in each file? I've tried it both ways and it still only reads the first file.

AlteryxExcelInputProblem.png

Raghu_s
8 - Asteroid

@MRioux Hey, are you trying to read different sheets from those multiple files?

 

Like, 

1. same sheet name across different files 

2. different sheet names across different files 

3. All sheets in a workbook across all files 

4. Does the sheets have same schema? 

 

I think you need to pick the right template( schema)  that is required for picking similar sheets across the files. Did you define it? 

MRioux
7 - Meteor

Thank you for your reply. Yes I'm trying to read 1. same sheet name across different files.

Yes the sheets have the same schema for 2017 (one file per month), which is all I'm trying to bring in for now but prior years might have slightly different schema. I will bring the prior year files in later once I get this macro to work.

 

What do you mean by "did you define it"?

 

Mona

 

 

carolinemcdonald
6 - Meteoroid

Capture 1.PNG

Hi Jordan,

 

This post was VERY helpful and allowed me to get my wildcard input set up.  I've checked the box on the input data tool to output the file name as a field, but it looks like it's just bringing in the name of the first file in the directory for each of the records.  Is there a way to set it up so that the output shows which file the data came from?

 

Thanks,

Caroline

 

JordanB
Alteryx
Alteryx

Hi @carolinemcdonald

 

Have you saved the workflow as a batch macro yet and inserted it into your master workflow? The above workflow is the inside of a batch macro and therefore, will just show one file because you haven't fed in the other files yet.

 

If you have fed all files in can you add a browse tool to make sure you are seeing the full dataset.

 

Best,

 

Jordan

 

carolinemcdonald
6 - Meteoroid

Hi Jordan,

 

I have saved it as a batch macro and inserted it into my workflow, but it's still returning just the first file name in the new file field.  I've added a browse tool and even a field summary tool to ensure I'm getting a full view of the data set.  Any other ideas?

 

Thanks!

Caroline

JordanB
Alteryx
Alteryx

Hi @carolinemcdonald

 

Are you passing in the sheet name as part of the file path?

 

Can you screenshot the action tool config.

 

Best,

 

Jordan

shaynie
8 - Asteroid

Thank you Jordan!  Super helpful!  I'm trying to parse 200,000 xml files, with different fields, am finally making some good progress using this macro approach!

deano
5 - Atom

Hi Jordan,

Changed the sheet name as instructed but am getting a parse error in the Action Tool.   What do you suspect?..................Thanks,

 

Capture.PNG

 

Capture.PNG

 

JordanB
Alteryx
Alteryx

Hi @deano

 

I would use a formula tool outside of the macro to build the file path. I would then switch to 'update value' and highlight the file path.

 

Best,

 

Jordan

asifk
8 - Asteroid

Hi Jordan,

 

This is great it works like it was intended, I have also further developed the macro to read in Multiple sheets from one file with different schema and also write it as multiple excel files or a single file with multiple sheets.

 

This would work great if reading from a google sheet/ drive or FTP.

 

I'd like to share my workflow and hear your thoughts on it.

 

Thanks,

Asif

YKN
5 - Atom

 Hi everyone,

 

First of all, thanks a lot to Jordan for this macro. So far it has been really helpful and I learnt a lot from it. Also thanks to others that contributed to this thread. I'm still rather new to Alteryx, which leaves a lot of room for errors.

 

I'm currently trying to build a batch import macro for different .csv files with different schemas, which imports the files and directly export them into a Microsoft SQL Server database. I am trying to get every .csv file imported into seperate tables with their original schema. The steps i am trying to get automated are:

 

- First import all CSV files (with different schemas) within a pre-defined local directory

- Followed up by an export of all CSV files into seperate tables in a SQL database with the CSV filename as the name of the table.

 

By using the batch macro, I managed to complete the whole process of importing and exporting the data with the correct names. Unfortunately though, the output is not what I was hoping for and I can't seem to find the solution. The flow does export the data into different tables, but somehow it considers the appended schema of all files. Below I try to illustrate the issue:

 

Tables to be imported:

 

Table 1

CompanyCountry
AUK
BDE
CFR

 

Table 2

NameAge
Tim23
Paul41
Sander18

 

Output in SQL:

 

Table 1:

CompanyCountryNameAge
AUK  
BDE  
CFR  

 

Table 2:

CompanyCountryNameAge
  Tim23
  Paul41
  Sander18

 

My desired output is that the tables in SQL have the original schema, without showing columns of other tables besides the original columns. It would be greatly appreciated if anyone has an idea on how to solve this issue and automate the solution, so that this process can be repeated with new files in the same local directory. 

 

Any help is greatly appreciated!

asifk
8 - Asteroid

Hi YKN,

 

I also had a similar problem, you can find the solution here

 

https://community.alteryx.com/t5/Data-Preparation-Blending/Removing-Blank-Fields/td-p/92584

YKN
5 - Atom

Hi asifk,

 

Thanks for the input! I was thinking about a solution like this as well. In many cases, it would probably solve the issue. Though, for my follow-up analysis all columns would be needed, even if they are completely empty. This is mostly because of validation purposes and standard query's that are to be used by myself and others that would be using this tool as well. Also, I would prefer the more "clean" version in which I keep the original scheme from the table to be imported from the beginning, instead of removing blank columns from the appended schema afterwards.

 

Nonetheless, this would be a start for some individual batch imports, but if I want to implement this as a tool within the department I'm working at, I think the clean solution as mentioned is needed.

 

asifk
8 - Asteroid

Hi YKN,

 

That makes a lot of sense and I can see why it would be important to keep to the original formatting and have that feed into SQL Tables (sometimes can be a bit technical)

 

I hope you find the solution you're looking for and I would also like to utilize it.

 

Happy Learning!

 

Regards,

Asif

NicolasLedesma
5 - Atom

Hi @JordanB,

 

I'm using your Macro it works perfectly. Now I just have a problem. My directory is to change every month, how can I tell Alteryx to pick a different directory every month and use your Macro at the same time ?

 

regards 

asifk
8 - Asteroid

Hi

 

A possible solution is to point the directory to a root folder and select "Include Subdirectories" I would assume that your naming convention f the files would remain the same. 

Then the next step would be to add a sort tool and sort by CreationTime or LastWriteTime, this will then keep your newest file at the top

thereafter add a unique tool and remove duplicates.

 

Hope this works

 

regrads,

Asif

JHerzog
7 - Meteor

This is a great thread and has helped immensely to simplify my processing of multiple CSV files with slightly different schemas. 

 

I needed the output from the macro to include the filename as a column from each source file and I did as instructed above by selecting the Output file name as Field: "File Name Only" on the Input tool within the Macro. The first time i ran it for a list of 7 CSV files it worked great. Then I added in a Formula tool into the macro to parse out the text portion of the file name from the date at the end of the file name into 2 new columns. This has causes the "Filename" column to appear in the output of the macro 2 times even though I have the Interface Designer for the macro setup to "Auto config by name".

 

I also tried removing the Formula tool form the macro and putting it in the wrapper workflow but the "Filename" columns still gets include in the macro output twice.

 

I've also tried to recreate the macro from scratch without any luck in eliminating "Filaname2".

 

Help please.

DanM
Alteryx Community Team
Alteryx Community Team

@JHerzog can you possibly add your macro to the post so that the Community can take a look at it? It is difficult to assist without seeing your configuration of the macro. Sample data would also be helpful.

JHerzog
7 - Meteor

@DanM you can ignore my question about the repeated "FileName" column. I opened the CSV file and this latest CSV file I'm receiving from an external 3rd party they have started to include the "FileName" as a column but in the past they were not. So that solves the "mystery" of the extra "FileName" column in my macro output.

DanM
Alteryx Community Team
Alteryx Community Team

@JHerzog

 

Glad you figured it out.

 

DanM

jusromero
5 - Atom

This is very useful thanks! I just have a question though. I already created a batch macro to import all files. But the problem is thaT, it took me 6-7 hours just to input all the files (1 file is around 80mb) and I still have to add join and union tools after the input.

 

The question is that, is it possible to shorten the time of the batch macro or if not, is there a way to filter the data first based on another table before going to the output of the batch macro?

 

Desired workflow:

1. Input all raw data file (batch macro already created)

2. After the macro, input 2 mapping files which will be used to Join with the consolidated raw data

3. Add Join and Union toosl against the mapping files to get the subset that I need from the data set.

4. Output as an excel file.

 

In this process, #1 is taking too long so I was thinking if there's a way to make this more efficient?