Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

CS Macro Dev: Reading in Multiple Files with Different Field Schema - Batch Macro

JordanB
Alteryx
Alteryx
Created

Reading in multiple files with different field schemas

This article is part oftheCSMacroDevelopment Series.Thegoal of this series is to communicate tips, tricks, andthethought process that goes into developing good, dynamicmacros.

The Directory tool can be very powerful when reading in multiple files from a folder, either on your local drive, or network location. Once you have the list of file paths you can use the dynamictool or create a batch macro to read all of thesefiles in.

The Dynamic Input tool can be used if all your files have the same field schema.

However, if the field schema's differyou will receive the warnings below when using the Dynamic Input tool to read in the full file paths.

As a result, you can create a simple batch macro which can be used repeatably to bring in multiple files at one time.

Reading Field schema 1.png

Inside the Macro -The first step is to create your workflow.

1)Bring in an input tool and select one of the files you wish to read in. In this example I use .xlsx files, but you can choose whatever file format youwish to read in, however, it will need to be the same file format for all files you are reading in.

2) Add a control parameter above your input tool. This allows you to pass one file path at a time from outside your macro. There is no configuration needed on this tool.

3) Drag from the 'Q' of the control parameter down into the lightning bolt of the input tool. This will add in an action tool. In the Action tool highlight the file path in the configuration window.

This will now add it into the 'replace a specific string' option at the bottom of the configuration window. All this means is that the value you pass through your control parameter will replace this in the input tool.

4) Add a macro output to the Input tool. This will allow the data to flow out of the macro back into your original workflow.

2018-11-14_12-54-26.png

Now got to View>>Interface Designer (ID) and click on thecog icon on the left hand side of the ID window.

2018-11-14_12-56-42.png

In the output mode change the option to:

- Auto configure by Name (Wait until all iterations Run) -Auto Configure by Name will union the fields with the same names

- Auto Configure by Position (Wait Until All Iterations Run) - Auto configure by position will union field 1 etc. in the same position for every file read in.

2018-11-14_12-57-55.png

This will now remove the warnings you were getting with the dynamic input.

Inserting the macro into your workflow

1) File>>Save As - Save the macro in a location where you can access it. Alteryx will know this is a macro workflow because you have added a control parameter, as well as a macro output.

2) In a new canvas you can then right click on the canvas>>insert>>>Macro).

3) If you are reading in a list of files you can then use the directory tool to access the folder with the files in it.

4) As this is an xlsx example I will need to add the sheet name for each of the files. If you are reading in a different file type you will not have to do this step.

5) You can then configure the control parameter and select 'Full path'.

2018-11-14_13-02-01.png

Attached is an example workflow (Version 11.0) which you can use as a template to build off.

If you would like to build a batch macro there is an excellent video on our demand training web page: http://www.alteryx.com/on-demand-training

*Although this macro has been tested it may not work in all scenarios. Please comment on the thread and I would be happy to assist.

Best,

Jordan Barker

Client ServiceRepresentative

Attachments
Comments
JordanB
Alteryx
Alteryx

HI @jusromero

 

I am not sure on the frequency of the input files you receive, however would the following work?

 

Could you just process the latest file (or files you haven't processed yet) and then use the append option in the output tool to the excel file?

 

This way the workflow will not take 6/7 hours each time, but just the first time (It seems you have already done this as well)?

 

Best,

 

Jordan

jusromero
5 - Atom

Hi @JordanB,

 

Thanks so much for the reply! That's an option. But the problem is, the criteria will also change every run so I was thinking if there's a way to get straight to running the criteria workflow and leave the workflows for the batch input not running. Thanks!

 

Regards,

Justin

JordanB
Alteryx
Alteryx

Hi @jusromero

 

How does the criteria change each run? Does this mean the business logic in the workflow also changes? 

 

Is the criteria influenced by whats in the file? If so, does the file name give anything away about the business logic?

 

Best,

 

Jordan

jusromero
5 - Atom

Hi Jordan,

 

I have another mapping table (which I use the join against the batch input) which changes every run. I was thinking of putting the workflow of the batch input into a container but when I join it in my mapping table, I am receiving an error of "Both Left and Right must have connections." I was thinking if I missed something of doing an output or anything.

JordanB
Alteryx
Alteryx

Hi @jusromero

 

I assume the error is because the name you are joining on is changing with the new mapping file. Is this correct?

 

Would you be able to mock up  a few files which I could use to look at a solution?

 

Best,

 

Jordan

Namrathabarker
5 - Atom

Hi @JordanB,

 

I am new to alteryx. I have same requirement where I have multiple xls files with different schema and output is multiple csv files. I used this macro, but the macro is running for multiple iteration. For example if I have 4 xls files, macro is running for 4 iterations so the output rows are duplicating. Also I am getting warning of different schema and different number of fields than 1st file and file will be skipped.  

 How can I configure macro so that it should run for 1 iteration??

And also suggest about different schema and different fields.

I am using batch macro just as above.

 

DStumpf
5 - Atom

Hi all. I have been trying to use a batch macro to compile CSV files with the same schema. I created a macro and workflow which mirrors the example in the first post exactly. However, once I run the workflow it only compiles the file I list in the macro 30 times one on top of each other instead of compiling all the files from the Directory tool I direct it to. I have pouring through the posts but can't find a fix anywhere. 

JordanB
Alteryx
Alteryx

@Namrathabarker

 

You will have to update the macro in this workflow as it is configured for .xlsx not for .xls

 

To do so, please open the macro (Blue tool icon) by right clicking>>open macro

 

Inside, update the input tool to your .xls input file and select the sheet you want

 

Once you have done this, click on the action tool above the input tool. In here you need to make sure that  your file path is selected and lands in the 'replace a specific string box'

 

If the 'replace a specific string box' is already checked, uncheck the box, click on your file path and then check the box. You should now see your file path in the box.

 

Once your file path is in this box you can save the macro by file>>save As>> give macro a name

 

Back in the main workflow (where the directory and formula tool are), you can now insert the macro by right clicking>>Insert macro

 

Attach the macro to the formula tool and in the dropdown in the interface select 'full path'

 

Before running, make sure you have updated the formula tool if you sheets are not named 'Sheet1'

 

Best,

 

Jordan

 

 

JordanB
Alteryx
Alteryx

@DStumpf

 

Please follow my suggestion above about ensuring the input file inside the macro is updated for a csv. & you have updated the action tool

 

Best,

 

Jordan

amberdimp1
6 - Meteoroid

Hello!  Is there a way to compile the multiple spreadsheets using the headers?  Each time I run my macro, I now have headers as F1, F2, F3, etc.

JordanB
Alteryx
Alteryx

Hi @amberdimp1

 

Inside of the batch macro you can develop logic to rename your field headers which will then be applied to each file.

 

Within the Input tool inside of the batch macro you can use the start Data Import on line:

 

Capture.PNG

 

Another option is to use filter or sampling logic in combination with the dynamic rename to get the correct header format

 

You can then save the macro and repeat that process for each file.

 

Best,

 

Jordan

akshatoz
7 - Meteor

You are a rockstar!!

CarlyArk
5 - Atom

Is there a way to maintain the file name as a column in my main workflow? When I use the macro to input the files I am losing the file name field. 

fmb
6 - Meteoroid

Dear Community:

 

I applied this macro and workflow to a directory with some .csv-files. However, if I run the workflow I still get the message that several files differ in the number of fields compared to the first file and those files are skipped. As the macro should circumvent this, I wonder what is going wrong?

 

 

Best regards,

Florian

 

----

After I've re-installed the macro and the workflow and configurated both again, it works now.

 

JordanB
Alteryx
Alteryx

Hi @fmb

 

Can you confirm this is working after you re-installed. Happy to help if not!

 

Please make sure you updated the macro (configure input tool to .csv format) and re-saved the macro. You can insert the macro into your master workflow with the directory tool

 

Best,

 

Jordan

ckirgiz
7 - Meteor

This is awesome! It worked really well for me. Thank you so much!

Ekta
8 - Asteroid

Dear frnds,
I am new to alteryx , much help will be appreciated.
Problem : I am attaching sample of my input files .. I have 100 Plus such files ..
format remain same for each file but headers differs. We can skip first 5 rows and make the 6 th line as header .
Desired output : I am attaching the sample output .
Since the header in all the files remains between jan to Dec , I would like to combine all the files into one by populating the columns for respective headers .
I have 100 Plus files so I would like to automatically read the files from the folder .
Please help me to provide the solution for this .
Thank a lotInput 1Input 1Input 2Input 2Desired OutputDesired Output

kgordish
8 - Asteroid

@JordanBNeed to study this. Just hit with a schema error during a batch import using dynamic input.

 

Is Alteryx working on a tool that would automatically fix this issue? 

JordanB
Alteryx
Alteryx

@kgordish 

 

The dynamic input is set to error if the files do not have the same field schema. This article describes how to build your own equivalent of the dynamic input which makes it dynamic and not error. Please refer to the above article and let me know if your error persists.

 

Best,

 

Jordan

kgordish
8 - Asteroid

@JordanB 

Hello Jordan. I had the directory tool combined with dynamic input in my model working just fine until discovered the schema error.

Connected parameter to the Directory. Or should it be connected to Dynamic Input? Then added the macro output.

Can you take a look at my workflow. By adding the macro, it has changed the app interface and I can no longer launch the app.


Appreciate the assistance. Amazing how small items can pose large road blocks. Good thing is I retain newly learned tasks.

kgordish
8 - Asteroid

 

 

 

@JordanB OK. Finally managed to create and insert macro. Now getting an error in Dynamic Input. Previously did not have a Dynamic Input Error. Any recommendations?

 

 

DI Error.JPG

kgordish
8 - Asteroid

Finally figured out my errors. No further assistance needed.

EmmaPeng
7 - Meteor

Capture1.PNG18 files in total but only few get processed the rest skipped18 files in total but only few get processed the rest skippedCapture3.PNG

 

hi Jordon

 

Thanks for your post! It is very helpful and I built one WF similar to yours to read multiple excel files with difference schema.

 

I got no errors message but warnings that some files are being skipped. I don't understand why as I have set the setting like yours to deal with different schema.

 

Would be appreciated if you can help me out on this one.

 

Regards

Emma

 

 

 

 

 

JordanB
Alteryx
Alteryx

@EmmaPeng within the interface designer and final option on the left hand side in the window that pops out have you selected 'Auto configure by name'? 

 

The image is at the later stages of my article above

 

Best,

 

Jordan

EmmaPeng
7 - Meteor

@JordanB yes i have tried select this option as well as configure by position, both options will only process part of the files not all.

JordanB
Alteryx
Alteryx

@em 

 

can you send a picture of your action tool config please

 

Best,

 

Jordan

EmmaPeng
7 - Meteor

Capture1.PNGCapture2.PNG

 

@JordanB please see my configuration.

thanks

emeb1
7 - Meteor

I'm using two instances of a batch union macro due to different needs based on inputs. In some runs, there is nothing to open for one of the batch uploaders.  Is there a way to prevent that from erroring out?  It's a predictable and acceptable outcome. Any ideas? 

Crete
7 - Meteor

Dear Jordan,

 

thank you very much for this article. I had an issue with

"Reading in Multiple Files with Different Field Schema"

 

Your article was very helpful to build a batch macro and then integrate it into my original workflow.

 

 

nagakic
5 - Atom

Hello Jordan, amazing Macro you have here.

 

Currently, I am trying to convert a directory of reports through a dynamic input to then upload to Tableau. The issue with it is I tried the macro and the error "A record was created with no fields."

 

I have the sheet name attached to the file and the batch hasn't run any errors, however they are .xls files so I am not sure if that has anything to do with it. The formula is [FullPath] + '|||DailyUtilizationReport$' so that it has the full file path name as well. Also, does the reference file need to come from the same exact directory in order for it to work? 

Alteryx Issue.PNG

JordanB
Alteryx
Alteryx

@nagakic 

 

What data comes out of the third blue tool you have in the workflow above? This should actually be the data from your files.

 

If you don't see any data that's because the macro is made for xlsx. To adjust please right click on the blue macro>open macro. Update the green input tool for one of your .xls files and then update the action tool to reflect your file path in the replace a specific string at the bottom of the configuration window when selected on the action tool. Then go file>save and this will update the workflow above to read the .xls file.

 

Alternatively, check out @CameronS article where his macro will do this automatically for you.

 

Best,

 

Jordan

This is awesome and exactly what I was looking for.  Thanks so much!

Janet
6 - Meteoroid
How do you make it so that it reads the same 'Start Data to Import on Line' 3?
JordanB
Alteryx
Alteryx

@Janet 

 

Inside the macro and in the input tool configuration you can select to read data from a certain line in the .xlsx format. 

 

You can then save the macro and this will read data from line 3 every time

 

Best,

 

Jordan

jt_edin
8 - Asteroid

Fantastic example, this has helped me a lot. I've used batch macros before, but I don't understand what type of macro this is. Normally there's a dropdown box that lets you select standard, batch, iterative etc. But in this case it's missing (it should appear in the red circle):

 

macrotype.JPG

 

Does anyone know what kind of macro Jordan's one is? I've adapted it to batch input a bunch of shapefiles which, although they had identical field names, had different field lengths so it was throwing a schema error.

 

If I could understand what kind of macro this is, and where the unioning of input files takes place (in the calling workflow, or in the macro?) that would help a lot.

SandeepRatakonda
5 - Atom

Hi JordanB

 

@JordanB

 

Can we use this macro for google sheets too by changing the input tool to google sheets input tool. I have multiple google sheets with names like Sprint 1 Deployment tracker,Sprint 2 Deployment tracker. for now i have a workflow which is working but i am inputting each sheet separately with a different node. instead i want to use batch macro to do this dynamically. is this possible, as each sheet will have a different workbook id and worksheet id. Once i read the sheets from my google drive i need merge them and add to a new google sheet.

Alteryx workflow.png

JordanB
Alteryx
Alteryx

@SandeepRatakonda yes this is definitely possible. Just replace the input tool for the GS connector. In the action tool use the '+' buttons to find the name of the sheet and then update that in the 'update a specific string'.

 

Best,

 

Jordan

kspence703
5 - Atom

Thanks for this Jordan!

 

I have a somewhat similar situation but am quite new to Altryex and trying to sort it out and think this is a good start but think a few pieces may be missing. I have multiple files. The files have different schema and varying column names. I am trying to combine all the files together though and wondering with varying schema and varying column names how I would go about doing this? I do not want to get rid of any columns just want to consolidate all columns with same column titles but again they all have a different Schema

jt_edin
8 - Asteroid

@kspence703 you might find this useful, I asked the same question a few years ago:

 

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Struggling-to-use-batch-macro-to-input...

 

As you can see it took a little bit of jiggling before I got it right. I adapted the macro on the page above for csv files instead of xlsx. I attach the macro I used, along with a screenshot of how to set it up. Good luck!

 

Correction - I can't find the button to attach the workflow or macro. I can only attach photos or videos to this message. Not sure what's gone wrong here?!?

Cen64
7 - Meteor

Hi

 

Is there anyway to use this Macro where the xls files have different Sheet names?

 

Essentially I have a folder of approx 230 xls files where they have different schema, no of sheets and different sheet names

 

A number share the same schema

 

So what I would like to do is read them all into one file eventually which has the same 'default' schema

 

I have a separate file that has then file names and a code applied to identify a similar group of files with the same schema that I  might be able to use in a workflow once I have read in the files

 

So will your batch process work if amended to allow variant worksheet names?

 

Thank you

 

 

 

jt_edin
8 - Asteroid

I'm sure this can be done, because the input tool gives you the option to give you a list of sheet names instead of the data. Perhaps it would be better to post on the forum (this isn't actually the forum) and attach a couple of anonymised example files showing the state your tabs and headers are in. Someone is bound to pick it up there.

timothyyeo
7 - Meteor

Hi @JordanB ,

 

First off, thanks for sharing this macro. However, I'm facing an issue when I try to use it.  I tested it by reading a directory with 3 text files, and what this macro is doing for me is that it is duplicating the output by reading and outputting the 3 files 3 times each, meaning the output rows are being duplicated 2 times more. To illusttrate, the output looks something like the  below. Do you know how to resolve this so that each file in the directory is only read once?

 

FilenameRecord
file1A
file2B
file3C
file1A
file2B
file3C
file1A
file2B
file3C
JordanB
Alteryx
Alteryx

Hi @timothyyeo,

 

Have you updated the action tool inside of macro to reflect one of your txt file inputs? 

 

Please make sure only three files names are passed into the macro.

 

Best,

 

Jordan

LFLee
8 - Asteroid

Hi @JordanB , is there any way for the dynamic input tool to pick up file names for file types with .pdf?

mvelders
5 - Atom

A problem I'm having with this is that the connection after the macro is starting from the first row, which in my Excel files are descriptive and don't have the actual header names. In fact, in my case, I have some data in rows 4-7 column 2, then 4-5 column 5 and the rest of the data - the main dataset - is from 13-80+ with the headers being on row 13 for that data and the headers for the horizontal data in rows 4-7 being in columns 1 and 4. I've figured out a way to get around that first issue of the horizontal data by using the Append Fields tool and using Create New Column with Filter and Select tools to merge the data from those horizontal fields with the rest coming in via a second input of the same file, but when using this macro, there's no apparent way to start the import on a certain row or specify that the first row contains data, so I'm stuck with what Alteryx picks up as the headers.

DanM
Alteryx Community Team
Alteryx Community Team

@mvelders

 

This macro contains and Input tool where you can specify which row to start the headers on. If you open the macro (right click: open macro) you can adjust the Input tool.

 

DanM

Rachael_OS
8 - Asteroid

Hi Jordan,

 

Thanks a mill for this - it's really helpful. I've got a batch macro configured the same way as your example, but when I run it for some reason it starts creating a lot of null values in the first sheet it's bringing in, and doesn't output the other files in the output at all, so I end up with mostly null values in the output. Do you have any idea why this would be happening and how to fix?

Cbennett022
8 - Asteroid

Is there a way to disable the in-browse default in the batch macro? I only see the first row of data from the first file when inserting the macro into my standard workflow. Once I add a browse, I can then sequence tools after the macro to perform additional analysis, but I'd rather not include the browse tool. Is this possible?

 

Cbennett022_0-1579636467205.png

sonseeahray
8 - Asteroid

@JordanB Is there a way to incorporate "List of Sheet Names" into this macro? I have files where the tabs are named according to the date they are created. If I can produce a list based on the range of dates these files were created, would I be able to use it in the macro?

pitmansm
8 - Asteroid

Is there a more simple way to do this? this increases my compute time by 10x doing this. I have 2000 files with random schemas but have common columns. tableau seems to handle this a lot easier as in there's no setup just tell it pull in all files in a folder and it matches them up. Can I run SQL on that folder or something?