Loop through files in a directory, select specific columns and then append

I am currently pulling in files using a directory tool with a wildcard on the file name and then feeding that to a dynamic input tool. An issue that has come up is that the number of overall columns can vary from each xlsx, however the main columns that I want will always exist- those other columns can be dropped.

Using the current method, starting at the top of the files in the folder, they will all be brought in unless the schema doesnt match the first file that was read in.

I am thinking i need to be able to loop through each file, select the main columns that i know exist in each from that specific sheet, then append them as those columns are extracted.

Any ideas on what I might need to do to get this working? The above was just my thoughts on what i think i need to do, please feel free to recommend other options.

Thanks for the help!

Datasets

Developer

Developer Tools

Parse

Help

Dynamic Processing

Text Mining

Accepted answers

Luke_C

Hi @JMart2135

See attached. I put the steps you use to clean/select the columns into the batch macro. The control parameter is based on the full path input from the directory tool. It dynamically updates the input tool for each file in the directory location. Hope this helps.

New Workflow8.yxzp

All comments

Luke_C

Hi @JMart2135

If you turn this into a batch macro you could definitely use a select tool to curate the fields, otherwise in the interface designer of the batch macro there are a few options that can handle this scenario. I personally find myself using the 2nd option frequently, which acts similar to a union tool.

Let me know if you need help with how to make a batch macro

mceleavey

Hi @JMart2135 ,

I've built some tools to do exactly what you want.

I've attached them both long with an example workflow.

The first tool reads in the sheet names of all files in the directory or sub-directory, and then appends them to the filename, so you can filter down the sheets you want in case the humans change them over time. The second tool then dynamically reads them all in and gets around the error you will receive in the Dynamic Input tool if the schema differs. You then simply need to select the columns you want using a select tool.

Hope this helps.

Multi Sheet Import Example.yxmd

Dynamic Multi File Sheet Names.yxmc

Import Multiple Schema Flat Files.yxmc

JMart2135

@Luke_C attached are the files we discussed

DummyData2.xlsx

DummyData1.xlsx

Batch Macro Initial Workflow.yxwz

Quick Links

This months top contributors

atcodedog05 19458

Qiu 15866

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703