It’s about time I release my “Input Batch” macro, so here we go......
For those of you familiar with the Dynamic Input, you will know how much of a life saver it can be, but from time to time, you may be frustrated by the need for the file formats to be the same. Both the type of file (Excel, CSV, TXT etc) and also schema (field types, size and number/name of columns).
This can often cause a few issues, as Alteryx reads in the template and will then use that as the type of file, and then the first file it reads in dictates the schema.
Well, if you want the ability to read in many different types and many different schemas at once, you are in luck!
I have built a macro (with a sub-macro) that will be able to read in: ".yxdb",".cydb",".sz",".avro",".csv",".dbf",".mdb",".xls",".xlsx",".xlsm",".sas7bdat",".sav",".sqlite",".xml”
The input required is the full path with extension. This enables you to use the directory tool, apply any logic to filter the list (AKA get todays files) and then input that in.
After that it will go through all the files and read them in. Then using the power of Union it will align all columns named the same, any extra columns get added to the end.
There is also some extra stuff when it comes to Excel, it will actually loop through all the sheets and read them in one by one.
Hopefully some of you will find this useful.
If you provide it file types that it is not able to read, it will through a warning.
Feel free to add comments about future enhancements and I’ll look to see if I can add them in.
Attached is a zip file (.7z), with the installer (.yxi) and some sample data with a sample workflow.
Thank you so much for this macro. Quick questions
Thanks for the assistance.
@bchen1 wrote:Thank you so much for this macro. Quick questions
- Is it possible to configure it so I can output for Rows 10 and beyond
- How would I output just the filename? Right now when I batch the files, the full path is being pulled and I just want the filename
Thanks for the assistance.
You're welcome, I am glad you are finding it useful.
To update it to only output from 10 rows and beyond, you'd need to open up the macro and modify the input tools within. I'd then recommend saving this as your bespoke version as my guess is it won't always be from row 10 in all uses?
For the second part you could modify the input tool as well, but there is a nice easy function in Alteryx to extract the file name from a file path:
FileGetFileName([FileName])
That should then give you what you need without changing the macro itself.
Thanks
Joe
Hi @JoeS
Thanks for sharing this macro. You don't know how many times this saved me.
I just wanted to share some potential bugs which i couldn't resolve:
1) unable to work on xlsm workbooks
2) if worksheet name has a space at the end, then it throws out an error
I'd wish i know how to fix it, but just putting it here for tracking or once i find time to "attempt"
to fix it!
Edit:
Replying to myself - i found a solution for the above problems.
i'm not sure if this will fix xls or xlsb problem, but the sheet name having a trailing space was a problem for me.
Hadn't had time to incorporate this back into your original workflow, but below formula should fix the trailing space problem.
It's currently reading the Sheet Names as-is without the "$", which might cause problem if there is a trailing space like my case.
(Somehow it trimmed it and couldn't find that particular worksheet)
Formula:
</quote>
left([FileName], FindString([FileName],"|||")+3)
+'`'+
[Sheet Names]
+'$`'
</unquote>
One huge issue with this macro is that in contains another macro with the same name and if I want to save the workflow to a gallery, it will not find the sub-macro.
And here there are two different issues:
The macro is great, but why would you name the sub-macro with the same name as the main macro? :((
@alinmihalcea wrote:One huge issue with this macro is that in contains another macro with the same name and if I want to save the workflow to a gallery, it will not find the sub-macro.
And here there are two different issues:
- If I add the sub-macro as additional asset with the same name it doesn't work. Since it has the same name, it will look for InputBatch.subyxmc. I can't save a file with this extension.
- If I change the name of the sub-macro and add it to the main macro, something goes wrong and the [Sheet Names] field becomes unknown.
The macro is great, but why would you name the sub-macro with the same name as the main macro? :((
That's a great point to raise, I wasn't aware this wouldn't work on Server and is something for me to test.
The is no real reason why I gave the submacro the same name, as it was almost irrelevant what it was called.
The main importance I focused on, is that the extension was not .yxmc and I went with .subyxmc so that it then doesn't appear in the tool palette as a macro in it's own right, as it's not intended to be used as one.
This could then cause an issue when it comes to server, as I wouldn't want the submacro to appear in the tool palette, as this would confuse users, but if server won't pick it up, then I may be stuck - I'll have a play :)
@JoeS I came across the post as I was having issues with importing files as data type were different on some files. The Dynamic input tool was throwing errors on those few files. This tool works great, but is there a way to specify which line to start pulling data? The problem with my sheets is everything starts on line 2. Line 1 happens to be a blank row so its bringing the headers into the table. I know I could filter them out, but the Fields are all set to V_String now because of this instead of the the data type associated with the actual values in each field. If it started on Line 2 the data types would all be correct.
@cjaneczko - Sorry I missed this reply until now.
Unfortunately the problem that you mentioned below isn't as simple as it first appears to solve.
I did look a while back into adding a "Import from Line:" Numeric up/down. Allowing people to specify to start from Row 2 for example.
However, this option isn't available for all file formats and thus would start to add dependancies one way or another, that or take hours to update to stay dynamic.
What I'd recommend is using the filter to remove the extra headers and then an auto-field tool after to assign the best field type for each column.
Thanks for the follow-up. I did manage to find a spot in one of the nested macros where I could alter the row level that the import starts at. I was able to import a whole directory of files properly after altering the "Start Data Import on Line" in the Example.txt input data tool within the InputBatch.subyxmc.
@cjaneczko wrote:Thanks for the follow-up. I did manage to find a spot in one of the nested macros where I could alter the row level that the import starts at. I was able to import a whole directory of files properly after altering the "Start Data Import on Line" in the Example.txt input data tool within the InputBatch.subyxmc.
Yeah, a hardcoded change for your use case is definitely the quickest way - sorry I can't add it as a feature to the macro.
I'll see if I can get some time and have a think if there is something else I can do.