How to modify a macro to read multiple PDF files in the same workflow/Input Data node?
Hi,
I am using the Read PDF tool from the Alteryx Gallery:
https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa
This tool reads one PDF file per run, which works great. However, I have developed a workflow that parses invoices.
Each workflow is designed to parse one customer's invoices, and a single customer often has hundreds of invoices a month, all in the identical format.
I don't want to:
- Duplicate the tool manually as many times as there are invoices in a folder
- Run the same workflow again for every new invoice.
So how can I modify this macro so that it reads all the PDF files in one folder and concatenates the output into a single dataframe? That would make my life a lot easier.
Any help would be highly appreciated.
As an example, please find the packaged workflow attached. I would be happy to answer any questions promptly.
Thanks.
- Labels:
- Help
- Macros
- Optimization
- Workflow
Try this. It worked for me for reading in all the .pdf files in a given directory.
A few notes: generating record IDs is a work in progress and could be problematic if different files have a varying number of component parts, i.e. each record should have the same number of parts for easy identification.
If that's not the case, let me know, as it'll take some playing around in R.
2021.1. You can edit the workflow/macro in Notepad to change that. What version are you running? Having said that, I'd try running it in a 2020 version first; I'd be surprised if it used any features that would cause you trouble.
How do I downgrade your workflow so that it works in my version?
I am a bit hesitant to upgrade because the last time I did, it broke my PDF parsing workflows: I needed an updated R dependency that was not available at the time, so I had to roll back.
Yeah, I had to upgrade today because of broken R stuff too. I changed the versions in Notepad in the .yxmd and .yxmc files. Make sure the directories are set up for you and not for me, and give it a try.
If that doesn't work, I'll just post a screen grab. It's pretty straightforward.
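Editing the version in Notepad amounts to changing one attribute in the file's XML header. Below is a minimal Python sketch of that same edit. It assumes the version is stored in a `yxmdVer="..."` attribute on the root `AlteryxDocument` element (open your own file in Notepad to confirm before relying on it), and `set_alteryx_version` is a hypothetical helper name. Keep a backup copy: a file relabelled to an older version can still fail if it uses tools that version lacks.

```python
import re
from pathlib import Path

# Assumption: the header looks like <AlteryxDocument yxmdVer="2021.1">.
VERSION_ATTR = re.compile(r'(yxmdVer=")([^"]*)(")')


def set_alteryx_version(path: str, target_version: str) -> str:
    """Rewrite the first yxmdVer attribute in a .yxmd/.yxmc file.

    Returns the version string that was replaced. Raises ValueError when no
    yxmdVer attribute is found, so a file is never silently left unchanged.
    """
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    match = VERSION_ATTR.search(text)
    if match is None:
        raise ValueError(f"no yxmdVer attribute found in {path}")
    old_version = match.group(2)
    # Replace only the first occurrence, keeping the surrounding quotes.
    new_text = VERSION_ATTR.sub(
        lambda m: m.group(1) + target_version + m.group(3), text, count=1
    )
    file.write_text(new_text, encoding="utf-8")
    return old_version
```

You would run it once on the workflow and once on the macro, passing the version string your own installation reports.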
Surprising, this is new to me!
I am unable to find my macros in the tool palette, even after adding them to the appropriate folder and restarting Alteryx.
If you could help me with screenshots to make it work, that would be great!
However, why would I be unable to see my macros in the tool palette?
I am looking in my tool palette, but the Macros folder does not show up.
I usually just insert them via Browse. That does seem odd, though, since it's in your directory. So here's the screen grab and the logic.
I turned the PDF tool into the main section of a batch macro. I added a Control Parameter and an Action tool. The Action tool updates the name of the .pdf file.
The outer workflow uses a Directory tool set to *.pdf, which feeds into the batch macro, with the file name fed into the control parameter.
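In code terms, the Directory tool plus batch macro pattern is a loop: list every file matching `*.pdf`, run the same single-file reader on each one, tag each result with its source file, and union the results. The following is a minimal Python sketch of that control flow only, not the Alteryx macro itself. `read_one` stands in for the PDF tool and is a hypothetical callable you would supply, and `folder` plays the role of the Directory tool's path.

```python
from pathlib import Path
from typing import Callable, Dict, List


def batch_read(folder: str, read_one: Callable[[Path], str],
               pattern: str = "*.pdf") -> List[Dict[str, str]]:
    """Mimic Directory tool -> batch macro -> union of iterations.

    Each matching file is passed to read_one (the "macro body"), and every
    result row records which file it came from, because the single-file
    reader does not emit a file identifier on its own.
    """
    rows: List[Dict[str, str]] = []
    # sorted() gives a deterministic order. On case-sensitive file systems
    # "*.pdf" will not match files ending in ".PDF".
    for path in sorted(Path(folder).glob(pattern)):
        text = read_one(path)  # one iteration of the batch macro
        rows.append({"FileName": path.name, "Text": text})
    return rows  # the batch output is the union of all iterations
```

Because the folder is an argument in this sketch, the per-file logic stays the same whichever directory you point it at.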
Note: I got rid of the Text to Columns outside the macro because file identifiers are not generated by the current PDF tool's R code. I can take a look at this, but I'm not sure what my timeframe would be for editing the underlying R code.
After it returns from the macro, I added a Record ID and then a Text to Columns split on \n. Again, this worked on every .pdf in the directory I chose, so it should work for you.
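The post-macro steps (a Record ID, then a split on the newline character into rows) can be sketched the same way. This minimal Python illustration is not the workflow itself. It assumes each file's extracted text arrives as one string alongside its file name, and `split_to_rows` is a hypothetical name. It numbers lines within each file, so records stay identifiable even when files yield different numbers of lines, which addresses the record ID concern raised earlier in the thread.

```python
from typing import Dict, Iterable, List, Tuple


def split_to_rows(files: Iterable[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Give each file a RecordID, then split its text on newlines into rows.

    LineNo restarts at 1 for every file, so (RecordID, LineNo) identifies a
    line regardless of how many lines each file produced.
    """
    rows: List[Dict[str, object]] = []
    for record_id, (file_name, text) in enumerate(files, start=1):
        # Split to rows on "\n"; a trailing "\r" (Windows line ending) is stripped.
        for line_no, line in enumerate(text.split("\n"), start=1):
            rows.append({
                "RecordID": record_id,
                "FileName": file_name,
                "LineNo": line_no,
                "Line": line.rstrip("\r"),
            })
    return rows
```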
Yeah, I replicated your macro and the macro part works. However, I am unable to add it to my workflow because the macro does not show up as an addable node.
I will take the risk now and upgrade Alteryx to 2021. I'll update you on the progress. Fingers crossed!
Yep, upgrading the version worked.
However, I have a question:
Does the folder here have to be the same?
I mean, if I want to run this macro on a different folder, will I have to update this value and save the macro to make it work?
