Hi,
I am using the Read PDF tool from Alteryx Gallery
https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa
This tool reads one PDF file per run, which works great. However, I have developed a workflow that parses invoices.
Each workflow is designed to parse one customer's invoices, and a customer often has hundreds of invoices per month, all in an identical format.
I don't want to:
Hence, how can I modify this macro so that it reads all the PDF files in one folder and concatenates the output into a single dataframe, thus making my life a lot easier?
Help would be highly appreciated.
For an example, please find the packaged workflow attached. I would be happy to answer any questions promptly.
Thanks.
Try this. It worked for me in terms of reading in all .pdfs in a set directory.
A few notes - generating record IDs is a work in progress and could be problematic if different files have multiple component parts which are non-static, i.e. each record should basically have the same number of parts for easier identification.
If that's not the case let me know as it'll take some playing around in R.
2021.1 - you can edit the workflow/macro in Notepad to change that. What version are you running? Having said that - I'd try running it in a 2020 version first - I'd be surprised if there were any features which cause you trouble.
How do I downgrade your workflow to be able to work in my version?
I am a bit hesitant to upgrade because the last time I did that, it broke my pdf parsing workflows as I needed the updated R dependency which was (then) not available. I had to roll back.
Yeah, I had to upgrade today because of broken R stuff too. I changed the versions in Notepad on the .yxmd and .yxmc - make sure the directories are set up for you and not me and give it a try...
If those don't work - I'll just post a screen grab. It's pretty straightforward.
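For anyone who would rather script the Notepad edit, here is a minimal Python sketch. It assumes the version lives in a `yxmdVer` attribute on the root `AlteryxDocument` element of the workflow/macro XML; open your own file in Notepad first to confirm that is what it looks like. The function name `set_yxmd_version` and the version strings are just placeholders for illustration.

```python
import re


def set_yxmd_version(xml_text: str, target_version: str) -> str:
    """Return xml_text with the yxmdVer attribute of the root tag replaced.

    Assumption: the root element looks like <AlteryxDocument yxmdVer="...">.
    If no such attribute is found, the text is returned unchanged.
    """
    pattern = r'(<AlteryxDocument\b[^>]*\byxmdVer=")[^"]*(")'
    # A function replacement avoids ambiguity between group references
    # and the digits of the version number.
    return re.sub(
        pattern,
        lambda m: m.group(1) + target_version + m.group(2),
        xml_text,
        count=1,
    )


if __name__ == "__main__":
    sample = '<?xml version="1.0"?>\n<AlteryxDocument yxmdVer="2021.1">\n</AlteryxDocument>'
    print(set_yxmd_version(sample, "2020.4"))
```

This only changes the version stamp. It does not convert any tool that genuinely needs the newer release, so keep a backup copy of the original file.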
Surprising, this is new to me!
I am unable to find my macros in the tool palette even after adding them to the appropriate folder and restarting Alteryx.
It would be great if you could help me with screenshots to make it work!
However, why would I be unable to see my macros in my tool palette?
I am looking into my tool palette however the macros folder does not show up?
I usually just insert them via browse - That does seem odd though since it's in your directory... So here's the screen grab and logic.
I turned the PDF tool into the main section of a batch macro. I added a control parameter and an action tool. The action tool updates the name of the .pdf file.
The outer workflow uses a Directory tool set for *.pdf - it then feeds into the batch macro, with the filename being fed into the control parameter.
Note - I got rid of the text to columns outside of the macro because file identifiers are not generated in the current PDF tools R code. I can take a look at this but not sure what my timeframe would be on editing the underlying R code.
After it returns from the macro, I had a record ID and then a Text to Columns split on \n. Again, this worked on every .pdf in the directory I chose, so it should work for you.
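If it helps to see the logic outside of Alteryx, the flow described above can be sketched in plain Python. This is only an illustration of the batch idea, not the macro itself: `read_all_pdfs` and the column names are made up for the example, and `read_pdf` is a stand-in for whatever actually extracts the text from one file (the PDF tool in the macro).

```python
from pathlib import Path
from typing import Callable, Dict, List


def read_all_pdfs(
    folder: str, read_pdf: Callable[[Path], str]
) -> List[Dict[str, object]]:
    """Run read_pdf once per *.pdf in folder and stack the results.

    read_pdf is a hypothetical callable that returns the text of one file.
    """
    rows: List[Dict[str, object]] = []
    # Directory tool: list every *.pdf in the folder.
    pdf_paths = sorted(Path(folder).glob("*.pdf"))
    # Batch macro: one run per file name (the control parameter).
    for record_id, pdf_path in enumerate(pdf_paths, start=1):
        text = read_pdf(pdf_path)
        # Record ID per returned record, then split the text on newlines.
        for line_no, line in enumerate(text.split("\n"), start=1):
            rows.append(
                {
                    "FileName": pdf_path.name,
                    "RecordID": record_id,
                    "LineNo": line_no,
                    "Text": line,
                }
            )
    return rows
```

Keeping the file name on every row is what lets you tell the invoices apart after everything is stacked into one table.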
Yeah, I replicated your macro and the macro part works. However, I am unable to add it to my workflow because the macro does not show up as an addable node.
I will take the risk now and upgrade Alteryx to 2021. Will update you on the progress. Fingers crossed
Yep, upgrading the version worked.
However, I have a question:
Does the folder here have to be the same?
I mean, if I have a different folder to run this macro on, will I have to update this value and save the macro to make it work?