
Alteryx Designer Desktop Discussions

SOLVED

How to modify macro to read multiple PDF Files in the same Workflow/Input Data Node?

HW1
9 - Comet

Hi,

 

I am using the Read PDF tool from the Alteryx Gallery:

 

https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa

 

This tool reads one PDF file per run, which works great. However, I have developed a workflow that parses invoices.

Each workflow is designed to parse one customer's invoices, and a customer often has hundreds of invoices per month, all in an identical format.

 

I don't want to:

  1. Manually duplicate the tool once for every invoice in a folder
  2. Run the same workflow again for every new invoice

So, how can I modify this macro so that it reads all the PDF files in one folder and concatenates the output into a single dataframe, making my life a lot easier?

Help would be highly appreciated.

 

For an example, please find the packaged workflow attached. I would be happy to answer any questions promptly.

 

Thanks.

11 REPLIES
apathetichell
18 - Pollux

Try this. It worked for me in terms of reading in all .pdfs in a set directory.

 

A few notes: generating record IDs is a work in progress and could be problematic if different files have a varying (non-static) number of component parts. I.e., each record should basically have the same number of parts for easier identification.

 

If that's not the case, let me know, as it'll take some playing around in R.

HW1
9 - Comet

@apathetichell  What version of Alteryx are you using?

HW1_0-1620618829037.png

 

I am getting this error

apathetichell
18 - Pollux

2021.1. You can edit the workflow/macro in Notepad to change that. What version are you running? Having said that, I'd try running it in a 2020 version first. I'd be surprised if there were any features that cause you trouble.

HW1
9 - Comet

HW1_0-1620619152409.png

How do I downgrade your workflow so that it works in my version?

 

I am a bit hesitant to upgrade because the last time I did, it broke my PDF parsing workflows: they needed an updated R dependency that was not available at the time, so I had to roll back.

apathetichell
18 - Pollux

Yeah, I had to upgrade today because of broken R stuff too. I changed the versions in Notepad in the .yxmd and .yxmc files. Make sure the directories are set up for you and not me, and give it a try...

 

If those don't work - I'll just post a screen grab. It's pretty straightforward.
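
For anyone who would rather script the Notepad edit mentioned above, here is a minimal Python sketch. It assumes the version is stored as a `yxmdVer="..."` attribute near the top of the workflow/macro XML; check your own file before relying on that. The function name and the sample XML are made up for illustration, and the target version is just an example.

```python
import re

# Assumption: the file declares its version as yxmdVer="..." on the root element.
VERSION_ATTR = re.compile(r'(yxmdVer=")[^"]*(")')


def set_workflow_version(xml_text: str, target_version: str) -> str:
    """Return xml_text with the first yxmdVer attribute set to target_version."""
    return VERSION_ATTR.sub(
        lambda m: m.group(1) + target_version + m.group(2),
        xml_text,
        count=1,
    )


# Made-up sample standing in for the first lines of a workflow file.
sample = '<?xml version="1.0"?>\n<AlteryxDocument yxmdVer="2021.1">\n</AlteryxDocument>'
downgraded = set_workflow_version(sample, "2020.4")
print(downgraded)
```

This only changes the declared version. If the workflow uses a feature that the older version lacks, it can still fail to open or run.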

HW1
9 - Comet

Surprising, this is new to me!

 

HW1_0-1620621389142.png

 

 

I am unable to find my macros in the tool palette, even after adding them to the appropriate folder and restarting Alteryx.

 

It would be great if you could help me with screenshots to make it work!

 

However, why am I unable to see my macros in my tool palette?

 

HW1_1-1620621585255.png

 

I am looking in my tool palette, but the macros folder does not show up.

apathetichell
18 - Pollux

I usually just insert them via browse. That does seem odd though, since it's in your directory... So here's the screen grab and logic.

 

I turned the PDF tool into the main section of a batch macro. I added a control parameter and an action tool. The action tool updates the name of the .pdf file.

2021-05-09.png

2021-05-09 (1).png

 

The outer workflow uses a Directory tool set to *.pdf. It then feeds into the batch macro, with the file name being fed into the control parameter.
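
If it helps to see that logic outside of Alteryx, here is a rough Python sketch of the same pattern: list every *.pdf in a folder, run a single-file reader once per file, and stack the results with the file name attached. This is not the macro's actual code. `read_all_pdfs`, `fake_reader`, and `demo` are hypothetical names, and `fake_reader` only returns raw file text as a stand-in for the PDF tool.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def read_all_pdfs(folder: str, read_one: Callable[[Path], str]) -> List[Dict[str, str]]:
    """Run read_one on every *.pdf in folder and stack the results."""
    rows: List[Dict[str, str]] = []
    # Directory tool step: list the files matching *.pdf (sorted for a stable order).
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # Batch macro step: one run per file, keeping the file name with the output.
        rows.append({"FileName": pdf_path.name, "Text": read_one(pdf_path)})
    return rows


def fake_reader(path: Path) -> str:
    """Hypothetical stand-in for the PDF tool: returns raw file text, no PDF parsing."""
    return path.read_text(encoding="utf-8")


def demo() -> List[Dict[str, str]]:
    """Build a throwaway folder with two fake .pdf files and one .txt file, then read it."""
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "invoice_1.pdf").write_text("line A\nline B", encoding="utf-8")
        Path(folder, "invoice_2.pdf").write_text("line C", encoding="utf-8")
        Path(folder, "notes.txt").write_text("ignored", encoding="utf-8")
        return read_all_pdfs(folder, fake_reader)


for row in demo():
    print(row)
```

The reader is passed in as a parameter so that the loop stays the same no matter how each file is actually parsed, which is roughly the role the control parameter plays in the batch macro.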

 

Note: I got rid of the Text To Columns outside of the macro because file identifiers are not generated in the current PDF tool's R code. I can take a look at this, but I'm not sure what my timeframe would be for editing the underlying R code.

 

After it returns from the macro, I added a Record ID and then a Text To Columns split on \n. Again, this worked on every .pdf in the directory I chose, so it should work for you.
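
As a rough illustration of that last step, here is a small Python sketch that gives each returned text block a record ID and splits it on newlines. It assumes the split goes to rows (one output row per line); the field names `RecordID`, `LineNo`, and `Line`, the function name, and the sample strings are all made up for the example.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[int, str]]


def split_into_lines(texts: List[str]) -> List[Row]:
    """Number each text block, then emit one row per newline-separated line."""
    rows: List[Row] = []
    # Record ID step: number the blocks starting from 1.
    for record_id, text in enumerate(texts, start=1):
        # Split step: break the block on "\n", keeping the line position.
        for line_no, line in enumerate(text.split("\n"), start=1):
            rows.append({"RecordID": record_id, "LineNo": line_no, "Line": line})
    return rows


# Made-up sample: two text blocks, as if two files had come back from the macro.
result = split_into_lines(["Invoice 001\nTotal: 10", "Invoice 002"])
for row in result:
    print(row)
```

Because the ID is assigned per block before the split, every line stays traceable to the block it came from, even when blocks have different numbers of lines.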

HW1
9 - Comet

Yeah, I replicated your macro and the macro part works. However, I am unable to add it to my workflow because the macro does not show up as an addable node.

I will take the risk now and upgrade Alteryx to 2021. I will update you on the progress. Fingers crossed.

HW1
9 - Comet

Yep, upgrading the version worked.

 

However, I have a question:

 

Does the folder here have to be the same?

 

HW1_0-1620625253053.png 

HW1_2-1620625309003.png

 

 

I mean, if I want to run this macro on a different folder, will I have to update this value and save the macro to make it work?

 

HW1_1-1620625289167.png

 
