
SOLVED

PDF page count

pvara
8 - Asteroid

Is anyone aware of, or has anyone come across, a workflow that can count the number of pages in a PDF? Is it possible? I am trying to develop a workflow app so that my group can QC production data that contains thousands of PDFs, and I would also like to know page counts for reporting purposes.

 

Thank you

JohnJPS
15 - Aurora

I don't see how to do it with built-in tools; however, the answer to basically the same question on StackOverflow (http://stackoverflow.com/questions/14644353/get-the-number-of-pages-in-a-pdf-document) could be integrated into an Alteryx workflow: install the same pdfinfo.exe program (from http://www.foolabs.com/xpdf/download.html) and call it using the "Run Command" tool, then use Alteryx tools to parse the output from pdfinfo.exe. Hope that helps!
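For anyone who wants to see what the parsing step amounts to, here is a minimal sketch in Python rather than Alteryx tools. It assumes pdfinfo prints one "Key: value" pair per line, including a "Pages:" line; the function name and the sample text are illustrative, not taken from any attached workflow.

```python
import re
from typing import Optional


def parse_page_count(pdfinfo_output: str) -> Optional[int]:
    """Return the page count from pdfinfo's text output, or None if absent."""
    for line in pdfinfo_output.splitlines():
        # Match only the "Pages:" line, not "Page size:" or similar keys.
        match = re.match(r"^Pages:\s+(\d+)\s*$", line)
        if match:
            return int(match.group(1))
    return None


# Illustrative sample only; real output depends on the PDF and pdfinfo version.
sample_output = """Title:          Example
Producer:       Example Producer
Pages:          12
Page size:      612 x 792 pts (letter)
"""

page_count = parse_page_count(sample_output)
```

In a workflow, the same idea would be a filter on lines starting with "Pages:" followed by a conversion of the remainder to a number.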

 

pvara
8 - Asteroid

Thank you for your post. So when I create an app that calls out to the .exe, does each user have to install that .exe locally? I will have several users running this app.

 

Thank you,

Pete



 


 

JohnJPS
15 - Aurora

I don't think they would need to do a full-on install if you can include pdfinfo.exe with your app.  I tried creating a .yxzp file (attached) that a user should be able to unzip on their machine and run the workflow, since pdfinfo.exe is included in the zip. For this example it reads two PDF files also located in the workflow directory (just a couple Alteryx PDFs), but the actual files could be replaced with whatever you want... just need to obtain the path via whatever means you like.

 

The dirty work of calling pdfinfo.exe here is done in R rather than via a normal "Run Command" tool, but that's just because the config panel for "Run Command" was very constrictive and it doesn't take much for me personally to fall back to R.  Perhaps there are non-R approaches that others could help with too, but if this helps, great!

 

 

JohnJPS
15 - Aurora

I pointed this at a directory with 180 PDFs in it, and it promptly choked, since pdfinfo.exe only returns whatever fields it finds in each file, and the "rbind" I was using isn't flexible enough for that. The attached workflow ran successfully and counted all 22,000+ pages in the 180 PDFs. It also simplifies the R component and does more with Alteryx tools after running R.
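To illustrate the kind of problem described above, here is a rough sketch in Python (not the R code from the attached workflow) of combining records whose fields differ from file to file. It assumes each pdfinfo output is a set of "Key: value" lines; the function names are hypothetical, and missing fields are filled with None so every row has the same columns.

```python
def parse_fields(pdfinfo_output):
    """Split 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in pdfinfo_output.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def combine_records(records):
    """Build one table from dicts with differing keys, padding gaps with None."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [[record.get(column) for column in columns] for record in records]
    return columns, rows


# Two illustrative outputs with different sets of fields.
first = parse_fields("Title:    A\nPages:    3\n")
second = parse_fields("Pages:    7\nProducer: X\n")
columns, rows = combine_records([first, second])
```

Stacking rows by field name instead of by position is what lets files with different metadata sit in one table.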

 

pvara
8 - Asteroid

Thank you, John. I was just working through your last example and was going to test it against my directory of PDFs. Thank you for reposting.

pvara
8 - Asteroid

I did get an error within R: Error in system (syscmd,intern = True):

All I did was point the directory at my directory of PDFs. Was there any other configuration to do? Perhaps in the Formula tool?


JohnJPS
15 - Aurora

Yes, in this version, if you check the first Formula tool, it has the path to pdfinfo.exe in it, which will need to be replaced with a valid path on your system. That path ends up in 'syscmd' in the R tool, hence the error.
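As a rough illustration of why a bad path produces that kind of failure, here is a sketch in Python (the workflow itself uses R's system() call) that checks the executable path before running it. The function name and the paths are hypothetical; the only assumption about pdfinfo is that it takes the PDF path as an argument and prints its report to standard output.

```python
import os
import subprocess


def run_pdfinfo(pdfinfo_path, pdf_path):
    """Run pdfinfo on one PDF and return its text output."""
    # Fail early with a clear message if the configured path is wrong.
    if not os.path.isfile(pdfinfo_path):
        raise FileNotFoundError(f"pdfinfo executable not found: {pdfinfo_path}")
    result = subprocess.run(
        [pdfinfo_path, pdf_path],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode output as text
        check=True,           # raise if pdfinfo exits with an error code
    )
    return result.stdout
```

Checking the path up front turns an opaque command error into a message that names the setting to fix.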

JohnJPS
15 - Aurora

I can confirm that invoking pdfinfo.exe from a shared network location caused no issues for me. I turned it into an analytic app (attached) that runs pdfinfo.exe from a network location and allows you to choose your directory and include/exclude subdirectories. It puts the PDF and page counts into a message.

 

Caveats:

  • People will need access to the location where you place pdfinfo.exe (and you'll need to update the Formula tool with that location).
  • The progress indicator is pretty useless: it spends a lot of time chugging through the PDFs, and not much progress is displayed despite my trying to report it from R.
  • A user running the app will need to "Show Output Log" and scroll down in order to see the message about PDF and page counts.

Otherwise, it's certainly easy to execute.
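For reference, the directory selection and the summary message could be sketched like this in Python. This is not the attached app, only an outline of the same idea; the function names and the message wording are assumptions.

```python
from pathlib import Path


def find_pdfs(directory, include_subdirectories=False):
    """List PDF files in a directory, optionally descending into subdirectories."""
    root = Path(directory)
    candidates = root.rglob("*") if include_subdirectories else root.glob("*")
    # Compare the extension case-insensitively so ".PDF" files are included.
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")


def summary_message(page_counts):
    """Summarise a {file name: page count} mapping as one line of text."""
    return f"{len(page_counts)} PDFs, {sum(page_counts.values())} pages"
```

The page counts passed to summary_message would come from running pdfinfo on each file returned by find_pdfs.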
