Alteryx Designer Discussions

SOLVED

PDF page count

Alteryx Partner

Is anyone aware of, or has anyone come across, a workflow that can count the number of pages in a PDF? Is it possible? I am trying to develop a workflow app so that my group can QC production data that contains thousands of PDFs, and I would also like to know page counts for reporting purposes.

 

Thank you

15 - Aurora

I don't see how to do it with built-in tools, however the answer to basically the same question at StackOverflow (http://stackoverflow.com/questions/14644353/get-the-number-of-pages-in-a-pdf-document), could be integrated into an Alteryx workflow: install the same pdfinfo.exe program (from http://www.foolabs.com/xpdf/download.html), and call it using the "Run Command" tool; then use Alteryx tools to parse the output from pdfinfo.exe.  Hope that helps!
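For anyone who wants to try the pdfinfo idea outside Alteryx first, here is a minimal sketch in Python (not the Run Command or R approach discussed in this thread). It assumes pdfinfo prints a "Pages:" line in its output and that the executable can be found under the name passed in; the function names and the sample text are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_page_count(pdfinfo_output: str) -> Optional[int]:
    """Return the number on the 'Pages:' line, or None if no such line exists."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def count_pages(pdf_path: str, pdfinfo_exe: str = "pdfinfo") -> Optional[int]:
    """Run pdfinfo on one file and parse its output.

    pdfinfo_exe is an assumption: point it at wherever pdfinfo.exe lives.
    """
    result = subprocess.run(
        [pdfinfo_exe, pdf_path], capture_output=True, text=True, check=True
    )
    return parse_page_count(result.stdout)


# Hypothetical sample of pdfinfo output, used only to exercise the parser.
sample = "Title:          Example\nPages:          12\nEncrypted:      no\n"
print(parse_page_count(sample))
```

Keeping the parsing separate from the process call makes it possible to check the parser against saved output before pointing it at thousands of files.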

 

Alteryx Partner

Thank you for your post. So when I create an app that calls out to the .exe, does each user have to install that .exe locally? I will have several users running this app.

 

Thank you,

Pete


@JohnJPS wrote:

I don't see how to do it with built-in tools, however the answer to basically the same question at StackOverflow (http://stackoverflow.com/questions/14644353/get-the-number-of-pages-in-a-pdf-document), could be integrated into an Alteryx workflow: install the same pdfinfo.exe program (from http://www.foolabs.com/xpdf/download.html), and call it using the "Run Command" tool; then use Alteryx tools to parse the output from pdfinfo.exe.  Hope that helps!

 


 

15 - Aurora

I don't think they would need to do a full-on install if you can include pdfinfo.exe with your app.  I tried creating a .yxzp file (attached) that a user should be able to unzip on their machine and run the workflow, since pdfinfo.exe is included in the zip. For this example it reads two PDF files also located in the workflow directory (just a couple Alteryx PDFs), but the actual files could be replaced with whatever you want... just need to obtain the path via whatever means you like.

 

The dirty work of calling pdfinfo.exe here is done in R rather than via a normal "Run Command" tool, but that's just because the config panel for "Run Command" was very constrictive and it doesn't take much for me personally to fall back to R.  Perhaps there are non-R approaches that others could help with too, but if this helps, great!

 

 

15 - Aurora

I pointed this at a directory with 180 PDFs in it, and it promptly choked: pdfinfo.exe only returns the fields it finds in each file, so the output differs from one PDF to the next, and the "rbind" I was using isn't flexible enough for that.  The attached workflow ran successfully and counted all 22,000+ pages in the 180 PDFs.  It also simplifies the R component and does more with Alteryx tools after running R.
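To illustrate the varying-fields problem, here is a rough Python sketch (the attached workflow uses R and Alteryx tools instead, and is not reproduced here). It parses each pdfinfo output into a dictionary and then builds rows over the union of all field names, leaving blanks where a file did not report a field. The function names and sample outputs are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_pdfinfo(output: str) -> Dict[str, str]:
    """Split each 'Key: value' line on the first colon only."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon at all
            fields[key.strip()] = value.strip()
    return fields


def to_rows(records: List[Dict[str, str]]) -> Tuple[List[str], List[List[str]]]:
    """Build a rectangular table from records whose keys may differ."""
    columns: List[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [[record.get(col, "") for col in columns] for record in records]
    return columns, rows


# Two hypothetical outputs that report different sets of fields.
first = "Title:  Report\nPages:  3\n"
second = "Pages:  10\nEncrypted: no\n"
columns, rows = to_rows([parse_pdfinfo(first), parse_pdfinfo(second)])
print(columns)
print(rows)
```

Splitting on the first colon only keeps values such as timestamps intact, since they contain colons of their own.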

 

Alteryx Partner

Thank you, John. I was just working through your last example and was going to test it against my directory of PDFs. Thank you for reposting.

Alteryx Partner

I did get an error within R: Error in system (syscmd,intern = True):

All I did was point the workflow to my directory of PDFs. Was there any other configuration to do? Perhaps in the Formula tool?


15 - Aurora

Yes, in this version the first Formula tool contains the path to pdfinfo.exe, which needs to be replaced with a valid path on your system. That path ends up in 'syscmd' in the R tool, hence the error.
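A bad executable path tends to surface as an unhelpful error from the command call, so it can be worth checking the path before running anything. A minimal Python sketch of that check, assuming the path arrives as plain text (the function name is hypothetical, and this is not part of the attached workflow):

```python
from pathlib import Path


def resolve_pdfinfo(path_text: str) -> Path:
    """Return the path if it points at an existing file, else raise a clear error."""
    exe = Path(path_text)
    if not exe.is_file():
        raise FileNotFoundError(f"pdfinfo executable not found at: {exe}")
    return exe
```

Failing early with the offending path in the message makes it obvious that the configured location is the thing to fix.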

15 - Aurora

I can confirm that invoking pdfinfo.exe from a shared network location caused no issues for me.  I turned it into an analytic app (attached) that runs pdfinfo.exe from a network location and allows you to choose your directory and include/exclude subdirectories.  It reports the PDF count and page count in a message.

 

Caveats:

  • People will need access to the location where you place pdfinfo.exe (and you'll need to update the Formula tool with that location).
  • The progress indicator is pretty useless: the app spends a lot of time chugging through the PDFs, and not much progress is displayed despite my attempts to report it from R.
  • A user running the app will need to click "Show Output Log" and scroll down in order to see the message about PDF and page counts.

Otherwise, it's certainly easy to execute.
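For reference, the directory selection and the summary message could be sketched in Python roughly as follows. This is an illustration of the same idea, not the contents of the attached app; the function names are hypothetical, and the page counter is passed in as a parameter so any pdfinfo wrapper can be plugged in.

```python
from pathlib import Path
from typing import Callable, List, Optional


def find_pdfs(directory: str, include_subdirs: bool = True) -> List[Path]:
    """List PDF files in a directory, optionally descending into subdirectories."""
    root = Path(directory)
    candidates = root.rglob("*") if include_subdirs else root.iterdir()
    # Compare the extension case-insensitively so .PDF files are not missed.
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")


def summarize(pdf_paths: List[Path], count_pages: Callable[[Path], Optional[int]]) -> str:
    """Build a one-line message with the number of PDFs and the total page count."""
    counts = [count_pages(path) for path in pdf_paths]
    total = sum(c for c in counts if c is not None)  # ignore files with no count
    return f"{len(pdf_paths)} PDFs, {total} pages"
```

A call such as summarize(find_pdfs(some_directory, include_subdirs=False), my_counter) would then produce the message for a single folder, where my_counter stands for whatever function returns the page count of one file.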
