Alteryx Designer Desktop Discussions

SOLVED

Parse protected PDF files?

ulrich_schumann
8 - Asteroid

Hello,

Within the Alteryx Knowledge section I read the article 'Can Alteryx Parse A Word Doc Or PDF?', which was very helpful for getting started. Now I am facing a more challenging problem:

I have hundreds of contractually relevant documents, such as Acceptance Sheets and Change Requests, from which I regularly need to extract commercial data. These documents are stored on our SharePoint in PDF format. Unfortunately, the PDF files are protected. What currently works manually in my test environment is:

- print the protected PDF to a PDF printer to create a non-protected version (in some cases the file first has to be unlocked with another tool)

- save the non-protected PDF in plain text format

- run the Alteryx workflow to collect the relevant data
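The last step mostly comes down to pattern matching on the plain-text export (in Alteryx, typically a RegEx tool in Parse mode). A minimal Python sketch of that step, assuming the commercial fields appear as "Label: value" lines; the labels `Change Request No.` and `Total Amount`, the function name `extract_fields`, and the sample text are hypothetical placeholders, not taken from the actual documents.

```python
import re

# Hypothetical field labels: replace them with the labels used in your documents.
FIELD_PATTERNS = {
    "change_request": re.compile(r"^Change Request No\.?:\s*(\S+)", re.MULTILINE),
    "total_amount": re.compile(r"^Total Amount:\s*([\d.,]+)", re.MULTILINE),
}


def extract_fields(text):
    """Return the first captured value for each field, or None if the label is missing."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Invented sample of a plain-text export, for illustration only.
sample = """Acceptance Sheet
Change Request No.: CR-0042
Total Amount: 12,500.00
"""

print(extract_fields(sample))
# → {'change_request': 'CR-0042', 'total_amount': '12,500.00'}
```

Returning None for a missing label (instead of raising) lets a batch run flag incomplete documents for manual review rather than stopping on the first one.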

My questions are:

- Does anyone see a way to automate the entire workflow with Alteryx?

- The DOCTOTEXT tool mentioned in the Knowledge section did not work for me at all. Does anyone know of other command-line tools I could use to automate the whole process?

- Are there any other alternatives? Manual transformation is not an option for us, as this is an ongoing requirement.

Any input is highly appreciated.

MarqueeCrew
20 - Arcturus

Would the results of a Google search for "crack protected pdf" help you?

Alteryx ACE & Top Community Contributor
WayneWooldridge
Alteryx Alumni (Retired)

Have you tried using the SharePoint List Input tool to pull your .pdf files? Since it requires a username and password, would those credentials be enough to retrieve non-protected versions of the protected .pdf files?

ulrich_schumann
8 - Asteroid

Unfortunately not; I can already access the PDFs manually. What I am looking for is a way to automate the process.

ulrich_schumann
8 - Asteroid

The SharePoint List Input tool does not really help. We are working on a solution to replace the protected PDFs in the future, but as these are contractual documents, we cannot change them right away. I was hoping that processing PDFs is common enough among other users that some best practices already exist.

WayneWooldridge
Alteryx Alumni (Retired)

Currently Alteryx doesn't have a way to automate the opening of secured .pdf files. You may want to suggest this as a product enhancement on the Ideas board (https://community.alteryx.com/t5/Ideas/ct-p/ideas). There may also be external resources that can help you do this (such as: https://community.alteryx.com/t5/Ideas/ct-p/ideas).

patrick_mcauliffe
14 - Magnetar

If you have (or can get) Adobe Acrobat Pro, that might be a better starting point.  You can automate operations like the conversion within that application.

You could also combine a large batch of them into a single file, then convert that file to a spreadsheet or text file and process it with Alteryx from there.

Troy
8 - Asteroid

I have been using WordCleaner (https://wordcleaner.com/) to convert Word documents and PDFs into HTML. It has a text output option as well. In my scenario HTML worked better, because I needed the <table>, <tr> and <td> tags, which make the files much easier to process. WordCleaner also removes all extra styles from the tags, which is very helpful. It is the best tool I have found for this. Note that it requires a purchase: $99, or $199 for the command-line and other options.
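Why the table tags help can be illustrated with a minimal Python sketch that uses only the standard library's `html.parser`. It assumes the converted HTML contains ordinary `<table>`/`<tr>`/`<td>` markup; the class name `TableRowParser` and the sample table are hypothetical and are not actual WordCleaner output.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # tolerate a missing </td>
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse line breaks and repeated spaces inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None


# Invented sample table, for illustration only.
html = """<table>
<tr><th>Item</th><th>Amount</th></tr>
<tr><td>Change Request CR-0042</td><td>12,500.00</td></tr>
</table>"""

parser = TableRowParser()
parser.feed(html)
parser.close()
print(parser.rows)
# → [['Item', 'Amount'], ['Change Request CR-0042', '12,500.00']]
```

Each inner list maps directly onto one record with one field per cell, which is much easier to load into Alteryx (for example via a CSV file) than free-flowing text where column boundaries have been lost.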

siddhartha_s
7 - Meteor

Here's one possible solution: build a batch macro that calls a batch file containing a command such as:

qpdf.exe --password="securepassword" --decrypt input.pdf output1.pdf

I'm using the free qpdf.exe here. Before calling the batch file, the Alteryx workflow should rewrite it with the file name passed in from a Directory tool configured to list all .PDF files.
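The same idea can also be sketched in Python, assuming qpdf is installed and on the PATH. Instead of rewriting a batch file for every name coming from the Directory tool, it builds one qpdf command per PDF and runs it with `subprocess`. The function names `build_qpdf_command`, `plan_decryption` and `decrypt_folder` and the folder names are hypothetical; the only qpdf options used are `--password=` and `--decrypt`.

```python
import subprocess
from pathlib import Path


def build_qpdf_command(src, dst, password=None, qpdf="qpdf"):
    """Command line for: qpdf [--password=...] --decrypt src dst"""
    cmd = [qpdf]
    if password:
        cmd.append(f"--password={password}")
    cmd += ["--decrypt", str(src), str(dst)]
    return cmd


def plan_decryption(pdf_paths, out_dir, password=None, qpdf="qpdf"):
    """Pair each input PDF with an output path in out_dir (no side effects)."""
    out_dir = Path(out_dir)
    return [build_qpdf_command(p, out_dir / Path(p).name, password, qpdf)
            for p in pdf_paths]


def decrypt_folder(in_dir, out_dir, password=None, qpdf="qpdf"):
    """Decrypt every .pdf in in_dir (case-insensitive suffix) into out_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(p for p in in_dir.iterdir() if p.suffix.lower() == ".pdf")
    for cmd in plan_decryption(pdfs, out_dir, password, qpdf):
        result = subprocess.run(cmd)
        # qpdf exit status: 0 = success, 3 = success with warnings, 2 = errors
        if result.returncode not in (0, 3):
            raise RuntimeError(f"qpdf failed on {cmd[-2]} (exit code {result.returncode})")


if __name__ == "__main__":
    # Hypothetical paths: only prints the planned command, does not call qpdf.
    for cmd in plan_decryption(["contracts/cr_0042.pdf"], "decrypted", "securepassword"):
        print(cmd)
```

Keeping the command planning separate from execution makes it easy to check the generated commands before running them against hundreds of contracts. Note that a password passed as a command-line argument can be visible to other users in process listings on a shared machine.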