Alteryx Designer Desktop Discussions


Extracting data from PDF

jasontax
6 - Meteoroid

I am trying to extract data (e.g. sales and tax amounts) from state tax form PDFs into a table. Each state has a different PDF form and format. Is there a way of doing this? It looks like an Image Template can be set up, but would a different template need to be set up for each state? Is there a way to streamline the extraction so I can extract data from all state PDF forms in a folder?

1 REPLY
Prometheus
12 - Quasar

If you have the Computer Vision palette, you could use the Image Input tool, which replaced the PDF Input tool. I had this same use case a couple of years ago, but I didn't have the Image Input/PDF Input tool, so I ended up converting the PDFs to HTML using Python and parsing from there. However, because there was no consistency of format across the 100+ files, I gave up on the parsed output: making it usable would have been more trouble than it was worth. Instead, I asked the file producer for .csv or .txt files, which made the use case much easier. If you can obtain some type of spreadsheet or text file, or pull the data yourself using an API call or ODBC connector, you're better off.
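For anyone who wants to try the "convert to HTML, then parse" route anyway, here is a rough sketch of the parsing half. It is not the code from the use case above. It assumes the PDFs have already been converted to .html files by an external tool (pdfminer.six's pdf2txt.py with HTML output is one possibility), and the field names and label patterns in FIELD_PATTERNS are hypothetical placeholders: each state's form would likely need its own patterns, which is exactly the inconsistency problem described above.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class TextCollector(HTMLParser):
    """Collects the non-empty text chunks of an HTML document, in order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Hypothetical labels; real state forms will word these differently.
FIELD_PATTERNS = {
    "gross_sales": re.compile(r"(?:gross|total)\s+sales", re.IGNORECASE),
    "tax_due": re.compile(r"tax\s+due", re.IGNORECASE),
}
# Matches amounts such as 1018.52 or 12,345.67 (currency symbol ignored).
AMOUNT = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_fields(html_text):
    """Return {field: amount} for every label found in one converted form."""
    parser = TextCollector()
    parser.feed(html_text)
    parser.close()
    chunks = parser.chunks

    row = {}
    for i, chunk in enumerate(chunks):
        for field, label in FIELD_PATTERNS.items():
            if field in row:
                continue  # keep the first match per field
            found = label.search(chunk)
            if not found:
                continue
            # Look after the label in the same chunk, then in the next chunk.
            candidates = [chunk[found.end():]] + chunks[i + 1:i + 2]
            for candidate in candidates:
                amount = AMOUNT.search(candidate)
                if amount:
                    row[field] = float(amount.group(0).replace(",", ""))
                    break
    return row


def extract_folder(folder):
    """Run extract_fields over every .html file in a folder."""
    rows = []
    for path in sorted(Path(folder).glob("*.html")):
        row = extract_fields(path.read_text(encoding="utf-8", errors="replace"))
        row["source_file"] = path.name
        rows.append(row)
    return rows
```

Calling extract_folder on the folder of converted files gives one dictionary per form, which can be written out as a CSV and read with a normal Input Data tool. Fields whose label is not found are simply missing from that form's dictionary, so gaps are easy to spot.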
