
Alteryx Designer Desktop Discussions


PDF Image Input

sparsley
5 - Atom

Hi there! I am new to Alteryx and the Intelligence Suite. I am working on bringing in PDF invoices and parsing them into a useful format. I used the Image Input, Filter, and Image to Text tools to bring in page 1 of the invoice, and then I used the Text to Columns tool to get the text into a column for parsing. See the workflow I used below.

 

sparsley_0-1645474840722.png

This is what my data looks like. I need to get the date, unit, and amount due into separate columns. These values are highlighted in yellow below.

sparsley_2-1645474992113.png

sparsley_3-1645475015759.png

sparsley_4-1645475048984.png

sparsley_5-1645475075938.png

The invoice is formatted as shown below. The addresses and dates are at the top. The Date, Shift Worked, Temp, Dept., Desc., Rate, Units, and Amount Due are all in one row. I am not sure why each of these ended up in a separate row when I used Text to Columns to split at every new line.

sparsley_6-1645475885743.png

 

I will eventually be pulling in multiple PDF invoices at the same time, but I wanted to start building this process with just one invoice. The invoices are mostly in the same format, but not consistently enough to use the Image Template tool. I tried using it without any luck: it worked for the first page that I annotated, but when I tried to input multiple pages it would cut off part of the date or rate. Any help will be greatly appreciated. Thank you so much!

1 REPLY
gabrielvilella
14 - Magnetar

I believe you will get better results if you first create some masks, using one invoice as a template (Image Template tool). Another approach that works, if your file is an actual PDF and not an image, is the PDF Input tool.

https://community.alteryx.com/t5/Public-Community-Gallery/PDF-Input/ta-p/887038 
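
If the OCR output still arrives as one value per row, as described in the question, the rows could also be regrouped in code, for example with logic along these lines adapted for the Python tool. This is only a rough sketch under stated assumptions: the column order is taken from the question (Date, Shift Worked, Temp, Dept., Desc., Rate, Units, Amount Due), every field is assumed to land on its own line with none missing, dates are assumed to look like MM/DD/YYYY, and the sample values are made up for illustration. The names `parse_rows`, `to_number`, and `SAMPLE_LINES` are hypothetical.

```python
import re

# Column order as described in the question; every field is assumed
# to arrive on its own line after the OCR step.
COLUMNS = ["Date", "Shift Worked", "Temp", "Dept.", "Desc.",
           "Rate", "Units", "Amount Due"]

# Assumed date shape (e.g. 01/15/2022); adjust to the real invoices.
DATE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$")

# Made-up sample in the one-value-per-line shape described above.
SAMPLE_LINES = [
    "Invoice Date", "02/01/2022",
    "Date", "Shift Worked", "Temp", "Dept.", "Desc.",
    "Rate", "Units", "Amount Due",
    "01/15/2022", "Day", "Temp A", "Warehouse", "Picker",
    "25.00", "8.00", "$200.00",
    "01/16/2022", "Night", "Temp A", "Warehouse", "Picker",
    "25.00", "7.50", "$187.50",
    "Total", "$387.50",
]


def to_number(token):
    """Return the token as a float, or None if it is not numeric."""
    cleaned = token.replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_rows(lines):
    """Regroup one-value-per-line OCR text into invoice rows."""
    tokens = [ln.strip() for ln in lines if ln.strip()]
    width = len(COLUMNS)
    rows = []
    i = 0
    while i < len(tokens):
        chunk = tokens[i:i + width]
        # A row starts at a date and must end in three numeric fields
        # (rate, units, amount due); header dates fail this check.
        if len(chunk) == width and DATE_RE.match(chunk[0]):
            rate, units, amount = (to_number(t) for t in chunk[-3:])
            if None not in (rate, units, amount):
                rows.append({"date": chunk[0],
                             "units": units,
                             "amount_due": amount})
                i += width
                continue
        i += 1
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE_LINES):
        print(row)
```

The check on the last three fields is what keeps a date in the invoice header from being mistaken for a line item. If the OCR drops or splits a field, the fixed-width grouping will misalign, so real invoices would likely need extra validation.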
