Hi there! I am new to Alteryx and the Intelligence Suite. I am working on bringing in PDF invoices and parsing them into a useful format. I used the Image Input, Filter, and Image to Text tools to bring in page 1 of the invoice, and then I used the Text to Columns tool to split the text up for parsing. See the workflow I used below.
This is what my data looks like. I need to get the date, unit, and amount due into separate columns. See below highlighted in yellow.
The invoice is formatted as shown below. The addresses and dates are at the top. The Date, Shift Worked, Temp, Dept., Desc., Rate, Units, and Amount Due are all in one row on the invoice. I am not sure why these fields ended up in separate rows when I used Text to Columns to split at every new line.
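If the line-item fields do come out one per row, one way to stitch them back together is to start a new record at every date and then read Units and Amount Due from the end of that record. This is a minimal plain-Python sketch that rests on assumptions about the OCR output which may not match the real invoices: each line item begins with a line containing only a date in MM/DD/YYYY form, the header block (addresses, invoice date) has already been filtered out, and Rate, Units, and Amount Due are the last three numeric values in each line item. The function `parse_line_items` and the sample values are hypothetical.

```python
import re

# Assumed formats (hypothetical; adjust to match the real OCR output).
DATE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")  # e.g. 01/03/2022
NUMBER_RE = re.compile(r"^\$?\d[\d,]*(?:\.\d+)?$")  # e.g. 8.00, $1,200.00


def to_number(token):
    """Drop a leading $ and thousands separators, then convert to float."""
    return float(token.lstrip("$").replace(",", ""))


def parse_line_items(lines):
    """Group OCR lines into invoice rows and return date, units, amount due.

    Assumes the header block has already been removed, that each line item
    starts with a line containing only a date, and that Rate, Units, and
    Amount Due are the last three numeric values in each line item.
    """
    records, current = [], None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank OCR lines
        if DATE_RE.match(text):
            if current is not None:
                records.append(current)
            current = [text]  # a date starts a new line item
        elif current is not None:
            current.append(text)  # other fields belong to the open line item
    if current is not None:
        records.append(current)

    results = []
    for record in records:
        numbers = [to_number(t) for t in record[1:] if NUMBER_RE.match(t)]
        if len(numbers) < 3:
            continue  # too few numeric fields to be a complete line item
        _rate, units, amount_due = numbers[-3:]
        results.append({"date": record[0], "units": units, "amount_due": amount_due})
    return results


if __name__ == "__main__":
    # Hypothetical OCR output: one field per row, as after splitting at new lines.
    sample = [
        "01/03/2022", "Day", "Jane Doe", "4500", "Packing", "$25.00", "8.00", "$200.00",
        "01/04/2022", "Night", "John Roe", "4500", "Shipping", "$27.50", "10.00", "$275.00",
    ]
    for row in parse_line_items(sample):
        print(row)
```

Taking the last three numbers rather than fixed positions is a deliberate choice: a numeric Dept. code or a missing field then shifts nothing at the end of the record. The same grouping idea could likely be rebuilt without code in Designer (for example, a RegEx tool to flag date rows and a Multi-Row Formula tool to carry a record ID down), but that route is untested here.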
I will eventually be pulling in multiple PDF invoices at the same time, but I wanted to start building this process with just one invoice. The invoices are mostly in the same format, but not consistently enough to use the Image Template tool. I tried using it without any luck: it worked for the first page that I annotated, but when I tried to input multiple pages it would cut off part of the date or rate. Any help will be greatly appreciated. Thank you so much!!!
I believe you will get better results if you create some masks first, using one invoice as a template (Image Template tool). Another approach that works, if your file is an actual PDF and not a scanned image, is the PDF Input tool:
https://community.alteryx.com/t5/Public-Community-Gallery/PDF-Input/ta-p/887038