I am reading a folder containing multiple PDF (invoice) files and want to use a specific piece of information that will definitely appear on EACH invoice, but not necessarily in the EXACT same location (i.e. the same row) on the page. For example:
Total Charges $25,000.00
Is there a way for Alteryx to be "programmed" to "find" that specific "string", regardless of where it appears on the file/invoice?
Hey there! Great question re: PDFs. First, do you already have the Intelligence Suite add-on? How are you reading PDFs now?
Bluebird_Tim,
Thanks for getting in touch. To answer your question, yes, I do have the Intelligence Suite installed, and I do use it to read the PDF files. I've attached a screenshot of the section of the PDF (invoice) that I have set up as the "Image Template" in my Alteryx workflow.
Any help, assistance, or advice you can provide would be greatly appreciated.
Got it. I have a couple of thoughts.
Depending on how the PDF comes in, you could try reading the whole page as text and then parsing the value out of it.
The better technique, I think, would be to grab a larger part of the PDF and then parse out the Total Charges area. This depends on exactly how things move from PDF to PDF, but essentially you determine a "safe" area of capture where you know Total Charges will show up. I attached a screenshot of what I would try. There isn't a perfect way to track values if they move slightly across PDFs. This is about the best way to accommodate that when the information generally stays in the same area of the PDF.
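To illustrate the "parse it out" step, here is a minimal Python sketch using only the standard `re` module. It assumes the OCR step has already given you the text of the page (or of the "safe" capture area) as a plain string. The function name `find_total_charges` and the sample invoice text are made up for illustration, and the pattern assumes the label reads "Total Charges" followed by a dollar amount, as in the example above. A similar pattern could probably be adapted for a regex-based parse inside the workflow, but I haven't tested that here.

```python
import re
from decimal import Decimal

# Matches the label "Total Charges", an optional colon and "$", then an amount
# such as 25,000.00. \s tolerates the uneven spacing OCR output often contains.
TOTAL_RE = re.compile(
    r"Total\s+Charges\s*:?\s*\$?\s*(\d[\d,]*(?:\.\d{2})?)",
    re.IGNORECASE,
)

def find_total_charges(ocr_text):
    """Return the Total Charges amount as a Decimal, or None if it is not found."""
    match = TOTAL_RE.search(ocr_text)
    if match is None:
        return None
    # Drop the thousands separators before converting to a number.
    return Decimal(match.group(1).replace(",", ""))

# Hypothetical OCR text from two invoices with the total on different rows.
invoice_a = "Invoice 001\nSubtotal $24,000.00\nTotal Charges $25,000.00\n"
invoice_b = "Invoice 002\nTotal Charges   $1,250.50\nThank you\n"

print(find_total_charges(invoice_a))  # 25000.00
print(find_total_charges(invoice_b))  # 1250.50
```

Because the search keys on the label rather than on a row number, it does not matter which line the total lands on, as long as the label and the amount come through the OCR intact.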
Additionally, if you haven't already, this tool is helpful if there are any issues getting the correct characters back: https://help.alteryx.com/20231/designer/image-processing. It will make the image easier for the OCR to read.
You might have to experiment; this is where the "art" of things comes in.
Hope this helps!