Alteryx Designer Desktop Discussions

Natalia_vf · 03-25-2021

I'm using the Alteryx Intelligence Suite. I have 77 pages of cancelled check images and I'm trying to extract information from them. We need Check Payee, Check Amount, Check Date, and Check Number. The way I got it working was to capture each check separately, and each field within that check separately. The problems I foresee are:

We have 77 pages of this. I couldn't get it to work so that a template built on the first page is applied to all the other pages. As a result, I would need to make 77 different PDFs, which can't be the most efficient way. Should I use a macro?
The OCR results are very inaccurate anyway, and I would need to fix most of them manually:

No Check Payee is captured for some checks.
Check Dates are way off for almost all of them.
Check Amounts and Check Numbers are either blank or complete gibberish.

I appreciate any help!

BenMoss · 03-26-2021

Hi @Natalia_vf, I don't have any experience with these tools, but I do with PDF extraction in Alteryx. In the past I have used this macro created by @OllieClarke to batch-read PDFs.

The output is the raw text, so it then becomes a matter of building parsing logic that extracts the information you want (the RegEx tool is usually your friend here).
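As a rough illustration of that parsing step, here is a minimal Python sketch (not Ben's workflow). It assumes the raw text for one check looks like the hypothetical SAMPLE string below; the field labels and patterns are illustrative assumptions, and real OCR output will need patterns tuned to its own layout and noise. Similar patterns should carry over to the RegEx tool's Parse mode, where each marked group becomes an output column.

```python
import re

# Hypothetical raw text for one check, as a PDF/OCR step might return it.
# Real check layouts (and OCR noise) will differ; adjust the patterns to match.
SAMPLE = """CHECK NO. 10452
DATE 03/15/2021
PAY TO THE ORDER OF Acme Supply Co.   $1,250.00
"""

# One pattern per field; each captures the value in group 1.
PATTERNS = {
    "check_number": r"CHECK\s+NO\.?\s*(\d+)",
    "check_date": r"DATE\s+(\d{1,2}/\d{1,2}/\d{2,4})",
    "payee": r"PAY\s+TO\s+THE\s+ORDER\s+OF\s+(.+?)\s+\$",
    "amount": r"\$\s*([\d,]+\.\d{2})",
}


def parse_check(text: str) -> dict:
    """Return a dict of field -> captured string, or None when a field is not found."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[field] = match.group(1).strip() if match else None
    return result


print(parse_check(SAMPLE))
```

Returning None for a missing field, rather than raising, makes it easy to flag the checks that need manual review.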

I don't expect you'll be able to share an example, but please take a look at this tool and see if it helps or makes your life easier!

https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b

Ben

cgoodman · 03-26-2021

@Natalia_vf

I have posted in the Ideas forum asking for this to be added as a feature, so it would be worth adding comments to that post to improve the chances that this becomes a native feature.

In the meantime, the workaround I have found is to add a Record ID tool so you still know which document each record came from, then set the page number to 1 using a Formula tool. This tricks all the inbound pages into looking like page 1, which is the page the template is set up on.
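To show what that workaround does to the data, here is a small Python model of it. This is only a sketch of the row transformation, not Alteryx code: rows are represented as dicts, the 'path' and 'page' field names are taken from the later reply in this thread, and the 'RecordID' column name and start value of 1 are assumed to match the Record ID tool's defaults.

```python
def normalize_pages(rows):
    """Tag each row with a record ID, then force its page to 1 so it matches a page-1 template."""
    normalized = []
    for record_id, row in enumerate(rows, start=1):  # like the Record ID tool, counting from 1
        new_row = dict(row)            # copy so the input rows are left unchanged
        new_row["RecordID"] = record_id
        new_row["page"] = 1            # like a Formula tool setting [page] to 1
        normalized.append(new_row)
    return normalized


# Three pages of one hypothetical file.
rows = [{"path": "checks.pdf", "page": p} for p in (1, 2, 3)]
for row in normalize_pages(rows):
    print(row)
```

The RecordID is what lets you tell the pages apart again after the page numbers have all been overwritten.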

Chris
Check out my collaboration with fellow ACE Joshua Burkhow at AlterTricks.com

Paul-Evans · 08-28-2021

Somewhere between versions 2020.2 and 2021.2, this workaround stopped working on its own.

In addition to changing every 'page' value to '1', you now need to modify the 'path' field so that every value is unique.

trettelap · 10-28-2021

This seems to work, but I am a little stumped as to why, because the path doesn't seem to be referenced anywhere. Any insight into what that formula does?

Paul-Evans · 10-30-2021

Under expected usage, the tool can have only one value per annotation name per file (e.g., you can't reuse the same annotation name, even on a different page of the template). My assumption is that the extraction results are saved back to the original table using filename and page as key fields, rather than being processed line by line. It seems that, when filename and page combinations are duplicated, only the last one is retained before being joined back to the original table.
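To make that hypothesis concrete, here is a small Python model of it. It is an assumption about the tool's internals, not documented behavior: results are stored in a dict keyed by (path, page), so duplicate keys collapse to the last value, and appending each row's RecordID to its path (a hypothetical suffix format, not the formula used in the thread) restores one result per row.

```python
def join_back(rows, extracted_values):
    """Model of the suspected behavior: results are stored per (path, page) key
    and then looked up again for every input row, so duplicate keys collapse."""
    results_by_key = {}
    for row, value in zip(rows, extracted_values):
        # A later row with the same (path, page) silently overwrites an earlier one.
        results_by_key[(row["path"], row["page"])] = value
    return [results_by_key[(row["path"], row["page"])] for row in rows]


def make_paths_unique(rows):
    """Append each row's RecordID to its path (hypothetical format) so keys no longer collide."""
    return [{**row, "path": f'{row["path"]}#{row["RecordID"]}'} for row in rows]


# Three pages of one file, all already forced to page 1 by the workaround.
rows = [{"path": "checks.pdf", "page": 1, "RecordID": i} for i in (1, 2, 3)]
values = ["check A", "check B", "check C"]

print(join_back(rows, values))                     # every row gets the last value
print(join_back(make_paths_unique(rows), values))  # each row keeps its own value
```

Under this model, the path is never used to find the file; it only needs to be unique so that each page gets its own key.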
