Alteryx Designer Desktop Ideas

Image to text - Option to apply template to all pages

When using the text mining tools, I have found that a template is only applied to document pages that have the same page number as the template page.

 

In my use case I have a PDF file with 100+ claim statements that are all laid out the same way (one page per statement). To set up the template I annotated one page and connected the result to the T anchor of the Image to Text tool. The D anchor of this tool receives my PDF document with its 100+ pages. However, when I examine the output I only get results for page 1.

 

On examining the JSON for the template, I can see that it contains a reference to the template page number:

cgoodman3_0-1604393391514.png

 

Using a Generate Rows tool and a Formula tool to replace the page number in the JSON with pages 1 to 100 doesn't work. I then discovered that if I change the page number on the image input side instead, I get the desired results.

 

cgoodman3_1-1604393499357.png
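
Outside Designer, the change that worked (overriding the page number on the image input side so that every page matches the template page) can be sketched in Python. This is only an illustration of the logic, not how the Image to Text tool is implemented: the records are plain dictionaries, and the `path`, `page` and `original_page` keys are assumed names chosen for the example.

```python
from typing import Dict, List


def align_pages_to_template(records: List[Dict], template_page: int = 1) -> List[Dict]:
    """Return copies of the image records with 'page' set to the template's page.

    'original_page' is a hypothetical helper field that keeps the real page
    number so the results can still be ordered afterwards.
    """
    aligned = []
    for record in records:
        updated = dict(record)  # copy, so the input records stay untouched
        updated["original_page"] = record["page"]
        updated["page"] = template_page
        aligned.append(updated)
    return aligned


# Three pages of one PDF, as they might arrive from the image input side.
pages = [{"path": "claims.pdf", "page": n} for n in range(1, 4)]
aligned = align_pages_to_template(pages)
```

In Designer itself the same effect would come from a formula that overwrites the page field before the D anchor, while keeping a copy of the real page number in a separate field.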

As an improvement, since I suspect this is a common use case for the Image to Text tool, I suggest adding an option to the tool's configuration that applies the same template to all pages.

 

cgoodman3_4-1604393738275.png

13 Comments
Paul-Evans
9 - Comet

@cgoodman3 & @bensilv - I figured out the update to the workaround.

In addition to changing every 'page' value to '1', you will need to modify the 'path' field so that each value is unique.

 

PaulEvans_0-1630150480036.png

The Image Template tool requires all annotations to be unique. If you set up a multipage template manually, you can't simply repeat an annotation on the second page; you have to give it a different name. That is why editing the template JSON to duplicate an annotation per page (adjusting only the page number) causes the Image to Text tool to error out.

As for why the result repeats when the 'page' field is set to 1: it seems that although each image is processed individually, only the last value is retained and joined back by page, file, and the annotation's field name (which makes sense, since annotations must be unique per file). Because the actual processing takes place on the blob 'image' field, modifying the 'path' field doesn't affect the extraction, but it does get around the expectation of one annotation value per file.
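
The behaviour described above can be modelled with a small Python sketch. It is a toy model of the suspected join, not the tool's actual code: results are stored in a dictionary keyed by file, page and annotation field name, so records that share a key overwrite one another and only the last value survives. The `text` key, the `claim_total` field name and the `_1`, `_2` suffix scheme are all assumptions made for illustration.

```python
import os


def make_paths_unique(records):
    """Return copies of the records with a running number added to each 'path'."""
    unique = []
    for index, record in enumerate(records, start=1):
        stem, ext = os.path.splitext(record["path"])
        updated = dict(record)
        updated["path"] = f"{stem}_{index}{ext}"  # e.g. claims_2.pdf
        unique.append(updated)
    return unique


def collect_by_key(records, field_name="claim_total"):
    """Toy join: one value per (path, page, field name); later records win."""
    results = {}
    for record in records:
        key = (record["path"], record["page"], field_name)
        results[key] = record["text"]
    return results


# Three pages of the same file, all with 'page' already set to 1.
records = [
    {"path": "claims.pdf", "page": 1, "text": "A"},
    {"path": "claims.pdf", "page": 1, "text": "B"},
    {"path": "claims.pdf", "page": 1, "text": "C"},
]

collapsed = collect_by_key(records)                 # identical keys collide
kept = collect_by_key(make_paths_unique(records))   # unique paths, no collision
```

With identical paths the three records share one key, so `collapsed` holds a single entry; once the paths are made unique, `kept` holds one entry per page. The extraction itself is unaffected because, as noted, it works from the blob 'image' field and not from 'path'.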

CristonS
Alteryx Alumni (Retired)
Status changed to: Under Review

We are investigating the level of effort necessary to make this a reality.

veeliang
Alteryx Alumni (Retired)

Hi folks,

 

Our team worked on this Idea and it's now available for review in Alteryx Public Preview 23.1! You can sign up for Public Preview here: https://community.alteryx.com/t5/Alter-Nation/You-Are-Invited-to-Alteryx-Public-Preview-23-1/ba-p/10...

 

If you already signed up, head to the Downloads center in the Public Preview pilot project to download and set up the Intelligence Suite installer.

 

Please let us know if you have any feedback.

 

Thank you.