Extract Text Using Image Template

Question

A computer vision question.

Background

I have a PDF document. It is 56 pages long. It is not a table; it is a text document. Each paragraph of text is assigned an outline number, and the outline number is important: I need to associate each outline number with its corresponding text.
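Once the text has been extracted, pairing each outline number with its paragraph is a small parsing step. The sketch below is plain Python, not Alteryx, and rests on assumptions not stated in the thread: outline numbers are dotted decimals such as `1`, `1.2` or `3.1.4` (optionally with a trailing period) at the start of a line, and any other non-blank line continues the current paragraph. The function name `pair_outline_numbers` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: an outline number is a dotted-decimal token ("1", "1.2", "3.1.4",
# optionally followed by a period) at the very start of a line.
OUTLINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.*)$")


def pair_outline_numbers(lines: List[str]) -> List[Tuple[str, str]]:
    """Group extracted text lines into (outline_number, paragraph_text) pairs.

    A line that starts with an outline number opens a new paragraph; any other
    non-blank line is appended to the current paragraph. Lines that appear
    before the first outline number are ignored.
    """
    pairs: List[Tuple[str, List[str]]] = []
    for line in lines:
        match = OUTLINE_RE.match(line)
        if match:
            pairs.append((match.group(1), [match.group(2).strip()]))
        elif pairs and line.strip():
            pairs[-1][1].append(line.strip())
    # Join each paragraph's continuation lines with single spaces.
    return [(number, " ".join(parts)) for number, parts in pairs]


sample = [
    "1 Scope",
    "1.1 This section describes the work",
    "to be performed.",
    "1.2 Deliverables are listed below.",
]
for number, text in pair_outline_numbers(sample):
    print(number, "->", text)
```

One caveat of this design: a continuation line that happens to begin with a number (for example, a year) would be misread as a new outline number, so the pattern would need tightening against the real document's numbering style.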

I have attached 2 sample pages. The sample pages do not contain the header or footer.

Setup

I am using an Image Template tool pointed at page 1. I select the entire text area of the sample (the main body text between the header and footer of the real document). Note: I have found that the Image Template tool is the only way to consistently and accurately capture the paragraph outline numbers.

Problem

The output for page 1 contains all the required text. Page 2 and all subsequent pages return null.

SamplePages.pdf

hellyars · Answer

I noticed an interesting behavior when trying to set up a batch macro.

It will not recognize the table if the input file is a PDF. I had to convert the PDF pages to PNG images before the batch macro would recognize them as a table.

hellyars · Answer

@gabrielvilella A batch macro makes sense. I wrongly assumed this was a built-in capability for this type of use case: define the template once, then rinse and repeat.

gabrielvilella · Answer

@hellyars To apply the same one-page template to every page of a multi-page file, you need to create a batch macro. This is something Alteryx should improve.
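The control flow a batch macro provides here can be sketched outside Alteryx: run one fixed template once per page, then union the results with a page number attached. This is not Alteryx code and does not reproduce the Image Template tool; it is a plain-Python illustration under assumptions. Each page is modeled as a list of text lines, and the "template" is a hypothetical function `body_region` that keeps the lines between a fixed-size header and footer, standing in for the fixed selection box drawn on page 1.

```python
from typing import Callable, Dict, List, Union

Page = List[str]  # hypothetical model: one page as its lines of text
Row = Dict[str, Union[int, str]]


def body_region(page: Page, header_lines: int = 1, footer_lines: int = 1) -> Page:
    """Hypothetical one-page 'template': keep only the body between header and footer."""
    return page[header_lines:len(page) - footer_lines]


def apply_template_to_every_page(
    pages: List[Page], template: Callable[[Page], Page]
) -> List[Row]:
    """Run the same template once per page (the role a batch macro plays)
    and union the results, tagging each row with its 1-based page number."""
    rows: List[Row] = []
    for page_number, page in enumerate(pages, start=1):
        for line in template(page):
            rows.append({"page": page_number, "text": line})
    return rows


document = [
    ["HEADER", "1 Scope", "1.1 Purpose of the work.", "FOOTER p1"],
    ["HEADER", "1.2 Deliverables.", "2 Schedule", "FOOTER p2"],
]
rows = apply_template_to_every_page(document, body_region)
for row in rows:
    print(row)
```

Keeping the template as a separate function mirrors the batch-macro idea: the per-page logic is defined once, and only the iteration over pages is added around it.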

PDF template macro.yxzp