
Alteryx Designer Desktop Discussions


Extract Text Using Image Template

hellyars
13 - Pulsar

A computer vision question.

 

Background

I have a 56-page PDF document. It is a text document, not a table. Each paragraph of text is assigned an outline number. The outline number is important: I need to associate each outline number with its corresponding text.

I have attached 2 sample pages. The sample pages do not contain the header or footer.
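
For reference, once the text is extracted, the pairing of outline numbers with paragraph text described above could be sketched roughly as follows. This is only an illustration in Python, outside of Designer. The dotted-number format (such as 1.1 or 2.3.4), the sample lines, and the function name pair_outline_with_text are all assumptions, since the sample pages are not reproduced here.

```python
import re

# Assumption: outline numbers look like "1", "1.1" or "2.3.4" at the start of a line.
# The real document may use a different scheme; adjust the pattern accordingly.
OUTLINE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.*)$")


def pair_outline_with_text(lines):
    """Return (outline_number, paragraph_text) tuples from extracted text lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between paragraphs
        match = OUTLINE.match(line)
        if match:
            # A new outline number starts a new paragraph record.
            records.append([match.group(1), match.group(2)])
        elif records:
            # No outline number: treat the line as a continuation of the previous paragraph.
            records[-1][1] += " " + line
    return [tuple(record) for record in records]


# Hypothetical sample lines, not taken from the actual PDF.
sample = [
    "1.1 The contractor shall",
    "provide monthly reports.",
    "",
    "1.2 Reports are due on the first.",
]
pairs = pair_outline_with_text(sample)
```

One caveat with this sketch: a continuation line that happens to begin with a number would be mistaken for a new outline entry, so the pattern would likely need tightening against the real pages.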

 

Setup

I am using an Image Template tool pointing at page 1. I select the entire text area of the sample text (the main body text between the header and footer of the real document). Note: I have found that using the Image Template tool is the only way to consistently and accurately capture the paragraph outline number.

 

Problem:

The output for page 1 contains all the required text. The output for page 2 and all subsequent pages is null.

 

 

 

InputDoc_workflow.png

 

fmvizcaino
17 - Castor

Hey @hellyars ,

 

You also need to define the template area in the Image Template tool for every page. The page number at the top of the tool lets you move between pages.

Maybe there is a solution to do that dynamically, but I wasn't able to find one yet.

 

 

Best,

Fernando Vizcaino

hellyars
13 - Pulsar

@fmvizcaino 

 

I took your comment to mean that, since my template had 2 pages, it may have been throwing things off. I reduced my template to 1 page and still can't get pages 2 through N to output.

 

  • I created a copy of my 2 page sample, deleted 1 page, and used the remaining 1 page as my image template
  • Highlighted the entire text area of the template page and called the field Text
  • That did nothing, the output of page 2 is still null

I hope you are not saying that I need to have a 1-to-1 page-to-template ratio. That would defeat the purpose of a template, especially when all I want to do is take all the text between the header and footer of a page across two pages (or 56 in the original).

fmvizcaino
17 - Castor

Hey @hellyars ,

 

"I hope you are not saying that I need to have a 1-to-1 page-to-template ratio"

Unfortunately, that is what I'm saying, yes. 

 

I have a use case here with a 4-page PDF and different zones on each page, so in that case I need to create a template for each page.

It would be great to have an additional option to do exactly what you want, and I think you can post that as an idea here in the community.

 

I'm thinking here and maybe I have a suggestion for you. Will test it and get back to you in a few.

 

Best,

Fernando Vizcaino

gabrielvilella
14 - Magnetar

@hellyars to apply the same one-page template to every page of a multi-page file, you need to create a batch macro. This is something Alteryx should improve.
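
To illustrate the idea behind the batch macro (one template, run once per page), here is a minimal sketch in Python rather than in Designer itself. It does not use any Alteryx API. The Region and Word types, the fractional coordinates, and the sample words are all hypothetical stand-ins for the template box and the OCR output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical template box in fractional page coordinates (0.0 to 1.0)."""
    left: float
    top: float
    right: float
    bottom: float


@dataclass(frozen=True)
class Word:
    """Hypothetical OCR word with the position of its anchor point on the page."""
    text: str
    x: float
    y: float


def extract_region(words, region):
    """Join the words that fall inside the region, in reading order."""
    inside = [
        w for w in words
        if region.left <= w.x <= region.right and region.top <= w.y <= region.bottom
    ]
    inside.sort(key=lambda w: (w.y, w.x))  # top to bottom, then left to right
    return " ".join(w.text for w in inside)


def apply_template_to_all_pages(pages, region):
    """Run the same one-page template against every page, like one batch per page."""
    return [
        {"page": number, "Text": extract_region(words, region)}
        for number, words in enumerate(pages, start=1)
    ]


# Body area between header and footer (assumed values, defined once).
body = Region(left=0.05, top=0.10, right=0.95, bottom=0.90)

# Made-up pages: the header and footer words fall outside the body region.
pages = [
    [Word("HEADER", 0.5, 0.03), Word("1.1", 0.1, 0.2), Word("Scope", 0.2, 0.2)],
    [Word("1.2", 0.1, 0.2), Word("Purpose", 0.2, 0.2), Word("FOOTER", 0.5, 0.97)],
]
rows = apply_template_to_all_pages(pages, body)
```

The point of the sketch is that the region is defined once and the loop supplies one page at a time, which is the role the batch macro's control parameter would play.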

hellyars
13 - Pulsar

@gabrielvilella A batch macro makes sense. I wrongly assumed it was a built-in capability for this type of use case -- define the template, then rinse and repeat.

hellyars
13 - Pulsar

I noticed an interesting behavior when trying to set up a batch macro.

It will not recognize the table if the input file is a PDF.  I had to convert the PDF pages to PNG to get the batch macro to recognize it as a table.
