A computer vision question.
Background
I have a 56-page PDF document. It is a text document, not a table. Each paragraph of text is assigned an outline number, and that number is important: I need to associate each outline number with its corresponding text.
I have attached 2 sample pages. The sample pages do not contain the header or footer.
Setup
I am using an Image Template tool pointing at page 1. I select the entire text area of the sample text (the main body text between the header and footer of the real text). Note: I have found that using the Image Template tool is the only way to consistently and accurately capture the paragraph outline number.
Problem:
The output for page 1 contains all the required text. The output for page 2 and every subsequent page is null.
Hey @hellyars ,
You need to define the template area in the Image Template tool for every page as well. The page number at the top of the tool lets you move between pages.
Maybe there is a solution to do that dynamically, but I wasn't able to find one yet.
Best,
Fernando Vizcaino
I took your comment to mean that since my template had 2 pages it may have been throwing things off. I reduced my template to 1 page and still can't get pages 2 through N to output.
I hope you are not saying that I need to have a 1 to 1 page to template ratio. That would defeat the purpose of a template, especially when all I want to do is take all the text between the header and footer of a page across two pages (or 56 in the original).
Hey @hellyars ,
"I hope you are not saying that I need to have a 1 to 1 page to template ratio"
Unfortunately, that is what I'm saying, yes.
I have a use case with a 4-page PDF that has different zones on each page, so in that case I need to create a template for each page.
It would be great to have an additional option to do exactly what you want, and I think you can post that as an idea here in the community.
I've been thinking about this and may have a suggestion for you. I'll test it and get back to you shortly.
Best,
Fernando Vizcaino
@hellyars To apply the same 1-page template to every page of a multi-page file, you need to create a batch macro. This is something Alteryx should improve.
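For anyone who thinks better in code, here is a rough Python sketch of the idea behind the batch macro: one template region, applied once per page. It is not Alteryx code and does not use any Alteryx API. The names `Box`, `Word`, `apply_template`, and `run_per_page`, as well as the coordinates and sample words, are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical template region (the body area between header and footer)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """Hypothetical OCR result: a piece of text and its position on the page."""
    text: str
    x: float
    y: float


def apply_template(page_words, box):
    """Keep only the words that fall inside the template region."""
    return " ".join(w.text for w in page_words if box.contains(w.x, w.y))


def run_per_page(pages, box):
    """Apply the same template to every page, one run per page,
    which is roughly what the batch macro does for each input record."""
    return {
        page_no: apply_template(words, box)
        for page_no, words in enumerate(pages, start=1)
    }


# Made-up sample data: two pages sharing the same layout.
body = Box(left=0, top=100, right=600, bottom=700)
pages = [
    [Word("HEADER", 50, 20), Word("1.1", 50, 120),
     Word("First", 90, 120), Word("paragraph", 140, 120)],
    [Word("HEADER", 50, 20), Word("1.2", 50, 120),
     Word("Second", 90, 120), Word("FOOTER", 50, 750)],
]
result = run_per_page(pages, body)
```

The point is only that the template is defined once and the per-page loop lives outside it, which is the role the batch macro plays in the workflow.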
@gabrielvilella A batch macro makes sense. I wrongly assumed it was a built-in capability for this type of use case: define the template, then rinse and repeat.
I noticed an interesting behavior when trying to set up a batch macro.
It will not recognize the table if the input file is a PDF. I had to convert the PDF pages to PNG to get the batch macro to recognize it as a table.
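In case it helps anyone doing the same conversion outside Alteryx, below is a minimal Python sketch that shells out to the `pdftoppm` command-line tool from Poppler to write one PNG per page. This is not necessarily how I did it; it assumes `pdftoppm` is installed and on the PATH, and the function names, the 300 DPI default, and the file paths are my own placeholders.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, out_prefix, dpi=300):
    """Build the pdftoppm call: -png selects PNG output, -r sets the resolution in DPI."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def pdf_to_pngs(pdf_path, out_dir, dpi=300):
    """Render every page of the PDF to a PNG in out_dir (requires pdftoppm on PATH)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / Path(pdf_path).stem
    subprocess.run(build_pdftoppm_command(pdf_path, prefix, dpi), check=True)
    # pdftoppm appends "-<page number>" to the prefix; sort numerically by that suffix.
    return sorted(
        out_dir.glob(f"{prefix.name}-*.png"),
        key=lambda p: int(p.stem.rsplit("-", 1)[1]),
    )


# Example call (placeholder paths, not run here):
# page_images = pdf_to_pngs("document.pdf", "pages")
```

The resulting list of PNG paths could then be fed to the batch macro, one image per run.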