I have PDF reports that are 2-N pages in length.
I use the PDF to Text tool to import the text into Alteryx.
Unfortunately, the source PDF is typically a poor quality scan, but I can work with it.
The PDF to Text tool is configured to Read Text Content Only and output as Lines. I found that applying Text Recognition in Adobe produces better results than the Read Text and Image Content setting in the Alteryx tool.
Here is my problem.
I want to ignore the top 1/5 of each page.
I tried using an Image Template Tool connected to the T input anchor.
I created a template using only the first page of the document and, using the Image Template Tool, highlighted the lower 4/5 of the page and assigned it a value of Body_Text. That did not work: the workflow with the template attached to the PDF to Text tool only processes the first page of the document (not all 14).
A few other details...
While the structure of each report remains the same, the content varies considerably (hence I can only really call the lower 4/5 of each page 'body_text').
How
Hi @hellyars
You drew a box on the first page of the template and it returned text for the first page. I agree that there should be an option to return text for every page. I've attempted to build a macro that creates markup and applies it to every page listed in the "List of Pages" Text Input Tool. Please see attached and let me know if it works!
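To show the idea outside of Alteryx, here is a rough Python sketch of "take one region and repeat it for every page in a list". It is not the macro itself and it does not reproduce the Image Template file format. The `Region` class, the `body_regions` function, and the fractional coordinates (0.0 = top of page, 1.0 = bottom) are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical record: one named box on one page, in page fractions."""
    page: int
    name: str
    top: float
    bottom: float
    left: float = 0.0
    right: float = 1.0


def body_regions(pages, skip_top=0.2, name="Body_Text"):
    """Repeat the same 'lower part of the page' box for every listed page.

    skip_top is the fraction of the page to ignore at the top
    (0.2 means ignore the top 1/5 and keep the lower 4/5).
    """
    if not 0.0 <= skip_top < 1.0:
        raise ValueError("skip_top must be in [0, 1)")
    # int() mirrors keeping the page list as whole numbers.
    return [
        Region(page=int(p), name=name, top=skip_top, bottom=1.0)
        for p in pages
    ]


# Example: a 14-page document, pages numbered from 1.
regions = body_regions(range(1, 15))
```

Each entry in `regions` carries the same box with a different page number, which is the same job the "List of Pages" input is meant to do for the template markup.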
I updated the list of pages to match the number of pages in my test document, but the macro drop-down is blank.
I must be doing something wrong.
Image template is pointed to my PDF (14 pages).
I used the annotate function to create a field called BODY (which represents the lower 4/5 of the page).
I extended List of Pages to 14.
But I still can't enter anything into the List of Pages question, which results in the "No valid fields were selected" error.
@hellyars Sorry this is such a faff. Can you try changing the data type of the List of Pages field to an integer? I think the macro doesn't accept the Byte type as a valid field in the drop-down (which I should change).