Image Input Tool errors

Question

All

I am using the Image Input tool to read the text from the pages of a PDF file, and then RegEx to locate employee numbers on those pages.

However, some of the pages are not being read fully, so when the data is converted to text, the employee numbers are missing (see nulls below).

I've examined the full process and the problem seems to be at the image-to-text phase. The employee numbers are present on every PDF page, but some of them are not making it into the text output.

I think the Image to Text step may not be taking in the full page, possibly because of the page dimensions.

Is anyone familiar with this problem? Is there a way to set the size of the image that the Image to Text step pulls in?
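For context on why dimensions could matter, here is a small Python sketch of the arithmetic only. It assumes the PDF is rasterized at some DPI before OCR; the thread does not say what resolution the Image Input tool actually uses, and the page size (US Letter) and 8 pt font below are illustrative values, not taken from Dave's files. PDF page sizes are measured in points (1/72 inch), so a low rendering DPI shrinks small text to only a few pixels tall. The Tesseract documentation, for example, recommends images of at least 300 DPI; whether the Alteryx tools use a similar engine is not stated here.

```python
# PDF page sizes are expressed in points, where 1 point = 1/72 inch.
POINTS_PER_INCH = 72


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Return the pixel size of a PDF page rasterized at the given DPI."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def text_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate pixel height of text of a given point size at the given DPI."""
    return font_size_pt * dpi / POINTS_PER_INCH


if __name__ == "__main__":
    # Illustrative: US Letter is 612 x 792 points (8.5 x 11 inches).
    for dpi in (72, 150, 300):
        w, h = rendered_size(612, 792, dpi)
        print(f"{dpi} DPI -> {w} x {h} px, 8 pt text ~ {text_height_px(8, dpi):.1f} px tall")
```

At 72 DPI, 8 pt text is only about 8 pixels tall, which is the kind of situation where an OCR step could drop small characters such as an ID number.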

Thanks in advance

Dave

Dave · Answer

Yes, it definitely isn't the RegEx. It is 100% the image-to-text read: it is missing the employee number on 3 PDFs. Those PDFs definitely have an employee number on them, so either the image read is not working correctly or the employee number on those PDFs is outside the bounds of the read.

Is there any way to test this?
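One possible test, sketched in Python under stated assumptions: export the Image to Text output with one record per page, then list the pages whose text contains no employee-number match. If a page shows up here while the source PDF visibly has the number, the loss happened in the OCR step rather than in the RegEx. The pattern `EMP_PATTERN` (an "E" followed by six digits), the function name `pages_missing_ids`, and the sample text are all hypothetical; substitute the pattern actually used in the RegEx tool.

```python
import re

# Hypothetical employee-number format ("E" followed by six digits);
# replace with the pattern actually used in the RegEx tool.
EMP_PATTERN = re.compile(r"\bE\d{6}\b")


def pages_missing_ids(ocr_pages: dict[str, str],
                      pattern: re.Pattern = EMP_PATTERN) -> list[str]:
    """Return the page keys whose OCR text contains no employee-number match."""
    return sorted(key for key, text in ocr_pages.items() if not pattern.search(text))


if __name__ == "__main__":
    # Made-up OCR output standing in for the Image to Text results.
    sample = {
        "file1.pdf#1": "Name: A. Smith  Employee No: E123456",
        "file2.pdf#1": "Name: B. Jones  Employee No:",  # number lost during OCR
        "file3.pdf#1": "Employee No: E654321 Dept: Finance",
    }
    print(pages_missing_ids(sample))  # ['file2.pdf#1']
```

Any page this reports can then be compared by eye against the original PDF page to see where the number sits and whether it falls near the edge of the page.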

Asnt7 · Answer

That result contains PII, so I can't post it, but I'm quite confident it has nothing to do with the RegEx. The employee number is missing for 3 PDFs that have been converted to images, and I have checked this thoroughly.

Dave · Answer

@ArtApa I have analyzed the text output from the Image to Text tool and the employee numbers have dropped out. The RegEx is working; it just has no data to pull from for the missing PDF pages.

Unfortunately, I can't post that result as it contains PII, but I am confident the RegEx is not the issue. The Image to Text tool has not pulled in the employee number from 3 PDFs, and I have verified this completely.

Dave