All
I am using the Image Input tool to read the text from pages of a PDF file, and then RegEx to locate employee numbers on the pages.
However, some of the pages are not being read fully, so when the data is converted to text the employee numbers are missing (see nulls below).
I've examined the full process and the problem seems to be at the Image to Text phase: the employee numbers are present on all PDF pages, but some are not making it to the text output.
I think the Image to Text step may not be taking in the full page, possibly because of the page dimensions.
Is anyone familiar with this problem? Is there a way to set the size of the image that the Image to Text step pulls in?
Thanks in advance
Dave
Hi @Dave - The information you have provided is not enough to say confidently that Image to Text is failing to capture the data.
Can you please add a Browse tool and then click on Cell Viewer in the Results window to analyse what was captured? Chances are your RegEx needs to be adjusted.
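One way to separate the two explanations outside the workflow is to run a strict and a looser pattern over the captured text for each page. The sketch below is only an illustration in Python: the 6-digit employee number format, the pattern names, and the `classify_pages` helper are all assumptions, not anything taken from the original workflow. A page that only matches the looser pattern points at the RegEx (or OCR character confusion); a page that matches neither suggests the number never reached the text output.

```python
import re

# Hypothetical format: assumes an employee number is a run of exactly 6 digits.
STRICT = re.compile(r"\b\d{6}\b")

# Looser variant: tolerates common OCR confusions (O/o for 0, l/I for 1)
# and a stray space between characters.
LOOSE = re.compile(r"\b(?:[0-9OolI]\s?){6}\b")


def classify_pages(pages):
    """Classify each page's OCR text.

    pages: dict mapping a page name to the text captured for that page.
    Returns a dict mapping page name to one of:
      "ok"      - the strict pattern matched
      "regex"   - only the loose pattern matched (pattern or OCR quality issue)
      "missing" - nothing resembling an employee number is in the text
    """
    result = {}
    for name, text in pages.items():
        if STRICT.search(text):
            result[name] = "ok"
        elif LOOSE.search(text):
            result[name] = "regex"
        else:
            result[name] = "missing"
    return result


if __name__ == "__main__":
    # Made-up sample text, not real output.
    sample = {
        "p1": "Employee No: 123456",
        "p2": "Employee No: 12O456",
        "p3": "Name: Jane Doe",
    }
    for page, status in classify_pages(sample).items():
        print(page, status)
```

Pages reported as "missing" are the ones worth comparing against the source PDF by eye.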
@ArtApa I have analyzed the text output from Image to Text and the employee numbers have dropped off. The RegEx is working; it just has nothing to pull from the missing PDF data.
Unfortunately I can't post that result as it contains PII, but I am confident the RegEx is not the issue. Image to Text has not pulled in the employee number from 3 PDFs, and I have verified this completely.
Dave
That result contains PII, so I can't post it, but I'm quite confident it doesn't have anything to do with the RegEx. There is no employee number in the text from 3 PDFs that have been converted to images, and I have checked this thoroughly.
Yes, it definitely isn't the RegEx. It is 100% the image-to-text read: it is missing the employee number on 3 PDFs. Those PDFs 100% have an employee number on them, so either the image read is not working correctly or the employee number on those PDFs is outside the bounds of the read.
Is there any way to test this?
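One rough way to test the "outside the bounds" idea is to check whether the failing pages share a page size that the working pages do not. The sketch below is in Python and assumes you can get each page's width and height (for example in points) from some other tool; the `miss_rate_by_size` helper and the record layout are hypothetical and the numbers in the example are made up.

```python
from collections import defaultdict


def miss_rate_by_size(records):
    """Group pages by (rounded) dimensions and count the misses.

    records: iterable of (width, height, found) tuples, one per page, where
             found is True if an employee number was extracted from that page.
    Returns a dict mapping (width, height) to (missed, total).
    """
    counts = defaultdict(lambda: [0, 0])  # size -> [missed, total]
    for width, height, found in records:
        # Round so that tiny differences do not split one size into many.
        key = (round(width), round(height))
        counts[key][1] += 1
        if not found:
            counts[key][0] += 1
    return {size: tuple(c) for size, c in counts.items()}


if __name__ == "__main__":
    # Made-up sample: two Letter-sized pages that read fine,
    # two larger pages where the number was not found.
    sample = [
        (612, 792, True),
        (612, 792, True),
        (842, 1191, False),
        (842.2, 1190.8, False),
    ]
    for size, (missed, total) in miss_rate_by_size(sample).items():
        print(size, f"{missed}/{total} pages missing the number")
```

If every miss falls in one size group, that supports the dimension theory; if misses are spread across sizes, the cause is more likely something else, such as scan quality or where the number sits on the page.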