Alteryx Designer Desktop Discussions

Dave · 08-12-2022

All

I am using the Image Input tool to read pages of a PDF file, the Image to Text tool to turn them into text, and then RegEx to locate the employee numbers on the pages.
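For anyone following along outside Designer, here is a minimal Python sketch of the RegEx step, assuming one string of OCR text per page. The employee number format is not given in the thread, so the `Employee No` label and the 6-digit pattern are hypothetical. A page with no match returns `None`, which corresponds to the nulls Dave is seeing.

```python
import re
from typing import Optional

# Hypothetical format (not stated in the thread): a label such as
# "Employee No:" followed by a 6-digit number, e.g. "Employee No: 123456".
EMPLOYEE_RE = re.compile(r"Employee\s*No\.?\s*:?\s*(\d{6})", re.IGNORECASE)


def extract_employee_number(page_text: str) -> Optional[str]:
    """Return the first employee number found in a page's OCR text, or None (a null)."""
    match = EMPLOYEE_RE.search(page_text)
    return match.group(1) if match else None


pages = [
    "Payslip\nEmployee No: 123456\nPeriod: July",
    "Payslip\nPeriod: July",  # number absent from the OCR text -> None
    "EMPLOYEE NO. 654321",
]
print([extract_employee_number(p) for p in pages])
# → ['123456', None, '654321']
```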

However, some of the pages are not being read fully, so when the data is turned into text the employee numbers are missing and the extracted value comes back null for those pages.

I've examined the full process and the problem seems to be at the image-to-text stage. The employee numbers are present on all of the PDF pages, but some of them are not making it into the text output.

I suspect the Image to Text step is not taking in the full page, possibly because of the page dimensions.

Is anyone familiar with this problem? Is there a way to set the size of the image that the Image to Text step pulls in?

Thanks in advance

Dave

ArtApa · 08-12-2022

Hi @Dave, the information you have provided here is not enough to say confidently that Image to Text is failing to capture the data.

Can you please add a Browser tool and then click on Cell Viewer in the Results window to analyze what was captured? Chances are your RegEx needs to be adjusted.
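One way to check whether the RegEx is simply too strict is to look for the number again after undoing common OCR misreads, such as a letter O read in place of a zero. The Python sketch below assumes a hypothetical 6-digit employee number, and the confusion table is an assumption to tune for your own documents. If the number only matches after normalization, the text was captured but garbled, and the pattern (or a cleanup step before it) needs adjusting.

```python
import re

# Hypothetical employee number format: exactly 6 digits as a standalone token.
STRICT_RE = re.compile(r"\b\d{6}\b")

# Assumed set of common OCR character confusions (letter -> digit).
OCR_CONFUSIONS = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def diagnose(page_text: str) -> str:
    """Classify a page's OCR text: clean match, garbled number, or no candidate at all."""
    if STRICT_RE.search(page_text):
        return "regex matches"
    # Retry after mapping look-alike letters back to digits.
    if STRICT_RE.search(page_text.translate(OCR_CONFUSIONS)):
        return "number present but garbled by OCR"
    return "no candidate number in OCR text"


for text in ["Employee No: 123456", "Employee No: 12345O", "Payslip Period: July"]:
    print(repr(text), "->", diagnose(text))
```

If every failing page lands in the last category, that supports the view that the number never reached the text output at all.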

Dave · 08-13-2022

@ArtApa I have analyzed the text output from Image to Text and the employee numbers have dropped out. The RegEx is working; it just has nothing to pull because that data is missing from the text.

Unfortunately I can't post that result as it contains PII, but I am confident the RegEx is not the issue. Image to Text has not pulled in the employee number from 3 PDFs, and I have verified this thoroughly.

Dave

Asnt7 · 08-13-2022

That result contains PII, so I can't post it, but I'm quite confident it has nothing to do with the RegEx. The employee number is missing for 3 PDFs after conversion, and I have checked this thoroughly.

Dave · 08-15-2022

Yes, it definitely isn't the RegEx; it is 100% the Image to Text read. It is missing the employee number on 3 PDFs. Those PDFs definitely have an employee number on them, so either the image read is not working correctly or the employee number on those PDFs falls outside the bounds of the read.

Is there any way to test this?
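A rough way to test the "outside the bounds" idea is to check whether the text printed next to the number, such as its label, made it into the OCR output. If the label is read but the number is not, the region was captured and the digits were misread; if both are missing, that part of the page was likely not read at all. This is a sketch only: the label text and 6-digit format are hypothetical, and it should be run on one page that works and on the 3 failing pages.

```python
import re

# Hypothetical label and number format; replace with what actually appears on the PDFs.
LABEL_RE = re.compile(r"employee\s*no", re.IGNORECASE)
NUMBER_RE = re.compile(r"\b\d{6}\b")


def classify_page(page_text: str) -> str:
    """Say whether the employee-number region of a page appears to have been read."""
    has_label = bool(LABEL_RE.search(page_text))
    has_number = bool(NUMBER_RE.search(page_text))
    if has_number:
        return "ok"
    if has_label:
        return "label read, number lost (OCR misread)"
    return "label and number both missing (region likely not read)"


pages = {
    "page_1": "Employee No: 123456\nPay period: July",
    "page_2": "Employee No: 12 34\nPay period: July",
    "page_3": "Pay period: July",
}
for name, text in pages.items():
    print(name, "->", classify_page(text))
```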
