Image Input Tool errors

Dave
8 - Asteroid

All

 

I am using the Image Input tool to read the text from the pages of a PDF file, and then regex to locate employee numbers on those pages.

 

Dave_0-1660338644393.png

Dave_3-1660338970153.png

 

Dave_4-1660338995793.png

 

However, some of the pages are not being read fully, so when the data is converted to text the employee numbers are missing (see the nulls below).

 

 

 

Dave_1-1660338719293.png

 

Dave_2-1660338748909.png

 

I've examined the full process and the problem seems to be at the Image to Text phase. The employee numbers are present on all PDF pages, but some are not making it to the text output.

 

I think the Image to Text step may not be taking in the full page, possibly because of the page's dimensions.

 

Is anyone familiar with this problem? Is there a way to set the size of the image that the Image to Text step pulls in?

 

Thanks in advance

 

Dave

4 REPLIES
ArtApa
Alteryx

Hi @Dave - The information you have provided is not enough to say confidently that Image to Text is failing to capture the data.

 

Can you please add a Browse tool and then click Cell Viewer in the Results window to analyse what was captured? Chances are your regex needs to be adjusted:

 

ArtApa_0-1660368424180.png
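
One reason a regex can need adjusting after OCR is that digits are sometimes read as look-alike letters (for example `O` for `0` or `l` for `1`). The sketch below illustrates that idea in Python rather than in Alteryx. The employee number format (`EMP` followed by six digits), the function name, and the sample string are all assumptions made up for illustration; they are not taken from the workflow in this thread.

```python
import re

# Hypothetical format: "EMP" followed by six digits.
STRICT = re.compile(r"\bEMP(\d{6})\b")

# Looser pattern that also accepts letters OCR commonly substitutes for digits.
LOOSE = re.compile(r"\bEMP([0-9OoIlSB]{6})\b")

# Map each look-alike letter back to the digit it most likely represents.
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def find_employee_number(text):
    """Return the first employee number in text, repairing look-alike characters."""
    match = LOOSE.search(text)
    if match is None:
        return None
    return "EMP" + match.group(1).translate(CONFUSIONS)


# Made-up OCR output in which two digits were misread as letters.
ocr_text = "Employee No: EMP12O45l"
print(STRICT.search(ocr_text))         # the strict pattern finds nothing
print(find_employee_number(ocr_text))  # the tolerant pattern recovers a number
```

If the tolerant pattern also finds nothing, that points away from the regex and towards text that was never captured in the first place.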

 

Dave
8 - Asteroid

@ArtApa I have analyzed the text output from Image to Text and the employee numbers have dropped off. The regex is working; it just has nothing to pull because the data is missing from the text.

 

Unfortunately I can't post that result as it contains PII, but I am confident the regex is not the issue. Image to Text has not pulled in the employee number from 3 PDFs, and I have verified this completely.

 

Dave

 

 

Asnt7
5 - Atom

That result contains PII, so I can't post it, but I'm quite confident it doesn't have anything to do with the regex. There is no employee number in 3 PDFs that have been converted to images, and I have checked this thoroughly.

maxifoot.JPG

Dave
8 - Asteroid

Yes, it definitely isn't the regex. It is 100% the image-to-text read: it is missing the employee number on 3 PDFs. Those PDFs 100% have an employee number on them, so either the image read is not working correctly or the employee number on those PDFs is outside the bounds of the read.

 

Is there any way to test this?
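
One rough way to test this outside Alteryx is to compare how much text each page produced with whether an employee number was found on it. The sketch below assumes the per-page text has been exported as a list of strings; the `EMP` plus six digits format, the function names, and the sample pages are hypothetical and only stand in for the real data.

```python
import re

# Hypothetical format: "EMP" followed by six digits.
EMP_PATTERN = re.compile(r"\bEMP\d{6}\b")


def audit_pages(page_texts):
    """Summarise each page: size of the captured text and the number found, if any."""
    report = []
    for page_no, text in enumerate(page_texts, start=1):
        match = EMP_PATTERN.search(text)
        report.append({
            "page": page_no,
            "chars": len(text),
            "lines": len(text.splitlines()),
            "employee_number": match.group(0) if match else None,
        })
    return report


def pages_missing_number(report):
    """Return the page numbers where no employee number was matched."""
    return [row["page"] for row in report if row["employee_number"] is None]


# Made-up text standing in for the text captured from three pages.
sample_pages = [
    "Payslip\nEmployee No: EMP123456\nNet pay: 100",
    "Payslip\nNet pay: 100",
    "Payslip\nEmployee No: EMP654321\nNet pay: 200",
]

report = audit_pages(sample_pages)
for row in report:
    print(row)
print("Pages without an employee number:", pages_missing_number(report))
```

If the failing pages have clearly fewer characters or lines than the others, or their text stops before the area where the number is printed, that would suggest part of the page is not being read. If the text is otherwise complete, the number is more likely being misread than cut off.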
