Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

OCR Garbage if selecting more than a column

Bobbins
8 - Asteroid

Good Morning,

Trialing the new Computer Vision tools with some incredibly simple certificates (ultra-clear writing, about 10 lines of addresses, locations, etc.), followed by a data table that is 4 columns wide with a header.

  • If I annotate each cell, all is good.
  • If I annotate each column, it's not a problem as long as every row in that column is filled. If column 1 is full but column 2 has, say, row 3 missing, it gives me a shortened number of rows.
  • If I try to annotate the whole table, all I get is utter garbage for the text.

But why do I get garbage when I ask it to do the entire table?
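
The shortened-rows effect in the second bullet can be illustrated with a small Python sketch. It assumes, purely for illustration, that a column annotation comes back as one block of text with one line per detected cell, so a blank cell leaves no line behind. The sample values and names (`split_cells`, `cells`) are hypothetical, and this is not a description of how the Image to Text tool works internally.

```python
from itertools import zip_longest

# Hypothetical OCR output for two column annotations: one string per column,
# one line of text per detected cell. Row 3 of column 2 is blank on the page.
column_1_text = "A1\nA2\nA3\nA4"
column_2_text = "B1\nB2\nB4"


def split_cells(column_text):
    """Split a column's OCR text into its non-empty cell values."""
    return [line.strip() for line in column_text.splitlines() if line.strip()]


col_1 = split_cells(column_1_text)
col_2 = split_cells(column_2_text)

# Pairing by position cannot know which row was blank: B4 slides up to row 3.
rows_by_column = list(zip_longest(col_1, col_2, fillvalue=""))

# Per-cell annotations keep their (row, column) position, so a blank cell
# stays blank in the right place when the table is rebuilt.
cells = {
    (1, 1): "A1", (1, 2): "B1",
    (2, 1): "A2", (2, 2): "B2",
    (3, 1): "A3",
    (4, 1): "A4", (4, 2): "B4",
}
rows_by_cell = [[cells.get((r, c), "") for c in (1, 2)] for r in range(1, 5)]
```

Under that assumption, the column approach loses the position of the gap, while the cell approach keeps it, which matches the behaviour described in the first two bullets.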

Apologies for not displaying a workflow, the certificates I have are sensitive.

Thanks

7 REPLIES
TheOC
15 - Aurora

hey @Bobbins 

Would it be possible for you to create a similar certificate that you can share, to help investigate this issue?

Thanks,
TheOC


Bobbins
8 - Asteroid

Hello @TheOC ,

After much fiddling, I have!

Please find attached the workflow and a test certificate. This one has nice table borders (but not all of them do). I have also included a workflow to show the different ways of doing this and the problems faced.

Thank you

(Packaged version in two posts down)

TheOC
15 - Aurora

hey @Bobbins 

Sorry to be a pain,
Can you please open your workflow, hit Options at the top, then 'Export Workflow', make sure everything is selected in that menu, and send me the file it produces?

It will just ensure the input files are attached to the workflow. I know you sent the PDF, but I have lost the annotations, as I have to re-input the files.

Thanks!
TheOC


Bobbins
8 - Asteroid

Hello @TheOC ,
You're not a pain, I forgot about needing it to be packaged. Attached.

TheOC
15 - Aurora

hey @Bobbins 

Sorry - but not much joy from me I'm afraid.

I've managed to narrow it down to being a problem with the Image to Text tool, as the PDF input is working perfectly fine. I tested this by outputting the file that it inputs (just to see if it's corrupted on input, or blurry, etc.), but it's perfectly clear.

You're definitely right - it's outputting some weird values, and I can't see exactly why. It's not even a template issue, as it doesn't work if you extract all text on the page either.

Image processing doesn't seem to help much either; I'd have thought thresholding would have reduced any possible noise... Weird, I've had this working fine plenty of times.
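
For anyone unfamiliar with thresholding, here is a minimal Python sketch of the general idea: every pixel at or above a cutoff becomes white and everything else becomes black, which is meant to strip faint background noise before OCR. The function name `threshold`, the cutoff of 128 and the tiny sample image are assumptions for illustration only; this is not the implementation behind the Alteryx Image Processing tool.

```python
def threshold(pixels, cutoff=128):
    """Binarise a grayscale image: values >= cutoff become 255, the rest 0."""
    return [[255 if value >= cutoff else 0 for value in row] for row in pixels]


# Hypothetical 3x4 grayscale patch (0 = black, 255 = white):
# light paper with some dark ink and a mid-grey smudge.
page = [
    [250, 245, 30, 240],
    [248, 120, 25, 251],
    [252, 249, 200, 247],
]

binary = threshold(page)
```

With a cutoff of 128, the light grey pixel (200) is pushed to white while the darker smudge (120) is pushed to black, so the choice of cutoff decides what counts as ink and what counts as noise.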

Could be worth escalating to Alteryx support - or hopefully one of the geniuses on the community knows why this would be happening (paging one of my favourite nerds, @SusanCS).


I've attached my workings for anyone looking at this too; hopefully they're of some help. In essence:

Input: [screenshot]

Output: [screenshot]

Bobbins
8 - Asteroid

Thanks @TheOC . I can't be the only one having this problem though surely?

TheOC
15 - Aurora

hey @Bobbins 

The computer vision tools are relatively new - and reading PDFs robustly is an incredibly difficult task for a computer.

With that said, other times I've used these tools, they've worked as expected - and I can't grasp what's different about the PDF you attached.

 

A potential shout would be @DiganP's custom tool:
https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa

