OCR Garbage if selecting more than a column

Good Morning,

I'm trialing the new Computer Vision tools with some incredibly simple certificates (ultra-clear writing, about 10 lines of addresses, locations, etc.), followed by a data table that is 4 columns wide with a header.

If I annotate each cell, all is good.
If I annotate each column, it's not a problem as long as every row in that column is filled. If Column 1 is full but Column 2 has, say, row 3 missing, it gives me a shortened number of rows.
If I try to annotate the whole table, all I get is utter garbage for the text.
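
To illustrate the column case above, here is a minimal sketch in Python. It assumes the OCR step returns only the non-empty text lines for each annotated column, which is a guess at the tool's behaviour and not something confirmed in this thread. The cell values and the `rebuild_rows` helper are hypothetical.

```python
from itertools import zip_longest

# Hypothetical OCR output per annotated column: only non-empty lines come back,
# so the blank cell in row 3 of column 2 simply disappears.
col1 = ["A1", "A2", "A3", "A4"]
col2 = ["B1", "B2", "B4"]  # row 3 was empty on the certificate

# Pairing by position shifts B4 up into row 3 and truncates the result.
naive_rows = list(zip(col1, col2))

# Padding keeps the row count, but B4 still lands in the wrong row.
padded_rows = list(zip_longest(col1, col2, fillvalue=""))


def rebuild_rows(cells_by_column, n_rows):
    """Rebuild rows from (row_index, text) pairs, leaving gaps as empty strings."""
    rows = [["" for _ in cells_by_column] for _ in range(n_rows)]
    for c, cells in enumerate(cells_by_column):
        for row_index, text in cells:
            rows[row_index][c] = text
    return rows


# Per-cell annotation keeps the position of every value, so gaps stay gaps.
cells = [
    [(0, "A1"), (1, "A2"), (2, "A3"), (3, "A4")],
    [(0, "B1"), (1, "B2"), (3, "B4")],
]
aligned_rows = rebuild_rows(cells, n_rows=4)
```

Under that assumption, the shortened row count comes from pairing columns of different lengths, and only position-aware output (as with per-cell annotation) keeps the rows aligned.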

But why do I get garbage when I ask it to do the entire table?

Apologies for not displaying a workflow, the certificates I have are sensitive.

Thanks

Machine Learning

TheOC

hey @Bobbins

would it be possible for you to create a similar certificate - for the purpose of sharing to help investigate this issue for you?

Thanks,
TheOC

Bobbins

Hello @TheOC ,

After much fiddling, I have!

Please find attached the workflow and a test certificate. This one has nice table borders (but not all of them do). I have also included a workflow to show the different ways of doing this and the problems faced.

Thank you

(Packaged version in two posts down)

Test Cert.pdf

TheOC

hey @Bobbins

Sorry to be a pain,
Can you please open your workflow, go to Options at the top, choose 'Export Workflow', make sure everything is selected in that menu, and send me the file it produces?

It will just ensure the input files are attached to the workflow. I know you sent the PDF, but I have lost the annotations, as I have to re-input the files.

Thanks!
TheOC

Bobbins

Hello @TheOC ,
You're not a pain, I forgot about needing it to be packaged. Attached.

Exported Import PDF.yxzp

TheOC

hey @Bobbins

Sorry - but not much joy from me I'm afraid.

I've managed to narrow it down to a problem with the Image to Text tool, as the PDF input is working perfectly fine. I tested this by outputting the file that it inputs (just to see whether it's corrupted or blurry on input, etc.), but it's perfectly clear.

You're definitely right: it outputs some weird values, and I can't see exactly why. It's not even a template issue, as it still doesn't work if you extract all text on the page.

Image processing doesn't seem to provide much joy either; I'd have thought thresholding would have reduced any possible noise... Weird, I've had this working fine plenty of times.
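
For anyone unfamiliar with thresholding, here is a minimal sketch of the idea in plain Python. It is not the Image Processing tool's actual implementation; the pixel values, the cutoff of 128 and the `threshold` helper are assumptions for illustration only.

```python
def threshold(gray, cutoff=128):
    """Binarise a grayscale image given as rows of 0-255 ints.

    Pixels at or above the cutoff become white (255), the rest black (0).
    """
    return [[255 if px >= cutoff else 0 for px in row] for row in gray]


# Tiny hypothetical scan: dark text strokes (low values) on a slightly
# uneven light background.
scan = [
    [250, 240, 30, 245],
    [235, 20, 25, 230],
    [248, 242, 35, 200],
]
binary = threshold(scan)
```

The uneven background values all collapse to pure white, which is why thresholding would normally be expected to help OCR on a clean scan like this one.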

Could be worth escalating to Alteryx support - or hopefully one of the geniuses on the community knows why this would be happening (paging one of my favourite nerds, @SusanCS )

I've attached my workings for anyone looking at this too, hopefully provides some aid. In Essence:
Input:

Output:

Import PDF.yxzp

Bobbins

Thanks @TheOC . I can't be the only one having this problem though surely?

TheOC

hey @Bobbins

The Computer Vision tools are relatively new - and reading PDFs is an incredibly difficult task for a computer to do robustly.

With that said, other times I've used these tools, they've worked as expected - and I can't grasp what's different about the PDF you attached.

A potential shout would be @DiganP 's custom tool:
https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa
