OCR Garbage if selecting more than a column
Good Morning,
Trialing the new Computer Vision tools with some incredibly simple certificates (ultra-clear writing, about 10 lines of addresses, locations, etc.), followed by a data table that is 4 columns wide with a header.
- If I annotate each cell, all is good.
- If I annotate each column, it's not a problem as long as every row in that column is filled. If Column 1 is full but Column 2 has, say, row 3 missing, it gives me a shortened number of rows.
- If I try to annotate the whole table, all I get is utter garbage for the text.
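Since per-cell annotation is the variant that works, here is a rough sketch of how cell regions could be derived from a single table bounding box instead of being drawn by hand. It is plain Python, not anything from Alteryx: the function name `split_table_into_cells`, the `(left, top, right, bottom)` pixel convention, and the assumption that rows and columns are evenly spaced are all hypothetical and would need adjusting for a real certificate.

```python
from typing import List, Tuple

# Hypothetical convention: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def split_table_into_cells(table: Box, rows: int, cols: int) -> List[List[Box]]:
    """Split a table bounding box into a rows x cols grid of cell boxes.

    Assumes evenly spaced rows and columns, which real tables may not have.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    left, top, right, bottom = table
    width, height = right - left, bottom - top
    grid: List[List[Box]] = []
    for r in range(rows):
        # Integer arithmetic keeps neighbouring cells edge to edge.
        y0 = top + r * height // rows
        y1 = top + (r + 1) * height // rows
        row: List[Box] = []
        for c in range(cols):
            x0 = left + c * width // cols
            x1 = left + (c + 1) * width // cols
            row.append((x0, y0, x1, y1))
        grid.append(row)
    return grid


# Example: a 400 x 100 px table with a header row plus one data row, 4 columns.
cells = split_table_into_cells((0, 0, 400, 100), rows=2, cols=4)
```

Each box could then be passed to OCR on its own, so an empty cell simply yields an empty string rather than shifting the rows below it.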
But why do I get garbage when I ask it to do the entire table?
Apologies for not displaying a workflow, the certificates I have are sensitive.
Thanks
- Labels:
- Machine Learning
hey @Bobbins
Would it be possible for you to create a similar certificate that you can share, to help investigate this issue for you?
Thanks,
TheOC
Hello @TheOC ,
After much fiddling, I have!
Please find attached the workflow and a test certificate. This one has nice table borders (but not all of them do). I have also included a workflow showing the different ways of doing this and the problems faced.
Thank you
(Packaged version is two posts down)
hey @Bobbins
Sorry to be a pain,
Can you please open your workflow, go to Options at the top, choose 'Export Workflow', make sure everything is selected in that menu, and send me the file it produces?
That will ensure the input files are attached to the workflow. I know you sent the PDF, but I've lost the annotations because I have to re-input the files.
Thanks!
TheOC
Hello @TheOC ,
You're not a pain, I forgot it needed to be packaged. Attached.
hey @Bobbins
Sorry - but not much joy from me I'm afraid.
I've managed to narrow it down to a problem with the Image to Text tool, as the PDF input is working perfectly fine. I tested this by outputting the file that it reads in (just to see whether it gets corrupted or blurred on input), but it's perfectly clear.
You're definitely right: it outputs some weird values, and I can't see exactly why. It's not even a template issue, as it still doesn't work if you extract all the text on the page.
Image processing doesn't seem to help much either; I'd have thought thresholding would reduce any possible noise... Weird, I've had this working fine plenty of times.
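For anyone unfamiliar with thresholding, here is a minimal sketch of the general idea in plain Python, using Otsu's method to pick a cut-off and then forcing every pixel to pure black or white. This is a generic illustration only; it is an assumption that it resembles what the Image Processing tool does, and the function names `otsu_threshold` and `binarize` are made up for the example.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Pick a threshold (0-255) that maximises between-class variance.

    `pixels` is a flat list of 8-bit grayscale values.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of their intensities
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: List[int], threshold: int) -> List[int]:
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]


# Toy example: three dark "ink" pixels and three light "paper" pixels.
sample = [10, 12, 11, 200, 210, 205]
clean = binarize(sample, otsu_threshold(sample))
```

On a clean scan like the test certificate there is little noise to remove in the first place, which would fit with thresholding making no difference here.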
Could be worth escalating to Alteryx support, or hopefully one of the geniuses on the Community knows why this is happening (paging one of my favourite nerds, @SusanCS).
I've attached my workings for anyone else looking at this; hopefully they provide some aid. In essence:
Input:
Output:
Thanks @TheOC. I can't be the only one having this problem though, surely?
hey @Bobbins
The Computer Vision tools are relatively new, and reading PDFs robustly is an incredibly difficult task for a computer.
With that said, the other times I've used these tools they've worked as expected, and I can't grasp what's different about the PDF you attached.
A potential shout would be @DiganP's custom tool:
https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa
