OCR functionality in Alteryx

praneshsapmm
8 - Asteroid

Hello all,

I have a requirement for Alteryx that involves OCR to some extent.

We receive supplier invoices as PDFs containing scanned images in a variety of templates. Today these images are read by a third-party OCR tool and then processed into SAP, but that tool has some quality issues when reading the images.

Now that we have started using Alteryx, is it possible for Alteryx to read the images so the data can then be processed into SAP?

Has anyone worked on this? Please help.

Thanks

4 REPLIES
RishiK
Alteryx

@praneshsapmm I would recommend taking a look at our machine learning toolkit for Alteryx Designer. It is an add-on, but it should support what you need here (via the PDF blocks):

 

https://www.alteryx.com/article/machine-learning-the-future-is-now-for-analysts

JagdeeshN
12 - Quasar

@praneshsapmm 

 

I once came across an OCR solution that used Python scripting to bring the data into the Alteryx data stream.

I believe @AbhilashR was working on a similar Python-based solution to interpret data using OCR.
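
To give a rough idea of what a Python-based approach could look like, here is a minimal sketch. It is not the solution mentioned above, just one common way to do it. It assumes the third-party packages pdf2image (which needs Poppler) and pytesseract (which needs the Tesseract engine) are installed; the function names ocr_pdf_pages and pages_to_rows are made up for illustration.

```python
from typing import Dict, List


def ocr_pdf_pages(pdf_path: str, dpi: int = 300) -> List[str]:
    """Return the OCR text of each page of a scanned PDF (one string per page)."""
    # Assumption: pdf2image and pytesseract are installed. They are imported
    # here so the rest of the module still loads when they are missing.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    return [pytesseract.image_to_string(image) for image in images]


def pages_to_rows(source: str, page_texts: List[str]) -> List[Dict[str, object]]:
    """Flatten per-page text into one record per non-empty line."""
    rows: List[Dict[str, object]] = []
    for page_no, text in enumerate(page_texts, start=1):
        for line_no, line in enumerate(text.splitlines(), start=1):
            if line.strip():
                rows.append(
                    {"file": source, "page": page_no, "line": line_no, "text": line.strip()}
                )
    return rows
```

Inside Designer's Python tool, the rows from pages_to_rows(path, ocr_pdf_pages(path)) could be loaded into a pandas DataFrame and sent to an output anchor, so the rest of the workflow can parse them with the usual tools. OCR quality will still depend on scan resolution and the template, so treat this as a starting point.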

echuong1
Alteryx Alumni (Retired)

There are a couple of ways to OCR and bring PDF data into Alteryx. The method I would suggest depends on the type of PDF you're reading.

 

If the PDF contains mainly data, you may be able to use the free PDF input tools available on the Gallery. These use R or Python to OCR the file and read the data in. They read everything from the PDF, and you can then use Text To Columns, RegEx, Filter, Multi-Row Formula, etc. to clean the text and parse out the pieces you need.

https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b 

https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa 
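
For anyone wondering what the regex parsing step might look like once the raw text is in, here is a small sketch in Python. The field names, label wording, and patterns are assumptions for illustration only; real supplier templates will need their own patterns, and the same expressions could be adapted for the RegEx tool.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for three common invoice fields. The leading \b keeps
# "Total" from matching inside "Subtotal".
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"\bDate\s*[:\-]?\s*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})", re.IGNORECASE
    ),
    "total": re.compile(
        r"\bTotal\s*(?:Amount|Due)?\s*[:\-]?\s*[$€£]?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def parse_invoice_text(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when a field is not found."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    sample = (
        "ACME Supplies\n"
        "Invoice No: INV-2041\n"
        "Date: 12/03/2021\n"
        "Subtotal 100.00\n"
        "Total Due: $1,250.00\n"
    )
    print(parse_invoice_text(sample))
```

Returning None for a missing field, rather than raising an error, makes it easy to filter out invoices that need manual review before anything is sent on to SAP.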

 

If you need very specific portions of the PDF, the Intelligence Suite that Rishi mentioned could work well for your purposes. Its tools essentially let you highlight and mask the sections of your PDF that you'd like to read in, which works especially well if your documents follow a template. The Intelligence Suite is an add-on to Designer and requires a separate license. That license also gives you access to additional text mining (sentiment analysis, topic modeling) as well as machine learning. Information on the Intelligence Suite add-on can be found here:

https://www.alteryx.com/products/alteryx-platform/intelligence-suite 

dandreas
5 - Atom

Never mind. The question was answered.
