

OCR functionality in Alteryx

praneshsapmm
8 - Asteroid

Hello All,

 

I have a requirement for Alteryx that involves some OCR functionality.

 

We receive supplier invoices as PDFs containing images, in a variety of templates. Today these images are read by a third-party OCR tool and then processed into SAP.

The third-party OCR tool has some quality issues when reading the images.

 

Now that we have started using Alteryx, is it possible for Alteryx to read the images so the results can then be processed into SAP?

 

Has anyone looked into this? Please help.

 

Thanks

RishiK
Alteryx

@praneshsapmm I would recommend taking a look at our Machine Learning toolkit within Alteryx Designer. It is an add-on, but it will support you here (via the PDF blocks):

 

https://www.alteryx.com/article/machine-learning-the-future-is-now-for-analysts

jagdeeshn
11 - Bolide

@praneshsapmm 

 

I once came across an OCR solution that used Python scripting to bring the data into the Alteryx data stream.

 

I believe @AbhilashR was working on a similar Python-based solution to interpret data using OCR.
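
For anyone who wants to try the Python route, here is a rough sketch of what such a script might look like. It is not the solution mentioned above. It assumes the pdf2image and pytesseract packages (plus the Poppler and Tesseract programs they wrap) are installed, and the function names ocr_pdf and text_to_rows are made up for illustration.

```python
from typing import Dict, List


def text_to_rows(text: str, page: int) -> List[Dict[str, object]]:
    """Turn one page of OCR text into one record per non-empty line."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        cleaned = " ".join(line.split())  # collapse runs of whitespace
        if cleaned:
            rows.append({"page": page, "line": line_no, "text": cleaned})
    return rows


def ocr_pdf(pdf_path: str, dpi: int = 300) -> List[Dict[str, object]]:
    """OCR every page of a scanned PDF (hypothetical helper)."""
    # Assumed third-party packages, imported lazily so the pure helper
    # above can be used without them.
    from pdf2image import convert_from_path
    import pytesseract

    rows: List[Dict[str, object]] = []
    # Render each PDF page to an image, then run Tesseract on it.
    for page_no, image in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        rows.extend(text_to_rows(pytesseract.image_to_string(image), page_no))
    return rows
```

Inside the Designer Python tool, the rows could presumably be wrapped in a pandas DataFrame and passed to an output anchor with Alteryx.write, so the rest of the workflow can parse them like any other text field.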

echuong1
Alteryx

There are a couple of ways to OCR and bring PDF data into Alteryx. The method I would suggest depends on the type of PDF you're reading.

 

If the PDF contains mainly data, you may be able to use the free PDF input tools available on the Gallery. These use R or Python to OCR the file and read the data in. They read everything from the PDF, and you can then use Text To Columns, RegEx, Filter, Multi-Row, etc. to clean the output and parse out the pieces you need.

https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b 

https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa 
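
To give a feel for the parsing step after the raw text is read in, here is a small regex sketch in Python. The field labels ("Invoice No", "Date", "Total") and the sample text are assumptions for illustration only; real supplier templates will need their own patterns, and the same expressions could be adapted for the RegEx tool.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns; adjust the labels to match each supplier template.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No\b\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"Date\s*[:\-]?\s*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s*(?:Amount|Due)?\s*[:\-]?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def parse_invoice(text: str) -> Dict[str, Optional[str]]:
    """Pull the first match for each field out of OCR text; None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Made-up sample of what OCR output might look like.
sample = "ACME Supplies\nInvoice No: INV-2041\nDate: 12/03/2021\nTotal Due: 1,250.00\n"
fields = parse_invoice(sample)
```

Returning None for a missing field makes it easy to filter out invoices where the OCR missed something and route them to manual review.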

 

If you need very specific portions of the PDF, the Intelligence Suite that Rishi mentioned could work well for your purposes. Its tools essentially allow you to highlight and mask the sections of your PDF that you'd like to read in, and they work especially well if you have a template. The Intelligence Suite is an add-on to Designer and requires a separate license. That license also gives you access to additional text mining (sentiment analysis, topic modeling) as well as machine learning. Information on the Intelligence Suite add-on can be found here:

https://www.alteryx.com/products/alteryx-platform/intelligence-suite