Alteryx Designer Discussions


Alteryx OCR Tools

wellyLiyanto
7 - Meteor

Hi all,

Just thought I'd share this tool I made with you all, based on this post:

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Is-anybody-using-OCR-optical-character...

I made this tool using the Python pytesseract library. To use it, you first need to install Tesseract OCR (which provides the language library) from https://github.com/UB-Mannheim/tesseract/wiki

For now, the Python code points to the default Tesseract OCR folder, C:\Program Files\Tesseract-OCR. Since Alteryx runs on 64-bit Windows, this should be the same for all Windows users with a default C: drive. Feel free to change the path in the Python code if needed.
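
A minimal sketch of that path setting, assuming Tesseract OCR was installed to the default folder. The tool's code sets `pytesseract.pytesseract.tesseract_cmd` to `r'C:\Program Files\Tesseract-OCR\tesseract'`; the `resolve_tesseract_cmd` helper and the `TESSERACT_CMD` environment variable below are hypothetical additions for illustration, not part of the posted tool.

```python
import os
import shutil
from typing import Optional

# Default install location assumed by the tool (64-bit Windows, default C: drive).
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(default: str = DEFAULT_TESSERACT) -> Optional[str]:
    """Return a path to the tesseract executable, or None if none is found.

    Lookup order (an assumption for this sketch): the hypothetical
    TESSERACT_CMD environment variable, then the default install path,
    then a 'tesseract' executable found on PATH.
    """
    override = os.environ.get("TESSERACT_CMD")
    for candidate in (override, default):
        if candidate and os.path.isfile(candidate):
            return candidate
    return shutil.which("tesseract")


cmd = resolve_tesseract_cmd()
try:
    import pytesseract  # only importable once pytesseract is installed
except ImportError:
    pytesseract = None

if pytesseract is not None and cmd is not None:
    # The same setting the tool's code makes, pointed at the resolved path.
    pytesseract.pytesseract.tesseract_cmd = cmd
```

If `resolve_tesseract_cmd()` returns `None`, Tesseract is neither at the default path nor on `PATH`, so the path in the Python code is the thing to change.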

Also, I included sample images in English.rar so you can see which files can be scanned and which cannot (those will return null).
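
A minimal sketch of how a per-file null can arise, assuming "null" here means Tesseract recognised no text in the image. `ocr_or_none` and its injectable `ocr` parameter are hypothetical names for illustration, and this is not the tool's actual code; the default reader uses Pillow's `Image.open` and `pytesseract.image_to_string` with the English (`eng`) language data, which is itself an assumption about the tool's settings.

```python
from typing import Callable, List, Optional, Tuple


def read_with_tesseract(path: str) -> str:
    """OCR one image file with Tesseract using English language data."""
    # Imported here so this module still loads where Pillow/pytesseract are missing.
    from PIL import Image
    import pytesseract
    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang="eng")


def ocr_or_none(paths: List[str],
                ocr: Callable[[str], str] = read_with_tesseract
                ) -> List[Tuple[str, Optional[str]]]:
    """Pair each path with its OCR text, or None when no text was recognised."""
    results = []
    for path in paths:
        text = ocr(path).strip()
        # Whitespace-only output becomes None, which Alteryx would likely show as null.
        results.append((path, text or None))
    return results
```

Passing a different `ocr` callable makes it easy to check the null handling without Tesseract installed.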

joshuaburkhow
14 - Magnetar
This is awesome! Can't wait to try it out.
Joshua Burkhow - Alteryx Ace | Global Alteryx Architect @PwC | Blogger @ AlterTricks
JReid
9 - Comet

Nice, I'll have to compare this to the Tesseract macro I made in R!

MDOstroff
Alteryx

I ran the Tesseract OCR installer but I'm getting a "library not installed" error when I try to run this. What am I missing?

wellyLiyanto
7 - Meteor

Hi @MDOstroff

Was there a more complete message about which library is missing? I had installed the 'tesseract', 'pytesseract' and 'Image' libraries in my Python environment (for Alteryx) before installing Tesseract OCR. I hope this helps.

Thank you

MDOstroff
Alteryx

Thanks for the quick response. Turns out I needed to run Designer in admin mode the first time. (It was a folder permissions issue.) After that, everything runs fine.

Thanks for the cool tool.
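
Since the fix turned out to be a folder-permissions issue, one way to check for it before installing is to test whether the Python environment's `site-packages` folder is writable. A minimal sketch, assuming the code runs inside the same Python environment Designer's Python tool uses (so `sysconfig.get_paths()["purelib"]` points at that environment); `can_write` is a hypothetical helper.

```python
import sysconfig
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a file can actually be created in `directory`.

    Creating a real temporary file is more reliable than os.access(),
    which can miss folder permission (ACL) restrictions on Windows.
    """
    try:
        with tempfile.TemporaryFile(dir=directory):
            pass
        return True
    except OSError:  # includes PermissionError and FileNotFoundError
        return False


site_packages = sysconfig.get_paths()["purelib"]
status = "writable" if can_write(site_packages) else "NOT writable (try running Designer as administrator)"
print(site_packages, status)
```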

jstewart
7 - Meteor

This looks really cool. I may be in a little over my head, but I'm having trouble interpreting my errors since I'm not strong with Python. I am on Designer x64 and the GitHub download is in the correct location. The errors are below. I do have pip, but I'm unclear what 'PIL' is, and I don't see it in the Tesseract-OCR folder.

Error: Python (1): ---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-1-94adb072d5fc> in <module>
5
6 if Package.isPackageInstalled("pytesseract") == False:
----> 7 Package.installPackages(['tesseract','pytesseract'])
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
112 print(pip_install_result['msg'])
113 if not pip_install_result['success']:
--> 114 raise pip_install_result['err']
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
48
49 try:
---> 50 result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
51 if debug:
52 print("[Subprocess success!]")
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
334
335 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 336 **kwargs).stdout
337
338
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
416 if check and retcode:
417 raise CalledProcessError(retcode, process.args,
--> 418 output=stdout, stderr=stderr)
419 return CompletedProcess(process.args, retcode, stdout, stderr)
420
CalledProcessError: Command '['c:\\program files\\alteryx19.2\\bin\\miniconda3\\pythontool_venv\\scripts\\python.exe', '-m', 'pip', 'install', 'tesseract', 'pytesseract']' returned non-zero exit status 1.

and
Error: Python (1): ---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-b47efc3914ed> in <module>
1 from ayx import Alteryx
----> 2 from PIL import Image
3 import pytesseract
4
5 pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
ModuleNotFoundError: No module named 'PIL'

Thank you for your help.

Cheers,

Jack
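
The first traceback only reports that pip exited with status 1; pip's own error message does not appear in it. A minimal sketch for seeing that message is to run pip yourself and print its combined output. `run_and_capture` and `pip_install` are hypothetical helpers, and the sketch assumes `sys.executable` is the interpreter behind the Python tool (the traceback shows `pythontool_venv\scripts\python.exe`). It uses `universal_newlines=True` rather than `text=True`, because `text=` was only added in Python 3.7.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_capture(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout+stderr) without raising."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so pip's error text is kept
        universal_newlines=True,   # decode output to str (also works before 3.7)
    )
    return result.returncode, result.stdout


def pip_install(packages: List[str]) -> Tuple[int, str]:
    """Install packages into the interpreter running this code; return pip's output."""
    return run_and_capture([sys.executable, "-m", "pip", "install"] + list(packages))
```

For example, `code, output = pip_install(["tesseract", "pytesseract"])` followed by `print(output)` should show pip's own reason for the non-zero exit status.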

wellyLiyanto
7 - Meteor

Hi @jstewart

Ah, sorry! It looks like Pillow (the package that provides 'PIL') is not a default Python library. I already had it installed when I built this tool, so I didn't notice.

You can run this workflow to install the PIL library and then try running the tool again. Let me know if you have any other trouble running the OCR tool.

Cheers

Welly
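
The 'PIL' confusion comes from Pillow being installed under one name (`pip install pillow`) but imported under another (`from PIL import Image`). A minimal sketch that reports which of the tool's imports are missing, along with the pip name to install for each; the `REQUIRED` mapping is an assumption based on the imports visible in the tool's code, not an official list.

```python
import importlib.util
from typing import Dict, List

# import name -> pip package name (assumed from the tool's imports)
REQUIRED: Dict[str, str] = {
    "PIL": "pillow",  # imported as PIL, installed as pillow
    "pytesseract": "pytesseract",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


print(missing_packages())
```

If this prints `['pillow']`, installing `pillow` (not `PIL`) is what makes `from PIL import Image` work.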

jstewart
7 - Meteor

Welly,

Thank you for your quick response. However, I am still not able to run the install_pil workflow successfully; the error is below. I should add that we are on version 2019.2.

Error: Python (2): ---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-1-8cb224446ad3> in <module>
5
6 if Package.isPackageInstalled("PIL") == False:
----> 7 Package.installPackages(['pillow'])
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
112 print(pip_install_result['msg'])
113 if not pip_install_result['success']:
--> 114 raise pip_install_result['err']
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
48
49 try:
---> 50 result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
51 if debug:
52 print("[Subprocess success!]")
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
334
335 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 336 **kwargs).stdout
337
338
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
416 if check and retcode:
417 raise CalledProcessError(retcode, process.args,
--> 418 output=stdout, stderr=stderr)
419 return CompletedProcess(process.args, retcode, stdout, stderr)
420
CalledProcessError: Command '['c:\\program files\\alteryx19.2\\bin\\miniconda3\\pythontool_venv\\scripts\\python.exe', '-m', 'pip', 'install', 'pillow']' returned non-zero exit status 2.

jstewart
7 - Meteor

@wellyLiyanto,

It turns out that although I had installed the admin version of Designer, it was not running as administrator. After I changed that, the workflow worked without the additional PIL package install.

I look forward to getting deeper into this OCR tool.

Cheers,

Jack
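
Since an admin install of Designer does not by itself mean the process runs elevated, a small check can confirm whether the current Python process has administrator rights. A minimal sketch: `is_elevated` is a hypothetical helper, it assumes the Python tool's process inherits Designer's elevation, and on Windows it calls `ctypes.windll.shell32.IsUserAnAdmin()` while on other systems it falls back to checking for root.

```python
import ctypes
import os


def is_elevated() -> bool:
    """Return True if this process runs as administrator (Windows) or root (POSIX)."""
    if os.name == "nt":
        # Nonzero when the process is running with administrator rights.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    return os.geteuid() == 0


print("Running elevated:", is_elevated())
```

If this prints `False` inside Designer, restarting Designer with "Run as administrator" before installing packages matches the fix described in this thread.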
