

Alteryx OCR Tools

WellyLiyanto
8 - Asteroid

Hi All

 

Just thought I could share this tool I made with you all, based on this post:

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Is-anybody-using-OCR-optical-character...

 

I built this tool with the Python pytesseract library. To use it, you first need to install Tesseract OCR to get the language library, from https://github.com/UB-Mannheim/tesseract/wiki

 

For now, the Python code points to the default Tesseract OCR install folder, C:\Program Files\Tesseract-OCR (since Alteryx is installed on 64-bit Windows, this should be the same for every Windows user with the default C: drive). Feel free to change it in the Python code if needed.
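
If Tesseract is installed somewhere else, a small lookup like the one below could replace the hard-coded path. This is only a sketch: the helper name find_tesseract and the tesseract.exe file name are assumptions, not part of the shared tool.

```python
import os
import shutil

# Default install folder mentioned in this post; the .exe file name is an assumption.
DEFAULT_CANDIDATES = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates=DEFAULT_CANDIDATES, search_path=True):
    """Return the first existing Tesseract executable path, or None if not found."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    if search_path:
        # Fall back to whatever "tesseract" resolves to on the system PATH.
        return shutil.which("tesseract")
    return None
```

The result can then be assigned to pytesseract.pytesseract.tesseract_cmd in the Python tool, in place of the fixed path. If the function returns None, Tesseract OCR is probably not installed yet.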

 

[Image: clipboard_image_0.png]

 

I also put sample images in English.rar so you can see which files can be scanned and which cannot (those return null).

joshuaburkhow
ACE Emeritus
This is awesome! Can't wait to try it out.
Joshua Burkhow - Alteryx Ace | Global Alteryx Architect @PwC | Blogger @ AlterTricks
JReid
9 - Comet

Nice, I'll have to compare this to the tesseract macro I made in R!

MDOstroff
Alteryx Alumni (Retired)

I ran the Tesseract OCR installer but I'm getting a "library not installed" error when I try to run this. What am I missing?

WellyLiyanto
8 - Asteroid

Hi @MDOstroff 

 

Was there a more complete message about which library is missing? I installed the 'tesseract', 'pytesseract' and 'Image' libraries in my Python environment (for Alteryx) before installing Tesseract OCR. I hope this helps.
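
To see which library is actually missing, a quick check like this can be run in the Python tool. It is a minimal sketch: the helper name missing_modules is hypothetical, and the list assumes the tool only imports pytesseract and PIL (PIL is the import name provided by the "pillow" package).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed to be used by the OCR tool's Python code.
print(missing_modules(["pytesseract", "PIL"]))
```

An empty list means both modules are importable in the current environment; any name printed still needs to be installed.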

 

Thank you

 

MDOstroff
Alteryx Alumni (Retired)

Thanks for the quick response. Turns out I needed to run Designer in admin mode the first time. (It was a folder permissions issue.) After that, everything runs fine.

Thanks for the cool tool.

jstewart
7 - Meteor

This looks really cool. Perhaps I'm getting in a little over my head, but I'm having trouble interpreting my errors as I'm not strong with Python. I am on Designer x64 and the GitHub download is in the correct location. Errors are below... I do have pip, and I'm unclear what 'PIL' is, but I don't see it in Tesseract-OCR.

 

Error: Python (1): ---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-1-94adb072d5fc> in <module>
5
6 if Package.isPackageInstalled("pytesseract") == False:
----> 7 Package.installPackages(['tesseract','pytesseract'])
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
112 print(pip_install_result['msg'])
113 if not pip_install_result['success']:
--> 114 raise pip_install_result['err']
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
48
49 try:
---> 50 result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
51 if debug:
52 print("[Subprocess success!]")
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
334
335 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 336 **kwargs).stdout
337
338
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
416 if check and retcode:
417 raise CalledProcessError(retcode, process.args,
--> 418 output=stdout, stderr=stderr)
419 return CompletedProcess(process.args, retcode, stdout, stderr)
420
CalledProcessError: Command '['c:\\program files\\alteryx19.2\\bin\\miniconda3\\pythontool_venv\\scripts\\python.exe', '-m', 'pip', 'install', 'tesseract', 'pytesseract']' returned non-zero exit status 1.

 

 

and

 

Error: Python (1): ---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-b47efc3914ed> in <module>
1 from ayx import Alteryx
----> 2 from PIL import Image
3 import pytesseract
4
5 pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
ModuleNotFoundError: No module named 'PIL'

 

 

 

Thank You for your help. 

Cheers, 

Jack

 

WellyLiyanto
8 - Asteroid

Hi @jstewart 

 

Ah, sorry, it looks like Pillow is not a default library in Python. I already had it installed when I built this tool, so I didn't notice.

 

You can run this workflow to install the PIL (Pillow) library and then try running the tool again. Let me know if you have any other trouble running the OCR tool.

 

Cheers

 

Welly 

jstewart
7 - Meteor

Welly, 

 

Thank you for your quick response. However, I am not able to run the install_pil workflow successfully either, error below. I should add that we are on version 2019.2. 

 

Error: Python (2): ---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-1-8cb224446ad3> in <module>
5
6 if Package.isPackageInstalled("PIL") == False:
----> 7 Package.installPackages(['pillow'])
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
112 print(pip_install_result['msg'])
113 if not pip_install_result['success']:
--> 114 raise pip_install_result['err']
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
48
49 try:
---> 50 result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
51 if debug:
52 print("[Subprocess success!]")
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
334
335 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 336 **kwargs).stdout
337
338
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
416 if check and retcode:
417 raise CalledProcessError(retcode, process.args,
--> 418 output=stdout, stderr=stderr)
419 return CompletedProcess(process.args, retcode, stdout, stderr)
420
CalledProcessError: Command '['c:\\program files\\alteryx19.2\\bin\\miniconda3\\pythontool_venv\\scripts\\python.exe', '-m', 'pip', 'install', 'pillow']' returned non-zero exit status 2.
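
The tracebacks above only report that pip exited with a non-zero status; the reason is in pip's own output, which is not shown. One way to see it is to run the same command and capture what it prints. This is a sketch only, and the helper name run_and_capture is hypothetical.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run `cmd` and return (exit_code, combined stdout/stderr text) without raising."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge error output into the same text stream
        universal_newlines=True,    # decode bytes to str (works on older Python 3 too)
    )
    return result.returncode, result.stdout
```

Calling run_and_capture([sys.executable, "-m", "pip", "install", "pillow"]) from the Python tool and printing the returned text should show pip's actual message, for example a permissions error on the environment folder.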

 

jstewart
7 - Meteor

@WellyLiyanto , 

 

It turns out that although I installed the admin version of Designer, it was not running as such. The workflow worked after making this change, without the additional PIL package install.
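
For anyone who hits the same pip failure, here is a rough way to test whether the Python environment folder is writable before trying to install packages. It is a sketch under the assumption that the failure is a folder permissions problem, as it was here; the helper name can_write_to is hypothetical.

```python
import sysconfig
import tempfile


def can_write_to(folder):
    """Return True if a temporary file can be created inside `folder`."""
    try:
        with tempfile.TemporaryFile(dir=folder):
            return True
    except OSError:
        # Covers a missing folder as well as a permission refusal.
        return False


# site-packages folder of the interpreter running this code.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages, "writable:", can_write_to(site_packages))
```

If this prints False inside the Python tool, starting Designer with "Run as administrator" is likely needed for the package install step.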

 

I look forward to getting deeper into this OCR tool. 

 

Cheers,

Jack
