Hi All
Just thought I would share this tool I made with you all, based on this post:
I built this tool with the Python pytesseract library. Before using it, you first need to install Tesseract OCR, which provides the language data, from https://github.com/UB-Mannheim/tesseract/wiki
For now, the Python code points to the default Tesseract OCR install folder, C:\Program Files\Tesseract-OCR. Since Alteryx is installed on 64-bit Windows, this should be the same for every Windows user with a default C: drive. Feel free to change the path in the Python code if needed.
I also included sample images in English.rar so you can see which files can be scanned and which cannot (those return null).
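If your Tesseract install is not in the default folder, something like the sketch below might help you find it before setting the path in the tool. This is only an illustration: `find_tesseract` and `DEFAULT_TESSERACT_DIR` are names made up for this example, and it assumes the executable is called `tesseract.exe`.

```python
import os
import shutil

# Default install folder mentioned in the post; change it if yours differs.
DEFAULT_TESSERACT_DIR = r"C:\Program Files\Tesseract-OCR"


def find_tesseract(install_dir=DEFAULT_TESSERACT_DIR):
    """Return a path to the Tesseract executable, or None if it is not found."""
    candidate = os.path.join(install_dir, "tesseract.exe")
    if os.path.isfile(candidate):
        return candidate
    # Fall back to whatever is on PATH (shutil.which returns None when absent).
    return shutil.which("tesseract")


path = find_tesseract()
if path is None:
    print("Tesseract not found; install it or adjust DEFAULT_TESSERACT_DIR.")
else:
    print("Using Tesseract at:", path)
    # Inside the Alteryx Python tool you would then set:
    # pytesseract.pytesseract.tesseract_cmd = path
```

If this prints "not found", the OCR step has no executable to call, so it is worth checking this first.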
Nice, I'll have to compare this to the tesseract macro I made in R!
I ran the Tesseract OCR installer but I'm getting a "library not installed" error when I try to run this. What am I missing?
Hi @MDOstroff
Was there a more complete message about which library you're missing? I installed the 'tesseract', 'pytesseract' and 'Image' libraries in my Python environment (for Alteryx) before installing Tesseract OCR. I hope this helps.
Thank you
Thanks for the quick response. Turns out I needed to run Designer in admin mode the first time. (It was a folder permissions issue.) After that, everything runs fine.
Thanks for the cool tool.
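For anyone else who suspects the same folder permissions issue, here is a rough sketch of how you could check it from a Python tool. It simply tries to create a temporary file in the environment's site-packages folder. The helper name `can_write_to` is made up for this example, and I am assuming that site-packages is the folder pip needs to write to.

```python
import sysconfig
import tempfile


def can_write_to(folder):
    """Return True if a temporary file can actually be created in folder."""
    try:
        # The file is deleted automatically when the with-block exits.
        with tempfile.TemporaryFile(dir=folder):
            return True
    except OSError:
        # Covers PermissionError as well as a missing folder.
        return False


# "purelib" is the site-packages folder of the running Python environment.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages, "writable:", can_write_to(site_packages))
```

If this reports False, package installs will probably fail until Designer is run with enough rights to write to that folder.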
This looks really cool. Perhaps I'm getting in a little over my head, but I'm having trouble interpreting my errors as I'm not strong with Python. I am on Designer x64 and the GitHub download is in the correct location. Errors are below... I do have pip, and I'm unclear what 'PIL' is, but I don't see it in Tesseract-OCR.
Error: Python (1): ---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-1-94adb072d5fc> in <module>
5
6 if Package.isPackageInstalled("pytesseract") == False:
----> 7 Package.installPackages(['tesseract','pytesseract'])
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
112 print(pip_install_result['msg'])
113 if not pip_install_result['success']:
--> 114 raise pip_install_result['err']
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
48
49 try:
---> 50 result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
51 if debug:
52 print("[Subprocess success!]")
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
334
335 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 336 **kwargs).stdout
337
338
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
416 if check and retcode:
417 raise CalledProcessError(retcode, process.args,
--> 418 output=stdout, stderr=stderr)
419 return CompletedProcess(process.args, retcode, stdout, stderr)
420
CalledProcessError: Command '['c:\\program files\\alteryx19.2\\bin\\miniconda3\\pythontool_venv\\scripts\\python.exe', '-m', 'pip', 'install', 'tesseract', 'pytesseract']' returned non-zero exit status 1.
and
Error: Python (1): ---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-b47efc3914ed> in <module>
1 from ayx import Alteryx
----> 2 from PIL import Image
3 import pytesseract
4
5 pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
ModuleNotFoundError: No module named 'PIL'
Thank You for your help.
Cheers,
Jack
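A side note on the first traceback: it only shows that pip returned a non-zero exit status, not what pip actually complained about. A small sketch like this one can show the full output of a command. `run_and_capture` is a hypothetical helper written for this example, and the command shown is deliberately harmless (it only asks pip for its version).

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command and return (exit_code, combined stdout and stderr text)."""
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge error messages into the same stream
        universal_newlines=True,    # decode bytes to str
    )
    return result.returncode, result.stdout


# Harmless example: ask pip for its version using the current interpreter.
code, output = run_and_capture([sys.executable, "-m", "pip", "--version"])
print("exit code:", code)
print(output)
```

Swapping the arguments for the same `pip install` command shown in the traceback should print the underlying error text, for example a permissions message.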
Hi @jstewart
Ah, sorry, it looks like Pillow is not a default Python library. I already had it installed when I built this tool, so I didn't realize it was needed.
You can run this workflow to install the PIL library and then try running the tool again. Let me know if you have any other trouble running the OCR tool.
Cheers
Welly
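One thing that can be confusing here: the module is imported as `PIL`, but the pip package that provides it is called `pillow`. Below is a minimal sketch of checking which packages are still missing. The `REQUIRED` mapping and the `missing_pip_packages` helper are just names for this example, not part of the tool.

```python
import importlib.util

# Import name -> pip package name. PIL is provided by the "pillow" package.
REQUIRED = {"PIL": "pillow", "pytesseract": "pytesseract"}


def missing_pip_packages(required=REQUIRED):
    """Return the pip names of every module that cannot be found for import."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


print("Packages still to install:", missing_pip_packages())
```

The resulting list could then be passed to `Package.installPackages`, the `ayx` call that appears in the traceback above.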
Welly,
Thank you for your quick response. However, I am not able to run the install_pil workflow successfully either, error below. I should add that we are on version 2019.2.
Error: Python (2): ---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-1-8cb224446ad3> in <module>
5
6 if Package.isPackageInstalled("PIL") == False:
----> 7 Package.installPackages(['pillow'])
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
112 print(pip_install_result['msg'])
113 if not pip_install_result['success']:
--> 114 raise pip_install_result['err']
c:\program files\alteryx19.2\bin\miniconda3\pythontool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
48
49 try:
---> 50 result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
51 if debug:
52 print("[Subprocess success!]")
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
334
335 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 336 **kwargs).stdout
337
338
C:\Program Files\Alteryx19.2\bin\Miniconda3\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
416 if check and retcode:
417 raise CalledProcessError(retcode, process.args,
--> 418 output=stdout, stderr=stderr)
419 return CompletedProcess(process.args, retcode, stdout, stderr)
420
CalledProcessError: Command '['c:\\program files\\alteryx19.2\\bin\\miniconda3\\pythontool_venv\\scripts\\python.exe', '-m', 'pip', 'install', 'pillow']' returned non-zero exit status 2.
It turns out that although I had installed the admin version of Designer, it was not running as administrator. The workflow worked after making this change, without the additional PIL package install.
I look forward to getting deeper into this OCR tool.
Cheers,
Jack