Alteryx Designer Desktop Discussions

OCR tool on Python Workflow automated on Alteryx

sschong
7 - Meteor

Hi, I installed a module called pdfplumber for an OCR tool that I am working on. I understand that OCR tools are already available in Alteryx, but I am trying to build this myself and propose it to my company without incurring the additional costs.

 

I am trying to modify the Python code, since the incoming and outgoing connections work differently in Alteryx, but I am having issues with my code.

 

I have defined the input data as 'df', but I can't seem to use 'df' in the OCR code itself.

 

sschong_2-1640766756751.png

 

sschong_1-1640766639754.png

 

Thank you for all your help and I appreciate any feedback. 

 

 

4 REPLIES
PaulTa
Alteryx Alumni (Retired)

I would skip the PDF Input tool (written in R) and use the Python tool to open the PDF directly with pdfplumber's .open() function.

After doing so, you can convert the result to a data frame to use downstream.

2021-12-29_12h52_53.jpg
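
For readers who cannot see the screenshot, here is a minimal sketch of the same idea, not the exact code from the image. It assumes the PDF contains a real table with a header row. The helper names `rows_to_records` and `pdf_tables_to_frame` and the example path are made up for illustration; `pdfplumber.open`, `page.extract_table` and `Alteryx.write` are the actual library calls.

```python
def rows_to_records(table):
    """Turn a table (list of rows, first row = header) into a list of dicts."""
    if not table:
        # extract_table() returns None when it finds no table on the page.
        return []
    header, *body = table
    # Empty header cells come back as None; give them placeholder names.
    names = [h if h else f"col_{i}" for i, h in enumerate(header)]
    return [dict(zip(names, row)) for row in body]


def pdf_tables_to_frame(pdf_path):
    """Open a PDF by path and stack the largest table of each page into one DataFrame."""
    # Imported here so the helper above can be tried without these packages.
    import pandas as pd
    import pdfplumber  # must be installed in the Python tool's environment

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            records.extend(rows_to_records(page.extract_table()))
    return pd.DataFrame(records)


# Inside the Alteryx Python tool (hypothetical path, adjust to your file):
# from ayx import Alteryx
# Alteryx.write(pdf_tables_to_frame(r"C:\temp\example.pdf"), 1)
```

The point is that pdfplumber receives a file path, and only the extracted rows are turned into a data frame for output anchor 1.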

 

Paul T.

sschong
7 - Meteor

Hi Paul, thank you for the reply. 

 

In your code, you open your own file, but how do I do that when I am trying to use the input data, as in my picture below?

Defining the input data from Alteryx as 'df' did not seem to work.

 

sschong_0-1640856019878.png

 

PaulTa
Alteryx Alumni (Retired)

I would avoid passing a data frame to pdfplumber; it won't be able to open it.

You can define a variable that points to the PDF directly, and once you have extracted the data you can convert it into data frames.

Here is my example (depending on your PDF content, this may not work):

2021-12-30_11h23_32.jpg
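
If the file location still has to come from the incoming connection, one possible approach is to pull the path string out of the data frame and hand that string to pdfplumber, rather than the data frame itself. The sketch below assumes the incoming data has a column holding full file paths; `FullPath` (the name the Directory tool uses) is only a guess at your column name, and `pdf_paths_from_rows` is a made-up helper.

```python
def pdf_paths_from_rows(rows, column="FullPath"):
    """Collect the .pdf paths found in one column of a list of row dicts."""
    paths = []
    for row in rows:
        value = row.get(column)
        # Skip nulls, non-strings, and files that are not PDFs.
        if isinstance(value, str) and value.lower().endswith(".pdf"):
            paths.append(value)
    return paths


# Inside the Alteryx Python tool (assumes pdfplumber is installed there):
# from ayx import Alteryx
# import pdfplumber
# df = Alteryx.read("#1")                        # incoming connection
# for path in pdf_paths_from_rows(df.to_dict("records")):
#     with pdfplumber.open(path) as pdf:         # a path string, not the data frame
#         first_page_text = pdf.pages[0].extract_text()
```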

 

Paul T.

sschong
7 - Meteor

Hi Paul

 

OK, noted on the input. I understand that the outgoing connection only accepts pandas data frames.

 

For my PDF document, there are no columns to parse from the file, so it doesn't work. Is there another recommended way?

 

I have attached the workflow for your reference. You have no idea how much you're helping me. Thank you.

sschong_0-1640876581871.png
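
For a PDF without any table structure, one option that may work is to extract the plain text of each page and output one row per line, leaving the parsing to downstream Alteryx tools. This is only a sketch under assumptions: the helper names and the `page`/`line`/`text` column names are made up, and it requires the PDF to have a text layer, because pdfplumber reads embedded text and does not perform OCR on scanned images.

```python
def text_to_line_records(page_number, text):
    """Split one page's text into a record per non-empty line."""
    records = []
    # extract_text() can yield None or an empty string for pages without text.
    for line_number, line in enumerate((text or "").splitlines(), start=1):
        stripped = line.strip()
        if stripped:
            records.append({"page": page_number, "line": line_number, "text": stripped})
    return records


def pdf_text_to_frame(pdf_path):
    """Open a PDF by path and return its text as a DataFrame, one row per line."""
    # Imported here so the helper above can be tried without these packages.
    import pandas as pd
    import pdfplumber  # must be installed in the Python tool's environment

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            records.extend(text_to_line_records(page_number, page.extract_text()))
    # Fixed columns keep the output schema stable even when no text is found.
    return pd.DataFrame(records, columns=["page", "line", "text"])


# Inside the Alteryx Python tool (hypothetical path, adjust to your file):
# from ayx import Alteryx
# Alteryx.write(pdf_text_to_frame(r"C:\temp\example.pdf"), 1)
```

If the resulting data frame comes back empty, the PDF is most likely a scanned image, in which case a separate OCR step would be needed before any text can be extracted.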

 
