Alteryx Designer Desktop Discussions

sschong · ‎12-29-2021

Hi, I installed a module called pdfplumber for an OCR tool that I am working on. I understand that there are OCR tools available in Alteryx, but I am trying to build this myself and propose it to my company without incurring additional costs.

I am trying to modify the Python code, since the incoming and outgoing connections work differently in Alteryx, but I have issues with my code.

I have defined the input data as 'df', but I can't seem to use 'df' in the OCR code itself.

Thank you for all your help and I appreciate any feedback.

PaulTa · ‎12-29-2021

I would skip the PDF Input (written in R) and use the Python tool to open the PDF with pdfplumber's .open() call.

After doing so, you can convert the result to a data frame to use downstream.
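The original code for this step is not shown in the thread, so here is a minimal sketch of the idea, under some assumptions: the PDF has a text layer that `extract_text()` can read, the path is one you supply yourself, and the function names (`read_page_texts`, `pages_to_rows`, `write_pdf_text_to_output`) are made up for illustration. Only `pdfplumber.open()`, `pdf.pages`, `page.extract_text()` and `Alteryx.write()` are real library calls.

```python
def read_page_texts(pdf_path):
    """Open the PDF directly by path and return one text string per page."""
    import pdfplumber  # third-party; imported lazily so the helper below has no dependencies

    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_text() for page in pdf.pages]


def pages_to_rows(page_texts):
    """Turn per-page text into row dicts that pandas can load as a data frame."""
    return [
        {"page": number, "text": text or ""}  # pages with no extractable text become ""
        for number, text in enumerate(page_texts, start=1)
    ]


def write_pdf_text_to_output(pdf_path, anchor=1):
    """Meant to be called inside the Alteryx Python tool (hypothetical helper)."""
    import pandas as pd
    from ayx import Alteryx

    rows = pages_to_rows(read_page_texts(pdf_path))
    # Explicit columns keep the schema stable even if the PDF has no pages with text.
    Alteryx.write(pd.DataFrame(rows, columns=["page", "text"]), anchor)
```

In the Python tool you would then call something like `write_pdf_text_to_output(r"C:\path\to\your.pdf")`, with the path replaced by your own file.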

Paul T.

sschong · ‎12-30-2021

Hi Paul, thank you for the reply.

In your code, the file you opened was your own file. How do I do it if I am trying to use the input data, as in my picture below?

Defining the input data from Alteryx as 'df' did not seem to work.

PaulTa · ‎12-30-2021

I would avoid trying to give a data frame to pdfplumber; it wouldn't be able to open it.

You can define a variable that points to the PDF directly, and once you have the data you can convert it into data frames.

Here is my example (depending on your PDF content, this may not work):
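The attachment is not reproduced in the thread. As a rough sketch of the approach described (not the original example), the code below passes pdfplumber a file path rather than the data frame itself, then turns the first table on the first page into a data frame. The assumptions: the incoming connection `#1` carries the PDF path in a column named `FullPath` (a hypothetical name, adjust to your workflow), the first extracted row is the header, and the helper names are invented for illustration.

```python
def table_to_records(table):
    """Convert a table (list of rows, first row = header) into a list of dicts."""
    if not table:  # extract_table() returns None when no table is found
        return []
    header, *body = table
    return [dict(zip(header, row)) for row in body]


def first_table_from_input(path_column="FullPath", anchor=1):
    """Meant to be called inside the Alteryx Python tool (hypothetical helper)."""
    import pandas as pd
    import pdfplumber
    from ayx import Alteryx

    df = Alteryx.read("#1")              # incoming data: used only to look up the path
    pdf_path = df[path_column].iloc[0]   # a plain string path, which pdfplumber can open

    with pdfplumber.open(pdf_path) as pdf:
        table = pdf.pages[0].extract_table()

    # If the page has no detectable table this data frame is empty,
    # which is the "may not work" case mentioned above.
    Alteryx.write(pd.DataFrame(table_to_records(table)), anchor)
```

The point of the design is that `df` is never handed to pdfplumber; it only supplies the path string, and the data frame sent downstream is built from what pdfplumber extracts.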

Paul T.

sschong · ‎12-30-2021

Hi Paul

OK, noted on the input. I understand that the outgoing connection only accepts pandas data frames.

For my PDF document, there are no columns to parse from the file, so it doesn't work. Is there another recommended way?

I have attached the workflow for your reference. You have no idea how much you're helping me, thank you.

OCR tool on Python Workflow automated on Alteryx