
SOLVED

PDF to Excel using OCR

Dhananjay_Galphat1
7 - Meteor

Hi,

I am trying to convert a PDF to Excel using the PDF Input tool, but the extracted data comes out scattered and out of place. Is there an option in Alteryx to convert a PDF to Excel or text using OCR?

CharlieS
17 - Castor

Hi @Dhananjay_Galphat1

Edit: Exactly what issues are you having? Is the tool or the data misbehaving?

Dhananjay_Galphat1
7 - Meteor

The data is misbehaving. I want to keep my input dynamic, and I want to create a separate Excel file for each PDF. Some of the data, e.g. the date the PDF was generated, goes into the header of the output Excel file.

So I thought OCR would be a better option.

CharlieS
17 - Castor

I would suggest reading everything from the PDF into Alteryx first, then using other Alteryx tools to parse and arrange the data as you need. You could read the entire page into a single field, or use one of the PDF tools from the public Gallery to read the characters in so you can work with them.

https://gallery.alteryx.com/#!search/undefined/pdf

So the first step is getting the entire PDF document into Alteryx. After that, let us know if you need help with the parsing and formatting. The best way to share what you have is usually to post a sample workflow with a Text Input containing the data from your PDF.
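
As a rough illustration of the "input everything, then parse" idea outside of Alteryx, here is a minimal Python sketch. It assumes the whole page has already been read into a single string. The sample text, the `Date of generation:` label, the date format, and the function name `split_header_and_rows` are all hypothetical and not taken from the actual PDF in this thread.

```python
import re

# Hypothetical page text; a real PDF will have a different layout.
SAMPLE_PAGE = """Monthly Report
Date of generation: 2021-03-04
Item    Qty    Price
Widget  2      9.50
Gadget  10     1.25
"""

# Assumed label and date format; adjust the pattern to the real document.
DATE_PATTERN = re.compile(r"Date of generation:\s*(\d{4}-\d{2}-\d{2})")


def split_header_and_rows(page_text):
    """Return (generation_date, rows) parsed from one page of raw text."""
    match = DATE_PATTERN.search(page_text)
    generation_date = match.group(1) if match else None

    rows = []
    for line in page_text.splitlines():
        # Treat runs of two or more spaces as column separators.
        cells = re.split(r"\s{2,}", line.strip())
        # Keep only lines that look like three-column table rows.
        if len(cells) == 3:
            rows.append(cells)
    return generation_date, rows


generation_date, rows = split_header_and_rows(SAMPLE_PAGE)
```

Inside a workflow, the same two steps would likely map onto tools such as RegEx (to pull out the header value) and Text To Columns (to split the rows), with the header value then carried along to each output file.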

trevorwightman
8 - Asteroid

Hi @CharlieS,

It looks like the above link no longer takes you anywhere. Is this what you were referring to?

https://community.alteryx.com/t5/Public-Community-Gallery/PDF-Input/ta-p/887038

Additionally, are there any other OCR techniques in Alteryx? I want to scan a postcard and read the PDF/image into Alteryx to pull off a tracking code printed on the postcard.

Thank you!

Trevor
