PDF to Text

raghadaf8
6 - Meteoroid

Hello,

I'm trying to read text from a PDF file using the Computer Vision tools. The data is in a table with a small font size, and the rows are very close to each other, which causes incorrect data and null values when I run the workflow. Does anyone have suggestions on how I can extract the data correctly? I only need 3 of the table's 8 columns.

Regards,

3 REPLIES
AkimasaKajitani
17 - Castor

Hi @raghadaf8,

Did you try the Image Processing tool?

It can enlarge the image, but I have never tried using it to improve recognition accuracy.

raghadaf8
6 - Meteoroid

Yes, I have tried using it on the PDF, but it doesn't work.

IraWatt
17 - Castor

Hey @raghadaf8,

Another option is to try using the Python tool with the PyPDF2 library. I've attached an example workflow; you'll need to change the directory to where your file is (shown in the picture), and you will probably need to run Alteryx in admin mode to try it out.

IraWatt_1-1650802535312.png

IraWatt_0-1650802346309.png

All the best,

Ira
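
For readers without the attached workflow, here is a minimal sketch of what the PyPDF2 suggestion above might look like in the Python tool. It is not the code from the attachment. It assumes PyPDF2 2.0 or later (where `PdfReader` is available), that the PDF has an embedded text layer (PyPDF2 does not perform OCR, so a scanned, image-only PDF will come back empty), and that each table row is extracted as one line whose cells contain no spaces. The function names, the `keep` indices, and the `cell_sep` pattern are hypothetical and will need adjusting to the real file.

```python
import re


def read_pdf_lines(pdf_path):
    """Return every text line found in the PDF's embedded text layer."""
    # Imported lazily so pick_columns() can be used without PyPDF2 installed.
    # Assumption: PyPDF2 >= 2.0; older releases expose PdfFileReader instead.
    from PyPDF2 import PdfReader

    lines = []
    for page in PdfReader(pdf_path).pages:
        # extract_text() can return an empty result for image-only pages.
        text = page.extract_text() or ""
        lines.extend(text.splitlines())
    return lines


def pick_columns(lines, keep, expected_columns=8, cell_sep=r"\s+"):
    """Keep only the cells at the `keep` indices from lines that look like table rows.

    Assumption: a table row splits into exactly `expected_columns` cells;
    any other line (titles, page footers, blank lines) is skipped.
    """
    rows = []
    for line in lines:
        cells = re.split(cell_sep, line.strip())
        if len(cells) != expected_columns:
            continue
        rows.append([cells[i] for i in keep])
    return rows


# Example use (the path below is a placeholder, not a real file):
# lines = read_pdf_lines(r"C:\path\to\your\file.pdf")
# rows = pick_columns(lines, keep=(0, 2, 5))
```

Filtering on the cell count is a crude way to separate table rows from surrounding text; if the cells themselves contain spaces, a different `cell_sep` or a position-based approach would be needed. Inside the Python tool, the resulting rows would typically be loaded into a pandas DataFrame and sent to an output anchor with `Alteryx.write`.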
