Alteryx Designer Desktop Discussions

Opening and Reading a PDF with PyPDF2 (Python)

ChrisDoar
5 - Atom

Does anyone have experience reading PDFs via Python in Alteryx using the PyPDF2 package, and can anyone see what is wrong here?

 

I've managed to import the package but every time I try to run the workflow it fails with the following error message:

 

Python after.JPG

 

This is the script to open and read the file, and the file definitely exists in this location. I've checked and double-checked :)

 

import PyPDF2
pdf1File = open("P:\Content Manager\PDF\HABDHGOKMLD.PDF")
reader = (pdf1File)
number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()

 

I have noticed that '\' in the path is changed to '\\' in the error, but even specifying this in the open statement returns the same error. This is probably something really obvious that I just cannot see.

 

Thanks, 

grossal
15 - Aurora

Hi @ChrisDoar,

 

There are usually two ways to handle this.

 

Option 1: Convert \\ to /

It's completely up to you where you do this: in Python or in Alteryx. I tend to use a simple Formula tool to do the trick.
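If you would rather do the conversion on the Python side, a minimal sketch could look like the one below. The helper name `to_forward_slashes` is made up for illustration; it just relies on the fact that Windows file APIs generally accept forward slashes as separators.

```python
def to_forward_slashes(path: str) -> str:
    """Return `path` with every backslash replaced by a forward slash."""
    return path.replace("\\", "/")


# The raw literal keeps the backslashes exactly as typed before conversion.
converted = to_forward_slashes(r"P:\Content Manager\PDF\HABDHGOKMLD.PDF")
print(converted)
```

The converted string can then be passed to open() in place of the original path.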

 

Option 2: Raw Strings

 

pdf1File = open(r"P:\Content Manager\PDF\HABDHGOKMLD.PDF")

The key here is open(r"Your Path"). The r prefix makes the path a "raw" string, so backslashes are taken literally and you don't need to escape anything.
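As a rough illustration, the sketch below checks that the raw literal and the fully escaped literal are the same string, then shows one way the read itself might be written. It assumes PyPDF2 2.x or later (where `PdfReader` is available), and `read_first_page` is a hypothetical helper name. Note that the script as posted assigns the file handle itself to `reader`, so it may also need the file opened in binary mode and wrapped in `PdfReader`, independent of the path question.

```python
# Two spellings of the same Windows path: a raw literal and an escaped literal.
raw_path = r"P:\Content Manager\PDF\HABDHGOKMLD.PDF"
escaped_path = "P:\\Content Manager\\PDF\\HABDHGOKMLD.PDF"
print(raw_path == escaped_path)


def read_first_page(path: str) -> str:
    """Return the extracted text of the first page of the PDF at `path`."""
    import PyPDF2  # imported here so the string check above runs without it

    # PDFs are binary files, so open in "rb" mode and wrap the handle in PdfReader.
    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        return reader.pages[0].extract_text()
```

Calling read_first_page(raw_path) would then return the first page's text, provided the file is reachable from the Python process.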

 

Let me know if it works or if we need to dive deeper into it.

 

 

Best

Alex

 

ChrisDoar
5 - Atom

Hi @grossal

 

Thank you for taking the time to get back to me. Unfortunately, neither of those solutions worked; see below. However, I reckon the issue may be the network path for the file. My P drive is on a network server, whereas if I move the file I'm trying to read to the C drive on the actual computer, it finds it no problem. I suspect that if I use the full file path and not just the mapped P drive, it would work. Something to investigate.

 

convert.JPG

and..

Raw String.JPG

Thanks, 
Chris 
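
One way to test the mapped-drive theory might be to ask Python which spelling of the path it can actually see. This is only a sketch: `first_reachable` is a hypothetical helper, and the UNC path below uses a placeholder server and share name, since the real ones are not given in the thread.

```python
import os


def first_reachable(candidates):
    """Return the first path in `candidates` that exists for this process, else None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


candidates = [
    r"P:\Content Manager\PDF\HABDHGOKMLD.PDF",  # mapped drive, as in the original script
    r"\\fileserver\share\Content Manager\PDF\HABDHGOKMLD.PDF",  # placeholder UNC path (assumption)
]
print(first_reachable(candidates))
```

If only the UNC spelling is found, that would suggest the mapped P drive is not visible to the Python process.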
