Alteryx Designer Desktop Discussions

ChrisDoar · 02-07-2023

Does anyone have any experience with reading PDFs via Python in Alteryx using the PyPDF2 package, and can anyone see what is wrong here?

I've managed to import the package, but every time I try to run the workflow it fails with an error.

This is the script to open and read the file, and the file definitely exists at this location. I've checked and double-checked :)

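import PyPDF2

# Open the PDF in binary mode; PyPDF2 reads raw bytes, not text
pdf1File = open("P:\Content Manager\PDF\HABDHGOKMLD.PDF", "rb")
# Wrap the file object in a PdfReader (the file object itself has no .pages)
reader = PyPDF2.PdfReader(pdf1File)
number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()
pdf1File.close()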

I have noticed that each '\' in the path appears as '\\' in the error, but even writing '\\' in the open statement returns the same error. This is probably something really obvious that I just cannot see.
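A minimal sketch of why the doubled backslashes appear: Python error messages such as FileNotFoundError show the file name using its repr, which escapes every backslash. The underlying string still contains single backslashes, so writing '\\' yourself produces exactly the same path and cannot change which file is looked up.

```python
# The path from the question, written as a raw string (single backslashes)
path = r"P:\Content Manager\PDF\HABDHGOKMLD.PDF"

print(path)        # P:\Content Manager\PDF\HABDHGOKMLD.PDF
print(repr(path))  # 'P:\\Content Manager\\PDF\\HABDHGOKMLD.PDF'

# Writing the doubled form inside a normal literal yields the identical string
escaped = "P:\\Content Manager\\PDF\\HABDHGOKMLD.PDF"
print(path == escaped)   # True
print(path.count("\\"))  # 3 real backslash characters
```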

Thanks,

grossal · 02-07-2023

Hi @ChrisDoar,

There are usually two ways to handle this.

Option 1: Convert \\ to /

It's completely up to you where you do this: you can do it in Python or in Alteryx. I tend to use a simple Formula tool to do the trick.
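If you do the conversion in Python, a minimal sketch (using the path from the question) could use either a plain string replacement or pathlib's PureWindowsPath. Windows accepts forward slashes as path separators, so the converted string can be passed to open() as-is.

```python
from pathlib import PureWindowsPath

windows_path = r"P:\Content Manager\PDF\HABDHGOKMLD.PDF"

# Option A: replace every backslash with a forward slash
forward = windows_path.replace("\\", "/")

# Option B: parse it as a Windows path and emit forward slashes
via_pathlib = PureWindowsPath(windows_path).as_posix()

print(forward)      # P:/Content Manager/PDF/HABDHGOKMLD.PDF
print(via_pathlib)  # P:/Content Manager/PDF/HABDHGOKMLD.PDF
```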

Option 2: Raw Strings

pdf1File = open(r"P:\Content Manager\PDF\HABDHGOKMLD.PDF")

The key here is the r prefix in open(r"Your Path"). It makes the path a "raw" string literal, so backslashes are taken literally and you don't need to escape anything.
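A short sketch of when the r prefix actually matters. The path below is hypothetical and chosen so that \t and \n form real escape sequences. In the path from the question, \C, \P and \H are not valid escapes, so Python keeps those backslashes even without the prefix (recent Python versions only warn about it), which would be consistent with the raw string making no difference in this case.

```python
# Hypothetical path: in a normal literal, \t becomes a tab and \n a newline
plain = "C:\temp\new_report.pdf"
# The same text as a raw string keeps both backslashes
raw = r"C:\temp\new_report.pdf"

print(len(plain), len(raw))          # 20 22
print("\t" in plain, "\n" in plain)  # True True
print(raw.count("\\"))               # 2
```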

Let me know if it works or if we need to dive deeper into it.

Best

Alex

ChrisDoar · 02-07-2023

Hi @grossal,

Thank you for taking the time to get back to me. Unfortunately, neither of those solutions worked. However, I reckon the issue may be the network path for the file. My P drive is a mapped drive on a network server, whereas if I move the file I'm trying to read to the C drive on the actual computer, it finds it without a problem. I suspect that if I use the full network (UNC) path rather than the mapped P drive, it would work. Something to investigate.
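One possible explanation, an assumption rather than something confirmed in this thread: Windows drive mappings such as P: belong to a user's logon session, so a process started in a different context may not see them, while the full UNC path does not depend on the mapping. Running net use in a command prompt lists each mapped drive with its remote path. A minimal sketch for checking which candidate path the Python process can actually see; first_existing_path is a hypothetical helper, and the UNC server and share names are placeholders.

```python
import os


def first_existing_path(candidates):
    """Return the first candidate path visible to this process, else None."""
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None


# Hypothetical candidates: the mapped-drive path from the thread, plus a UNC
# path whose server ("fileserver") and share ("share") are placeholders
candidates = [
    r"P:\Content Manager\PDF\HABDHGOKMLD.PDF",
    r"\\fileserver\share\Content Manager\PDF\HABDHGOKMLD.PDF",
]

found = first_existing_path(candidates)
if found is None:
    print("None of the candidate paths is visible to this Python process")
else:
    print("Opening", found)
```

If only the UNC candidate is found, that path can be opened in "rb" mode and passed to PyPDF2.PdfReader in place of the P: path.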

Thanks,
Chris
