Description
This macro uses the Python library PyPDF2 to read PDF files. Pass the tool a list of PDF file paths, each including the .pdf extension, as shown in the example workflow.
PyPDF2 is generally accurate, but it is not guaranteed to parse every PDF perfectly.
Hi IraWatt, thanks for posting such a useful tool. Could we have some instructions on how to use it? For example, are there specific settings to use in Resolve File Type?
Good afternoon,
I have tried this tool and it successfully reads the first page of my PDF file, but the number of pages in the file varies each time the workflow runs. Is there something I am missing that would let it read multiple pages in the same PDF file?
Thanks in advance!
Ryan
Hi,
When using the PDF Reader Tool I get the following error. What am I doing wrong?
Error: Read in PDF Macro (21): Record #1: Tool #1: Traceback (most recent call last):
File "C:\Users\xbbl63l\AppData\Local\Temp\Engine_7456_0afc20b470d2451d8b2205c863d64f17_\5cbb4f3dfef06d6b1a23edbcc4364560\workbook.py", line 18, in <module>
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
File "C:\Users\xbbl63l\AppData\Roaming\Python\Python38\site-packages\PyPDF2\_reader.py", line 1974, in __init__
deprecation_with_replacement("PdfFileReader", "PdfReader", "3.0.0")
File "C:\Users\xbbl63l\AppData\Roaming\Python\Python38\site-packages\PyPDF2\_utils.py", line 369, in deprecation_with_replacement
deprecation(DEPR_MSG_HAPPENED.format(old_name, removed_in, new_name))
File "C:\Users\xbbl63l\AppData\Roaming\Python\Python38\site-packages\PyPDF2\_utils.py", line 351, in deprecation
raise DeprecationError(msg)
PyPDF2.errors.DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.
I got the same error message:
Error: Read in PDF Macro (1): Record #1: Tool #1: Traceback (most recent call last):
File "C:\Users\Admin\AppData\Local\Temp\Engine_14128_05e1b91b86b84666884f2a65e54cfa51_\ae1e474b9392a048514a42c74ef794bb\workbook.py", line 18, in <module>
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
File "C:\Users\Admin\AppData\Roaming\Python\Python38\site-packages\PyPDF2\_reader.py", line 1974, in __init__
deprecation_with_replacement("PdfFileReader", "PdfReader", "3.0.0")
File "C:\Users\Admin\AppData\Roaming\Python\Python38\site-packages\PyPDF2\_utils.py", line 369, in deprecation_with_replacement
deprecation(DEPR_MSG_HAPPENED.format(old_name, removed_in, new_name))
File "C:\Users\Admin\AppData\Roaming\Python\Python38\site-packages\PyPDF2\_utils.py", line 351, in deprecation
raise DeprecationError(msg)
PyPDF2.errors.DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.
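The traceback points at the cause: `PdfFileReader` was removed in PyPDF2 3.0.0, and the message names `PdfReader` as the replacement. One possible workaround, sketched here, picks whichever class the installed version exposes. `make_reader` is a hypothetical helper, not part of the macro or of PyPDF2, and passing the module in as an argument is only an illustrative choice.

```python
def make_reader(pdf_module, stream):
    """Build a PDF reader using whichever class name `pdf_module` provides.

    `make_reader` is a hypothetical helper for illustration only.
    """
    # Prefer PdfReader, the name the error message says to use.
    reader_cls = getattr(pdf_module, "PdfReader", None)
    if reader_cls is None:
        # Fall back to the older name when PdfReader is not available.
        reader_cls = getattr(pdf_module, "PdfFileReader")
    return reader_cls(stream)
```

Inside the macro this might be called as `make_reader(PyPDF2, pdfFileObj)`. It only covers constructing the reader; any page-access calls written for the older API would likely need updating as well.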
Hey, I ran the Python error through ChatGPT and it corrected the code to the following, which seems to work for me:
#################################
# List all non-standard packages to be imported by your
# script here (only missing packages will be installed)
from ayx import Package
Package.installPackages(['pandas','numpy'])
#################################
from ayx import Alteryx
Alteryx.installPackages(package='PyPDF2', install_type="install --user")
import pandas as pd
import PyPDF2
#################################
pdfFileObj = open(Alteryx.read("#1").iloc[0, 0], 'rb')
pdfReader = PyPDF2.PdfReader(pdfFileObj)
pageObj = pdfReader.pages[0] # Updated to use pages[]
page1 = pageObj.extract_text() # Updated to use extract_text()
page1
#################################
page_df = pd.DataFrame([page1])
Alteryx.write(page_df,1)
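Note that the code above reads only `pages[0]`, which matches the earlier report that just the first page comes through. A minimal sketch of looping over every page follows, assuming PyPDF2 3.x, where `PdfReader`, `.pages`, and `extract_text()` are available. The helper names `extract_pages` and `read_pdf_pages` are hypothetical and not part of the macro.

```python
def extract_pages(reader):
    """Return one row per page from any reader object exposing `.pages`.

    Intended for a PyPDF2 3.x PdfReader; each page must offer extract_text().
    """
    rows = []
    for number, page in enumerate(reader.pages, start=1):
        # Fall back to an empty string if a page yields no text.
        rows.append({"page": number, "text": page.extract_text() or ""})
    return rows


def read_pdf_pages(path):
    """Open `path` and extract text from every page (hypothetical helper)."""
    import PyPDF2  # imported lazily so extract_pages works without it

    # Extraction happens inside the `with` block, while the file is still open.
    with open(path, "rb") as pdf_file:
        return extract_pages(PyPDF2.PdfReader(pdf_file))
```

In the Python tool, the rows could presumably be turned into a table with `pd.DataFrame(read_pdf_pages(Alteryx.read("#1").iloc[0, 0]))` and passed to `Alteryx.write(..., 1)`, giving one record per page regardless of how many pages the file has. This has not been tested inside the macro.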
Doesn't work:(