Extract specific pages from a PDF using rules (specific text on page)

Hi friends,

I have run into an issue and hope someone can help.

Use case: Identify pages in a PDF based on whether certain text appears on them. For example, if I have a 15-page PDF and the text 'Client Account' appears on 'x' pages, I need to extract those pages and create a new PDF. The extracted pages could be different for each PDF.

What I could accomplish: I use 'pdftools' (an R library) to parse the PDF and find the pages where those words exist, without any issue.

Problem: The next step, extracting those pages and creating a new file.

Partial resolution: PyPDF2 offers this, and there is code I can use:

https://learndataanalysis.org/how-to-extract-pdf-pages-and-save-as-a-separate-pdf-file-using-python/

This code works perfectly if I have 'static' file names and page numbers (input file, output file and page numbers).

The problem is that when I try to turn this into a macro, I can't figure out how to replace the 'static' fields with the file names and page numbers I pass in as variables.

I have attached the macro I am trying to build. I am new to Python as well; this is my first time ever using Python code.

Any help is appreciated. If I am not clear, please ask questions.

Macro_Python_Extract_v03.yxmc


Accepted answers

ImadZidan

Hello @pankajk ,

Two things to change:

1- In the Text Input tool, change the file path to use double backslashes, for example X:\\Pankaj\\Project\\Sample_PDF\\Sample 1.PDF

2- Change the code.

From this, which gives you type object:

filename = data["Field1"]
newfilepath = data["Field2"]
pagesextract = data["Field3"]

To this, which gives you type string. Getting an object instead of a string is why you were having difficulty getting to the file.

filename = data["Field1"][0]
newfilepath = data["Field2"][0]
pagesextract = data["Field3"][0]
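On the first point, here is a small sketch of how backslashes behave on the Python side. It only covers Python string handling, not how the Text Input tool passes the value along. The path "X:\temp" is a made-up example, and using ntpath.normpath to tidy up doubled separators is my own suggestion, not something the macro already does.

```python
import ntpath

# Escaped backslashes and a raw string spell the same Windows path.
escaped = "X:\\Pankaj\\Project\\Sample_PDF\\Sample 1.PDF"
raw = r"X:\Pankaj\Project\Sample_PDF\Sample 1.PDF"
same_path = escaped == raw

# A single backslash can silently become an escape sequence.
# "X:\temp" is a hypothetical path: "\t" is read as a tab character.
broken = "X:\temp"
has_tab = "\t" in broken

# If the incoming value really contains doubled backslashes,
# ntpath.normpath collapses them into single separators.
doubled = r"X:\\Pankaj\\Project\\Sample_PDF\\Sample 1.PDF"
normalized = ntpath.normpath(doubled)
```

Windows usually tolerates doubled separators in the middle of a path, so the normalization step is optional.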

The rest of the logic seems OK. Let's see.
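For reference, here is a rough sketch of how the three values could feed the extraction step. I am assuming the page numbers arrive as a comma-separated, one-based string such as "2,5,7", and that a recent PyPDF2 release is installed (PdfReader / PdfWriter; older releases use PdfFileReader / PdfFileWriter instead). The helper names parse_pages and extract_pages are made up for illustration and are not from the attached macro.

```python
def parse_pages(pages_text):
    """Turn a string such as "2,5,7" into zero-based page indexes.

    Assumption: page numbers are one-based and comma-separated.
    """
    return [int(p) - 1 for p in str(pages_text).split(",") if p.strip()]


def extract_pages(filename, newfilepath, pagesextract):
    """Copy the requested pages of `filename` into a new PDF."""
    # Imported here so parse_pages can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(filename)
    writer = PdfWriter()
    for index in parse_pages(pagesextract):
        writer.add_page(reader.pages[index])
    with open(newfilepath, "wb") as out_file:
        writer.write(out_file)


# Inside the Alteryx Python tool the call might look like this
# (assumption: the incoming connection is read as "#1"):
# from ayx import Alteryx
# data = Alteryx.read("#1")
# extract_pages(data["Field1"][0], data["Field2"][0], data["Field3"][0])
```

The [0] matters because each of the three arguments has to be a single value, not the whole column.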

All comments

joshuaburkhow

Looks like you are just not selecting the right pieces in the Action Tool. You need the data values like this:

pankajk

Thanks @joshuaburkhow - this will help resolve the passing of parameters. The next challenge is that the Python script does not recognize the variables and returns an error. If you put values in the 'Text Input' and run the workflow, it returns an error. How can I ensure the Python script reads the variables I defined?
