Hi friends,
I have run into an issue and hope someone can help.
Use case: Identify the pages of a PDF on which certain text appears. For example, given a 15-page PDF, I need to find the pages on which the text 'Client Account' appears, extract those pages, and create a new PDF from them. The extracted pages can be different for each PDF.
What I could accomplish: I use 'pdftools' (an R library) to parse the PDF and find the pages where those words appear, without any issue.
Problem: The next step, extracting those pages and creating a new file.
Partial solution: PyPDF2 can do this, and there is code I can use:
https://learndataanalysis.org/how-to-extract-pdf-pages-and-save-as-a-separate-pdf-file-using-python/
This code works perfectly when the input file, output file, and page numbers are hard-coded ('static').
The problem is that when I try to turn this into a macro, I can't figure out how to replace the 'static' values with the filename and page numbers that I pass into the macro as variables.
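To make the goal concrete, here is a minimal sketch of what I think the parameterized version could look like. It is not the code from the linked tutorial. The function names (to_zero_based, extract_pages) and the file names in the usage note are hypothetical, and it assumes PyPDF2 2.x or later, where the classes are called PdfReader and PdfWriter (older releases use PdfFileReader and PdfFileWriter instead). It also assumes the page numbers arrive 1-based, as R counts them, either as a list or as a comma-separated string.

```python
def to_zero_based(pages, page_count):
    """Turn 1-based page numbers into sorted, unique 0-based indexes.

    `pages` may be a list of numbers or a comma-separated string
    such as "3,5,7" (an assumed format for the value the macro passes in).
    """
    if isinstance(pages, str):
        pages = [p for p in pages.split(",") if p.strip()]
    indexes = sorted({int(p) - 1 for p in pages})
    out_of_range = [i + 1 for i in indexes if not 0 <= i < page_count]
    if out_of_range:
        raise ValueError(
            f"page numbers outside 1..{page_count}: {out_of_range}"
        )
    return indexes


def extract_pages(input_path, output_path, pages):
    """Copy the requested pages of input_path into a new PDF at output_path."""
    # Assumption: PyPDF2 2.x or later. Imported here so that the helper
    # above can be used and tested without the library installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    writer = PdfWriter()
    for index in to_zero_based(pages, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(output_path, "wb") as out_file:
        writer.write(out_file)
```

With something like this, the 'static' values become ordinary arguments, for example extract_pages("statement.pdf", "client_account_pages.pdf", "3,5,7"), where both file names are placeholders. What I still don't know is how to get the macro's variables into those three arguments.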
I have attached the macro I am trying to build. I am also new to Python, and this is my first time ever using Python code.
Any help is appreciated. If anything is unclear, please ask.