Hi friends,
I have run into an issue and hope someone can help.
Use case: Identify the pages of a PDF on which certain text appears. For example, if I have a 15-page PDF and the text 'Client Account' appears on 'x' of those pages, I need to extract those pages and create a new PDF. The extracted pages can be different for each PDF.
What I could accomplish: I use 'pdftools' (an R library) to parse the PDF and find the pages where those words exist, without any issue.
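For anyone doing the page search in Python instead of R, here is a minimal sketch. It assumes you already have the text of each page in a list, one string per page in page order (the same shape as the vector R's pdftools::pdf_text returns). The helper name pages_containing is hypothetical. It returns 0-based page indices, which is what PyPDF2's reader.pages expects; page numbers from R's which() are 1-based, so subtract 1 from those before passing them on.

```python
from typing import List, Sequence


def pages_containing(page_texts: Sequence[str], phrase: str) -> List[int]:
    """Return the 0-based indices of pages whose text contains phrase.

    Hypothetical helper: page_texts is assumed to hold one string per page,
    in page order.
    """
    return [index for index, text in enumerate(page_texts) if phrase in text]


# Hypothetical three-page document: only the first and third pages match
texts = ["Client Account summary", "Terms and conditions", "Client Account detail"]
print(pages_containing(texts, "Client Account"))  # [0, 2]
```

The search is a plain case-sensitive substring test; if the extracted text has inconsistent spacing or capitalisation, you may need to normalise it first.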
Problem: The next step, 'extracting' those pages and creating a new file.
Partial resolution: PyPDF2 offers a solution, and there is code I can use:
https://learndataanalysis.org/how-to-extract-pdf-pages-and-save-as-a-separate-pdf-file-using-python/
This code works perfectly if I have a 'static' filename and page numbers (input file, output file and page numbers).
The problem is that when I try to make this a macro, I can't figure out how to replace the 'static' values with the filename and page numbers I pass into the macro as variables.
I have attached the macro I am trying to build. I am also new to Python; this is my first time ever using Python code.
Any help is appreciated. If I am not clear, please do ask questions.
Looks like you are just not selecting the right pieces in the Action Tool. You need the data values like this:
Thanks @joshuaburkhow, this will help resolve passing the parameters. The next challenge is that the Python script is not recognizing the variables and returns an error. If you put values in the Text Input and run the workflow, it returns an error. How can I ensure the Python script is reading the defined variables?
Hello @pankajk ,
Sorry if I am late to this discussion. I have looked at the macro, and the field is Field1 with an uppercase F rather than field1.
Hope this helps
Thanks @ImadZidan for picking up these errors, which I have fixed. But somehow it's still not picking up the filename, and it now gives me a type error.
I even tried putting "r'" in front, since that worked with the hard-coded absolute filename, but something still seems to be amiss.
Appreciate all the support.
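On the "r'" attempt: a small sketch with hypothetical paths showing that the r prefix only changes how a string literal typed into the code is parsed. A value that arrives in a variable at runtime is already a finished string, so there is no r to apply to it.

```python
# Hypothetical Windows path, typed three different ways
plain = "C:\temp\report.pdf"      # "\t" becomes a TAB and "\r" a carriage return
raw = r"C:\temp\report.pdf"       # raw literal: every backslash is kept as-is
escaped = "C:\\temp\\report.pdf"  # doubled backslashes: same result as the raw literal

print("\t" in plain)   # True: the plain literal was silently changed
print(raw == escaped)  # True

# A value built or read at runtime (here joined from parts) already holds
# real backslash characters, so it needs no prefix and no extra escaping.
runtime_value = "\\".join(["C:", "temp", "report.pdf"])
print(runtime_value == raw)  # True
```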
Hello @pankajk ,
Could you show me the values you have in the three fields?
It will help.
It looks to me like the code is executing; however, the PDF reader is choking when reading the PDF.
Thanks. I have attached my updated workflow again, with lots of commented-out lines (trying different things).
The 3 input variables are:
Field1 = Original PDF File name
Field2 = New PDF filename to be created
Field3 = Pages from original PDF file to be extracted
FYI: this code works when I use static values for these (as in my original post, which links to the Python code page), so I don't think it's a PDF choke issue.
I tried printing the type, and it looks like the filename variable is not getting the full value including the path: it's an 'object', whereas with the static value it's a string. But again, this is based on my limited knowledge of Python.
Hello @pankajk ,
Two things to change
1- In the Text Input, change the path to use double backslashes, for example X:\\Pankaj\\Project\\Sample_PDF\\Sample 1.PDF
2- Change the code:
From (this gives you the whole column, a pandas Series of type object, not a string):
filename = data["Field1"]
newfilepath = data["Field2"]
pagesextract = data["Field3"]
To (this gives you the first value of each column as a string, which is why you were having difficulty getting to the file):
filename = data["Field1"][0]
newfilepath = data["Field2"][0]
pagesextract = data["Field3"][0]
The rest of the logic seems OK. Let's see.
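A minimal sketch of the difference, assuming data is a pandas DataFrame holding the single Text Input row (the paths and page list below are hypothetical). Indexing a DataFrame by column name gives the whole column as a Series, while adding [0] picks out the single value. .iloc[0] is a positional alternative that does not rely on the row labels starting at 0. Column names are also case-sensitive, so "field1" is not found.

```python
import pandas as pd

# Hypothetical one-row input mirroring Field1..Field3 from the Text Input
data = pd.DataFrame(
    {
        "Field1": [r"X:\Pankaj\Project\Sample_PDF\Sample 1.PDF"],
        "Field2": [r"X:\Pankaj\Project\Sample_PDF\Extracted.PDF"],
        "Field3": ["[0,12,25]"],
    }
)

print(list(data.columns))        # exact, case-sensitive column names
print("field1" in data.columns)  # False

column = data["Field1"]
print(type(column).__name__)  # Series: the whole column, not a path string

filename = data["Field1"][0]         # label lookup on the default 0-based index
same_value = data["Field1"].iloc[0]  # positional lookup, independent of index labels
print(type(filename).__name__)       # str
print(filename == same_value)        # True
```

Printing list(data.columns) and the type of each value at the top of the script is a quick way to confirm what the Python code is actually receiving.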
Thanks @ImadZidan - You are awesome and thanks for your patience and all the support. I was able to make it work based on your feedback 🙂
I had to make the following additional changes so that the page list was read as a list rather than as an object/text:
In my Text Input, change 0,12,25 to [0,12,25], i.e., add opening and closing brackets.
And add the following to my code to convert the text/string to a list:
import ast
# Converting pages to be extracted from string to list
pagelist = ast.literal_eval(pagesextract)
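A slightly more forgiving variant of the same idea, written as a hypothetical helper (parse_page_list is not from the thread). It still uses ast.literal_eval, which only evaluates Python literals, so unlike eval() it will not run arbitrary code typed into the input. It also accepts the bracket-less form 0,12,25 and checks that the result really is a list of integers.

```python
import ast
from typing import List


def parse_page_list(text: str) -> List[int]:
    """Turn "[0,12,25]" or "0,12,25" into [0, 12, 25].

    Hypothetical helper: raises ValueError if the text is not a list of ints.
    """
    text = text.strip()
    if not text.startswith("["):
        # Accept the bracket-less form by adding the brackets here
        text = f"[{text}]"
    value = ast.literal_eval(text)
    if not isinstance(value, list) or not all(isinstance(p, int) for p in value):
        raise ValueError(f"expected a list of page numbers, got {text!r}")
    return value


print(parse_page_list("[0,12,25]"))  # [0, 12, 25]
print(parse_page_list("0,12,25"))    # [0, 12, 25]
```

With this, either form can be typed into the Text Input; the bracketed form described above keeps working unchanged.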
I have converted this to an app and it works nicely.
I will accept your solution and give it my like! Thanks again so very much. Greatly appreciated.