I have run into an issue and hope someone can help.
Use case: Identify the pages of a PDF on which certain text appears. For example, if I have a 15-page PDF and the text 'Client Account' appears on 'x' of its pages, I need to extract those pages and create a new PDF from them. The extracted pages could be different for each PDF.
What I could accomplish: I use 'pdftools' (R library) to parse the PDF and find the pages where those words exist, without any issue.
Problem: The next step, which is extracting those pages and creating a new file.
Kind of a resolution: PyPDF2 offers this, and there is code I can use:
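The linked code is not reproduced in this thread, so here is only a minimal sketch of what page extraction with PyPDF2 might look like. It assumes PyPDF2 2.x or later (the `PdfReader` / `PdfWriter` names) and 1-based page numbers coming from the earlier parsing step; `copy_pages` and `extract_pages` are hypothetical helper names, not part of PyPDF2.

```python
def copy_pages(reader, writer, page_numbers):
    """Append the given 1-based pages of `reader` to `writer`; return the count."""
    total = len(reader.pages)
    copied = 0
    for number in page_numbers:
        if not 1 <= number <= total:
            raise ValueError(f"page {number} is outside 1..{total}")
        # PyPDF2 indexes pages from 0, so shift the 1-based number down by one.
        writer.add_page(reader.pages[number - 1])
        copied += 1
    return copied


def extract_pages(source_path, target_path, page_numbers):
    """Write the selected pages of source_path to a new PDF at target_path."""
    # Imported here so copy_pages can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter  # assumes PyPDF2 2.x+ naming

    reader = PdfReader(source_path)
    writer = PdfWriter()
    copied = copy_pages(reader, writer, page_numbers)
    with open(target_path, "wb") as handle:
        writer.write(handle)
    return copied
```

With static values this would be called as something like `extract_pages("original.pdf", "extract.pdf", [2, 5, 9])`, where both file names are placeholders.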
Thanks @joshuaburkhow - this will help with passing the parameters. The next challenge is that the Python script does not recognize the variables and returns an error. If you put the values in the 'Text Input' and run the workflow, it returns an error. How can I ensure the Python script reads the variables I defined?
Thanks - I have attached my updated workflow again, with lots of comment lines (from trying different things).
The 3 input variables are:
Field1 = Original PDF File name
Field2 = New PDF filename to be created
Field3 = Pages from original PDF file to be extracted
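The thread does not say how the page list in Field3 is written. Assuming it arrives as text such as "1,3,5-7" (comma-separated, with optional ranges), a hypothetical helper like `parse_page_spec` could turn it into the list of page numbers the extraction step needs. This is a sketch under that assumption, not the format the workflow necessarily uses.

```python
def parse_page_spec(spec):
    """Turn text such as "1,3,5-7" into a sorted list of 1-based page numbers."""
    pages = set()
    for part in str(spec).split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas or an empty field
        if "-" in part:
            first, last = (int(piece) for piece in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(first, last + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

Duplicates are dropped and the result is sorted, so the new PDF keeps the original page order.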
FYI... This code works when I use static values for these (as in my original post, which includes the link to the Python code page), so I don't think it's a PDF choke issue.
I tried printing the type, and it looks like the filename variable is not getting the 'full value' including the path, and it is an 'object', whereas with the 'static' variable it is a string. But again, this is based on my limited knowledge of Python.
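One possible explanation, offered as a guess rather than a diagnosis: if the Python tool hands the incoming data over as a pandas DataFrame, then selecting a column such as Field1 gives a whole column (a Series whose dtype prints as 'object'), not a single string. Printing it also shows an abbreviated form, which could look like the path is cut off. A minimal sketch of pulling out the first value as a plain string, using a hypothetical helper `to_plain_string`:

```python
def to_plain_string(value):
    """Return a single plain str from a string, a column-like object, or a list."""
    if isinstance(value, str):
        return value
    if hasattr(value, "iloc"):
        # Column-like object (e.g. a pandas Series): take its first row.
        if len(value) == 0:
            raise ValueError("the column has no rows")
        value = value.iloc[0]
    elif isinstance(value, (list, tuple)):
        if not value:
            raise ValueError("the sequence is empty")
        value = value[0]
    return str(value)


# Assumed usage inside the Python tool (not verified against this workflow):
#   df = Alteryx.read("#1")
#   source_path = to_plain_string(df["Field1"])
#   target_path = to_plain_string(df["Field2"])
```

If the values really are a Series, `type(source_path)` should then report `str`, the same as with the static values.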