Creating a batch macro for PDF Text Parser Tool
I am creating a batch macro to run PDFs through, based on the workflow from a previous post. The issue I am having is setting up the macro so that the input runs through correctly. I have no real knowledge of Python and am trying to educate myself, but it's not working for me. I also saw another method where the control parameter is connected to the Python tool and replaces a specific string, but for some reason it did not run correctly for me. I have had a little more success altering the Python code, but I am receiving the following error: OSError: [Errno 22] Invalid argument: ' C:\\Users\\Jmann\\Desktop.......' To my understanding, this should have been fixed by formatting the string as a raw string, but that did not work. If anyone well versed in Python could assist, I would appreciate it very much.
The code:
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from ayx import Alteryx
from io import StringIO
import os
import pandas
import re
def convert_pdf_to_txt(path, pages=None):
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)
    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)
    infile = open(path, 'rb')
    for page in PDFPage.get_pages(infile, pagenums):
        interpreter.process_page(page)
    infile.close()
    converter.close()
    text = output.getvalue()
    output.close()
    return text
dat= Alteryx.read(r"#1")
dat= str(dat)
data= str.replace(dat,'F1\n0',' ')
text = convert_pdf_to_txt(data)
df = pandas.DataFrame({"text":[text]})
df
Alteryx.write(df,1)
The Alteryx #1 input is a file path. Before I added the str.replace line, the string included the column header as well, which was then passed to convert_pdf_to_txt and caused an error. I was able to fix that. I am not sure it is the correct way to do it but, as I said earlier, I have zero knowledge of Python. I have attached the file as well if you would like to play around with it.
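For reference, here is a minimal sketch of one way the path could be cleaned before it reaches convert_pdf_to_txt. The error message above shows a space in front of C:, which suggests that replacing the header text with ' ' leaves leading whitespace in the path. The helper name clean_path is hypothetical, and the sketch assumes the path sits in a single cell of the incoming table. It is not a confirmed fix.

```python
def clean_path(raw):
    """Return `raw` as a path string without stray whitespace or quotes.

    `clean_path` is a hypothetical helper, not part of ayx or pdfminer.
    """
    # str() guards against non-string cell values; strip() removes the
    # leading/trailing spaces and newlines that trigger Errno 22.
    text = str(raw).strip()
    # Drop one layer of surrounding quotes if the path arrived quoted.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    return text


# Example with a made-up path that has the same leading space as the error.
print(clean_path(" C:\\Users\\demo\\report.pdf\n"))
```

In the Python tool, the idea would be to read the cell value directly rather than converting the whole table to a string, for example data = clean_path(dat.iloc[0, 0]) in place of the str(dat) and str.replace lines, assuming dat is the pandas DataFrame returned by Alteryx.read("#1") and the path is in its first row and first column. Selecting the cell by position keeps the F1 header out of the string. It should also avoid the shortening that pandas may apply to long values when a whole table is printed.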
- Labels:
- Batch Macro
- Python
