Creating a batch macro for PDF Text Parser Tool

Jmann3891

I am creating a batch macro to run PDFs through, based on the workflow from a previous post. The issue I am having is setting up the macro so that the input runs through correctly. I have no real knowledge of Python and am trying to educate myself, but it's not working for me. I also saw another method where the control parameter is connected to the Python tool and replaces a specific string, but for some reason it did not run correctly for me. I have had a little more success altering the Python code, but I am receiving the following error: OSError: [Errno 22] Invalid argument: ' C:\\Users\\Jmann\\Desktop.......'  To my understanding, this should have been fixed by formatting the string as a raw string, but that did not work. If anyone well versed in Python could assist, I would appreciate it very much.
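For anyone reading the error above: the small sketch below illustrates two points about it, using a made-up path (C:\Users\demo\file.pdf is hypothetical, not the real file). First, the r"" prefix only affects how a literal typed in the source code is parsed; it does nothing to a string that arrives at run time. Second, the doubled backslashes in the message are just how Python displays a string, so the more suspicious detail is probably the space right after the opening quote.

```python
# Hypothetical path used only for illustration.
escaped = "C:\\Users\\demo\\file.pdf"   # normal literal with escaped backslashes
raw = r"C:\Users\demo\file.pdf"         # raw literal, same characters

# Both literals produce exactly the same string value.
same_value = (escaped == raw)

# repr() doubles each backslash for display; error messages show paths this way.
shown = repr(raw)

# A string built at run time can carry stray whitespace that no r"" prefix
# will remove. strip() drops leading and trailing whitespace.
from_upstream = " " + raw
cleaned = from_upstream.strip()

print(same_value)
print(shown)
print(cleaned == raw)
```

If the leading space is indeed the cause, stripping the string (or never introducing the space) would be the thing to try before anything related to raw strings.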

 

The code: 

 

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from ayx import Alteryx
from io import StringIO
import os
import pandas
import re

def convert_pdf_to_txt(path, pages=None):
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)

    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)

    infile = open(path, 'rb')

    for page in PDFPage.get_pages(infile, pagenums):
        interpreter.process_page(page)

    infile.close()
    converter.close()
    text = output.getvalue()
    output.close()
    return text

 

dat= Alteryx.read(r"#1")
dat= str(dat)
data= str.replace(dat,'F1\n0',' ')

 

text = convert_pdf_to_txt(data)

df = pandas.DataFrame({"text":[text]})

df

Alteryx.write(df,1)

 

The Alteryx #1 input is a file path. Before I added the str.replace line, the string also contained the column header, which was then passed to convert_pdf_to_txt and caused an error. I was able to fix that. I am not sure it is the correct way to do it, but as I said earlier, I have zero knowledge of Python. I have attached the file as well if you would like to play around with it.
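A possible alternative to the str.replace step, sketched under assumptions: Alteryx.read returns a pandas DataFrame, so calling str() on it renders the whole table (header, row index, and padding) as text. The stand-in DataFrame below assumes a single column named F1 with one row, and the path is hypothetical. Reading the cell directly avoids the header and the extra spaces altogether.

```python
import pandas as pd

# Stand-in for what Alteryx.read("#1") is assumed to return inside the macro:
# one column (assumed name F1), one row. The path is hypothetical.
incoming = pd.DataFrame({"F1": [r"C:\Users\demo\file.pdf"]})

# str() renders the table as text, so the header and row index are included.
as_text = str(incoming)

# Replacing the header/index fragment still leaves padding around the path.
patched = as_text.replace("F1\n0", " ")

# Reading the first cell returns only the value; strip() guards against
# stray whitespace in the incoming field.
pdf_path = str(incoming.iloc[0, 0]).strip()

print(repr(patched))
print(repr(pdf_path))
```

In the macro itself this would presumably become something like pdf_path = str(dat.iloc[0, 0]).strip() followed by convert_pdf_to_txt(pdf_path), though that has not been tested against the attached workflow.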
