Alteryx Designer Desktop Discussions


Passing different filenames into python tool to convert PDFs to text

ck2024
9 - Comet

Hi,

Sorry, I thought I had posted this but can't see it. It might have something to do with the spam that is hitting the site.

Please can you help me?

I have a list of files from a Directory tool that I want to pass into the macro that contains the Python tool:

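from ayx import Package
# pdfplumber is imported below, so install it alongside pandas and numpy
Package.installPackages(['pandas','numpy','pdfplumber'])

from ayx import Alteryx

import pandas as pd
import pdfplumber

#file = Alteryx.data("#1")
# Hard-coded path to a single PDF; this is the value that should come from the Directory tool
with pdfplumber.open('c:/files/files/INVOICE_3660075585.pdf') as pdf:
    page = pdf.pages[0]  # first page only
    text = page.extract_text()
print(text)

# One-row DataFrame holding the extracted page text
page_df = pd.DataFrame([text])
Alteryx.write(page_df,1)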

I would like to replace the hard-coded file path (shown in red in my original post) with the paths coming from the Directory tool. Whenever I tried the Alteryx.read("#1") function it went completely wrong, whereas the code above successfully reads the contents of the file. I need to read 400 of them, so doing it one by one isn't really practical.

Not sure what I would do if a file spans two pages, though. Most are one page.

I am only looking for two specific pieces of text, so I don't really mind about the formatting etc.

3 REPLIES
binuacs
20 - Arcturus

@ck2024 try the updated macro

image.png
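
The macro is only attached as an image, so its contents are not reproduced here. As a rough sketch of one way the Python tool could loop over the Directory tool's output (not necessarily what the attached macro does): this assumes the Directory tool is connected to input #1 and that its FullPath field holds each PDF's path. The helper names pdf_to_text, convert_all and run_in_python_tool are made up for illustration. Joining the text of every page also covers invoices that span more than one page.

```python
from typing import Callable, Iterable, List


def pdf_to_text(path: str) -> str:
    """Return the text of every page in one PDF, joined by newlines."""
    import pdfplumber  # imported lazily so convert_all can be tried without it

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for a page with no text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def convert_all(
    paths: Iterable[str],
    extract: Callable[[str], str] = pdf_to_text,
) -> List[str]:
    """Run the extractor over every path; a file that fails yields ''."""
    texts = []
    for path in paths:
        try:
            texts.append(extract(path))
        except Exception as exc:  # keep going so one bad PDF does not stop the batch
            print(f"Could not read {path}: {exc}")
            texts.append("")
    return texts


def run_in_python_tool() -> None:
    """Intended body of the Python tool; this part only works inside Alteryx."""
    from ayx import Alteryx
    import pandas as pd

    files = Alteryx.read("#1")              # assumption: Directory tool on input #1
    texts = convert_all(files["FullPath"])  # assumption: FullPath holds the PDF paths
    Alteryx.write(pd.DataFrame({"Text": texts}), 1)
```

Inside the Python tool, the last line of the cell would simply call run_in_python_tool(). Passing the extractor in as a parameter is only there so the loop can be checked without real PDFs.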

ck2024
9 - Comet

Thanks so much @binuacs - it works wonders. Is there a way of outputting the filename that each result relates to in a separate column, so I can identify which file the information I am gathering came from? At the moment it all comes out in a single output, so I can't work out which file each row belongs to. Thank you again

binuacs
20 - Arcturus

@ck2024 updated workflow attached

image.png
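
The updated workflow is likewise only attached as an image. One possible way to carry the filename through, sketched under the assumption that each path arrives as a plain string (for example from the Directory tool's FullPath field on input #1): build one row per file holding the filename, the full path and the extracted text. The names first_page_text, extract_with_names and run_in_python_tool are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Callable, Dict, Iterable, List


def first_page_text(path: str) -> str:
    """Return the text of the first page of a PDF ('' if it has no text)."""
    import pdfplumber  # imported lazily so the row-building logic can be tried without it

    with pdfplumber.open(path) as pdf:
        return pdf.pages[0].extract_text() or ""


def extract_with_names(
    paths: Iterable[str],
    extract: Callable[[str], str] = first_page_text,
) -> List[Dict[str, str]]:
    """Build one row per file so the text stays tied to its source file."""
    rows = []
    for path in paths:
        rows.append(
            {
                # PureWindowsPath accepts both / and \ separators on any OS
                "FileName": PureWindowsPath(path).name,
                "FullPath": path,
                "Text": extract(path),
            }
        )
    return rows


def run_in_python_tool() -> None:
    """Intended body of the Python tool; this part only works inside Alteryx."""
    from ayx import Alteryx
    import pandas as pd

    files = Alteryx.read("#1")  # assumption: Directory tool on input #1
    rows = extract_with_names(files["FullPath"])
    Alteryx.write(pd.DataFrame(rows), 1)  # columns: FileName, FullPath, Text
```

With the filename in its own column, the output can be joined back to the Directory tool's records or filtered per invoice downstream.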
