Hey all, I have a PDF document whose pages contain different invoices. Does anyone know whether I can write a formula to split it into one PDF per invoice, rather than keeping a single document that holds multiple invoices?
Thank you
I think you'll have to do this in Python, in which case I would ask on Python forums 👍 Some Alteryx people may help, but since it's not native functionality, that would be my recommendation.
@sean_meneses1
Try this in the Python tool:
import os
from PyPDF2 import PdfReader, PdfWriter
from ayx import Alteryx

# Retrieve Alteryx input data as a DataFrame ("#1" is the first input connection of the Python tool)
df = Alteryx.read("#1")

# Loop through each row in the input data
for index, row in df.iterrows():
    input_pdf_path = row['FullPath']  # Path to the multi-page PDF
    output_folder_path = row['Directory']  # Output directory for single-page PDFs

    # Use the source file name as a prefix so pages from different PDFs do not overwrite each other
    base_name = os.path.splitext(os.path.basename(input_pdf_path))[0]

    # Read the PDF
    pdf_reader = PdfReader(input_pdf_path)

    # Loop through each page in the PDF, creating a new single-page PDF
    for page_num in range(len(pdf_reader.pages)):
        pdf_writer = PdfWriter()
        pdf_writer.add_page(pdf_reader.pages[page_num])

        # Define the output file path for each single-page PDF
        output_pdf_path = os.path.join(output_folder_path, f"{base_name}_page_{page_num + 1}.pdf")

        # Write the single-page PDF to the output folder
        with open(output_pdf_path, 'wb') as output_pdf:
            pdf_writer.write(output_pdf)

# Output a message for completion
print("PDF splitting complete!")
Hope this helps.
@sean_meneses1 Use the 'PDF to Text' tool as the input, then split the PDF on 'Dealer Invoice', i.e. differentiate between invoices based on a keyword and split them using a batch macro.
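If you would rather stay in Python, the same keyword idea can be sketched roughly as below. This is only an illustration, not the batch macro approach: the function names `group_pages_by_keyword` and `split_pdf_by_keyword` are made up for this example, and it assumes the keyword (here 'Dealer Invoice') appears on the first page of every invoice and that the PDF has a text layer PyPDF2 can extract.

```python
from typing import List


def group_pages_by_keyword(page_texts: List[str], keyword: str) -> List[List[int]]:
    """Group page indices into invoices; a page containing `keyword` starts a new invoice."""
    groups: List[List[int]] = []
    for index, text in enumerate(page_texts):
        if keyword.lower() in text.lower() or not groups:
            # Keyword found (or this is the very first page): start a new invoice
            groups.append([index])
        else:
            # No keyword: treat the page as a continuation of the current invoice
            groups[-1].append(index)
    return groups


def split_pdf_by_keyword(input_pdf_path: str, output_folder: str,
                         keyword: str = "Dealer Invoice") -> List[str]:
    """Write one PDF per invoice and return the list of output paths."""
    import os
    from PyPDF2 import PdfReader, PdfWriter  # same library as the script earlier in the thread

    reader = PdfReader(input_pdf_path)
    # extract_text() may yield nothing for scanned pages, so fall back to an empty string
    page_texts = [page.extract_text() or "" for page in reader.pages]
    base_name = os.path.splitext(os.path.basename(input_pdf_path))[0]

    output_paths = []
    for number, page_indices in enumerate(group_pages_by_keyword(page_texts, keyword), start=1):
        writer = PdfWriter()
        for page_index in page_indices:
            writer.add_page(reader.pages[page_index])
        output_path = os.path.join(output_folder, f"{base_name}_invoice_{number}.pdf")
        with open(output_path, "wb") as output_file:
            writer.write(output_file)
        output_paths.append(output_path)
    return output_paths
```

Multi-page invoices stay together because pages without the keyword are attached to the invoice before them. If the keyword also shows up on continuation pages, or the PDF is a scan with no text layer, this will not group correctly and you would need a different marker or OCR first.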
Below example may help: