Outputting the results of all your hard data manipulation work to your or your stakeholder’s desired format is a key part of the analytic journey. However, this process can often be more complicated than envisioned. In this blog, we will be looking at three different approaches to producing formatted PDF outputs from our data. We will be building off the work done by Joseph Serpis, Alteryx ACE. In his blog, Joe explains using an Excel Template to create a desired report, using the Blob Input and Blob Output tools, as well as the fantastic new Control Containers (available from 2023.1). Please refer to his blog for more detail about Blob tools and Control containers. We will be focusing on leveraging the Blob tools to utilize VBA macros and the power of Python as well as using the standard Alteryx Reporting tools to demonstrate the versatility available to get your PDF output correct. We do not touch on the Computer Vision tools as these require an Alteryx Designer add-on called Intelligence Suite.
The first process we will discuss is utilizing VBA macros and formatted Excel workbooks to create our desired PDF output. The Blob tools give us the power to leverage the work done in Excel. Therefore, let us go over the appropriate setup in Excel to get this working. Our macro-enabled Excel workbook contains two sheets and a macro. The front sheet is a formatted report that references data in Sheet1.
Excel workbook front sheet
Our VBA Macro utilized the Workbook.ExportAsFixedFormat methodology. We are producing our PDF from the Front Sheet and dynamically naming it using information from Sheet1.
VBA Macro Code below:
Sub PDF()
Dim wsA As Worksheet
Dim strPathFile As String
Set wsA = Sheets("Front Sheet")
On Error Resume Next
'create name for saving file
strPathFile = Sheets("Sheet1").Range("F2").Value
wsA.ExportAsFixedFormat Type:=xlTypePDF, _
Filename:=strPathFile, _
Quality:=xlQualityStandard, _
IncludeDocProperties:=False, _
IgnorePrintAreas:=False, _
OpenAfterPublish:=False, _
To:=1
End Sub
Now let us take a look at how we make use of this prepared Excel template in Alteryx. As mentioned, the initial process is explained in detail in Joe’s blog. In short, we read in our Excel template and write out a new Excel file that mimics all the work done in the template using the Blob tools. We dynamically name our Output files in the formula tool.
Alteryx Formula Tool File paths
Once our data transformations are done in Alteryx, we write out to Sheet1 in our newly created formatted Excel workbook. The front sheet gets populated, and we have our report. However, our stakeholder wants this to be in PDF format, and if you are anything like me, you do not want to have to push any more buttons. A good analyst always finds ways to automate. Therefore, we need this to be part of our process. We are going to use the Run Command tool to execute a simple script that will run our PDF macro. We dynamically update the script with our file name and create the name for the executable.
Alteryx Overview – PDF Macro VBA Run Script
Script code:
Set objExcel = CreateObject("Excel.Application")
On Error Resume Next
objExcel.Visible = FALSE
objExcel.DisplayAlerts = FALSE
objExcel.EnableEvents = FALSE
Set objWorkbook = objExcel.Workbooks.Open("REPLACEMENT_TEXT")
On Error Resume Next
objExcel.Run "PDF"
On Error Resume Next
objWorkbook.Save
On Error Resume Next
objExcel.Quit
Our Run Command tool will then execute our script, converting our populated formatted Excel Report to a PDF.
The next process will explore using the Python tool and a prepared PDF document. There are many unique methods when creating outputs with Python. The method we are using minimizes the need for code and takes advantage of the Alteryx tools we have available. Our PDF has been prepared, providing annotations for all the components which need to be filled, for example, our summary table of car sales for the quarter.
PDF Output – Annotations
Therefore, we need to prepare our data and create the necessary summary statistics, ensuring our column names match our annotations exactly.
Alteryx workflow - Python
In the Formula tool, we are specifying the name and location of the input PDF template and the output file. We are using the PyPDF2 (V2.12.1) package to identify our annotations and fill in that information using the correlating data from the Alteryx workflow.
Python Script below:
from ayx import Alteryx, Package
# Installs the Python packages that are not included in the default installation.
# Package.installPackages(['PyPDF2==2.12.1'])
from datetime import datetime
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject
def set_need_appearances_writer(writer):
try:
catalog = writer._root_object
# get the AcroForm tree and add "/NeedAppearances attribute
if "/AcroForm" not in catalog:
writer._root_object.update({
NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
need_appearances = NameObject("/NeedAppearances")
writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
except Exception as e:
print('set_need_appearances_writer() catch : ', repr(e))
return writer
def set_annotation_flag_writer(writer, field_dictionary):
try:
for j in range(0, len(writer.getPage(0)['/Annots'])):
writer_annot = writer.getPage(0)['/Annots'][j].getObject()
for field in field_dictionary:
if writer_annot.get('/T') == field:
writer_annot.update({
NameObject("/Ff"): NumberObject(7),
})
except Exception as e:
print('set_annotation_flag_writer() catch : ', repr(e))
return
def AYX_PDF_form_fill(template, outfile):
try:
field_dictionary = df.to_dict('records')[0]
inputStream = open(template, "rb")
pdf_reader = PdfFileReader(inputStream, strict=False)
pdf_writer = PdfFileWriter()
set_need_appearances_writer(pdf_writer)
pdf_writer.addPage(pdf_reader.getPage(0))\
pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)
set_annotation_flag_writer(pdf_writer, field_dictionary)
outputStream = open(outfile, "wb")
pdf_writer.write(outputStream)
inputStream.close()
outputStream.close()
except Exception as e:
print('AYX_PDF_form_fill() catch : ', repr(e))
return
df = Alteryx.read("#1") #Read in Datastream from Alteryx Designer Canvas
template = df.loc[0,'PDFTemplate']
outfile = df.loc[0,'PDFFileName2']
AYX_PDF_form_fill(template, outfile)
Alteryx.write(df,1)
Our Python script reads in the template and creates our PDF output. This process is incredibly useful when automating PDF forms or company-specific templates.
PDF Output
For the final process, we will try to replicate the PDF output from the previous step using only the native Alteryx Reporting tools. The Alteryx Reporting tools are very powerful but can be tricky to create a specific desired output. I challenged myself to spend the same amount of time configuring the reporting tools as I did with the Python script. I overlayed Table and Text snippets over an image.
Alteryx – Reporting Tools
As there is plenty of information on how to use the reporting tools available, I won’t explain their configuration. With time I believe I could have achieved something close to the same PDF output as the previous example, but with more complicated outputs, this may not be possible using only the Alteryx Reporting tools.
Hopefully, I have demonstrated with the few examples above that, as with all things Alteryx, there are many ways to cook an egg. It all depends on your desired output and providing your stakeholders with the exact same report they have been receiving previously.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.