Filling out PDF forms is often a necessary step in the analytic processes. Here's is a sample use-case and practical guide that will help you get your analytics data into your PDF forms. While there currently is no out-of-the-box / pre-built tool for PDF forms in 2019.2, the steps below will show how easy it is to create your own PDF form filling macro with a single python tool.
USE CASE / BACKGROUND:
Our PDF form filling scenario will be a workforce analytics automation scenario that is all about eliminating those mundane and time-consuming processes associated with the creation of employee vesting statements. Here we will showcase how the Alteryx Analytic platform will help streamline and control the equity decisions within an organization's guidelines while delivering transparency and auditability.
TYPICAL BUSINESS ENVIRONMENT & OPERATIONAL CHALLENGES:
- Management directive to simplify and streamline the development and allocation of equity awards
- Deliver an enhanced view of equity programs with improved levels of transparency, control and auditability across the compensation planning process.
- Reduce the dependency on legacy mail merge approaches that have proved too time-consuming and lacked the requisite level of control.
The PDF Form - Employee Vesting Template:
In this scenario our HR / employee vesting template was created in Microsoft Word & Adobe Acrobat DC and adheres to the 1.7 PDF standard.
Preparing the Form:
An important step is making sure that the Alteryx Field/Column name *IS EQUIVALENT* to the form field name in the Adobe PDF template.
Other notable items:
* Field order does not matter
* Visibility Flag is normally set at True for these types of forms.
* Read-only Flag is Optional. Our python code will set this flag in the PDF and will effectively eliminate the 'blue boxes'
Preparing the Workflow:
As mentioned above, Alteryx Field/Column name must match the form field name in the Adobe PDF template. In the below/attached workflow we will alter the data stream to match our form, by creating a new field call fullName. Here are a couple of additional changes we made along the way:
* Added an Employee Password (dynamic)
* Added an HR (owner) Password (static)
* Specified the fullpath to our PDF template form
* Specified directory of our output files
* Cleaned up the look (2 digit decimals)
* Converted all numeric fields to a string
THE PYTHON BASED BATCH MACRO:
This is a somewhat universal macro. It will take in any data stream and attempt to insert those values into the correct PDF fields. Then it will flatten and encrypt before it moves onto the next record (employee).
END RESULT:
Here is a copy of the end result:
For those that want to dive into the python code, just expand the spoiler below.
#################################
__author__ = 'Peter Anderson'
__copyright__ = 'Copyright 2019, Alteryx, Inc.'
__credits__ = ['Peter Anderson']
__version__ = '2019.7.1'
__maintainer__ = 'Peter Anderson'
__email__ = 'peter.anderson@alteryx.com'
__status__ = 'Dev'
from ayx import Alteryx, Package
# Installs the Python packages that are not included in the default installation.
#Package.installPackages(['PyPDF2'])
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject
#################################
''' NEED APPEARANCES FLAG
Setting the NeedAppearances flag to true specifies whether to construct
appearance streams and appearance dictionaries for all widget annotations
in the document. Default value: false. This function will change that value to
to "True."
See 12.7.2 and 7.7.2 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_need_appearances_writer(writer):
try:
catalog = writer._root_object
# get the AcroForm tree and add "/NeedAppearances attribute
if "/AcroForm" not in catalog:
writer._root_object.update({
NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
need_appearances = NameObject("/NeedAppearances")
writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
except Exception as e:
print('set_need_appearances_writer() catch : ', repr(e))
return writer
''' ANNOTATION FLAG
Setting to flag to "7" will not allow the annotations to interact with the
user. The annotation may be displayed or printed (depending on the settings
of the NoView and Print flags) but should not respond to mouse clicks
or change its appearance in response to mouse motions.
See 12.5.3 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_annotation_flag_writer(writer, field_dictionary):
try:
for j in range(0, len(writer.getPage(0)['/Annots'])):
writer_annot = writer.getPage(0)['/Annots'][j].getObject()
for field in field_dictionary:
if writer_annot.get('/T') == field:
writer_annot.update({
NameObject("/Ff"): NumberObject(7),
})
except Exception as e:
print('set_annotation_flag_writer() catch : ', repr(e))
return
''' FILLS THE PDF FORM
This functions performs 3 action to filling out a PDF form.
* Forces Adobe's NeedAppearances Flag to True (make values visible)
* Forces Adobe's Annotation Flag to "7" which makes the form read-only.
* Datastream values are placed as values in the PDF Form - Incoming
column names must match the ACROFORM field names.
'''
def AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD):
try:
field_dictionary = df.to_dict('records')[0]
inputStream = open(template, "rb")
pdf_reader = PdfFileReader(inputStream, strict=False)
pdf_writer = PdfFileWriter()
set_need_appearances_writer(pdf_writer)
pdf_writer.addPage(pdf_reader.getPage(0))
pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)
set_annotation_flag_writer(pdf_writer, field_dictionary)
outputStream = open(outfile, "wb")
pdf_writer.encrypt(userPWD, ownerPWD, use_128bit=True)
pdf_writer.write(outputStream)
inputStream.close()
outputStream.close()
except Exception as e:
print('AYX_PDF_form_fill() catch : ', repr(e))
return
#################################
df = Alteryx.read("#1") #Read in Datastream from Alteryx Designer Canvas
template = df.loc[0,'vestingStatementTemplate']
outfile = df.loc[0,'FileName']
ownerPWD = df.loc[0,'ownerPWD']
userPWD = df.loc[0,'userPWD']
AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD)
#################################
Hi @PeterA,
This is a great post and worked perfect using your sample PDF!!
However, when I tried using my PDF form, the filled in text can only be shown after mouse click on that field, and it will be invisible again if the focus is gone (except checkboxes, which works perfect using your additional code in your reply above).
I have read many posts saying that NeedAppearances set to True should resolve the issue, but not in my case.
Do you know what else I could try to fix this?
Thanks in advance!!!
Hey, I have the same problem. Were you ever able to resolve this issue?
Yes, I got that resolved by using another Python package, "pdfrw", instead of "PyPDF2".
You can refer to this page for more detail on how to use this package.
Hope it helps!!
Hi there.
I ran the workflow and it ran successfully, but did not generate any PDF outputs. Am I supposed to replace the browse tool with an output tool to do this or should the python code be functioning to do that? Thank you!
Hi @peter
Thanks for the tutorial
The workflow runs without any errors but it not producing any pdf files. However, just running the macro with template produces the single file. Could you please suggest
Were anyone able to get the output pdf files?
Hi Peter,
First of all thank you so much for this! It has been super helpful so far, but I was wondering if you could possibly explain how to adjust the coding for a different pdf than the template you were using. I am not super familiar with Python and have not been able to figure out how to use this macro set up to update my own pdf.
Thanks!
Hi,
Thanks for sharing this macro! I was able to run this to fill out 1 PDF, but when it goes to the second record to fill out the second PDF it saves the PDF but I'm not able to open the file. I'm assuming the file is either damaged or corrupted. Do you know how I could go about fixing this? Thank you!
Thanks, Peter! This is great!
I am able to run the macro file (vesting_statement_helper_macro.yxmc), but the output looks blank in Adobe. However, if I view the PDF in Microsoft Edge, everything is there. Does anyone know how to get this to appear in Adobe?