Filling out PDF forms is often a necessary step in the analytic processes. Here's is a sample use-case and practical guide that will help you get your analytics data into your PDF forms. While there currently is no out-of-the-box / pre-built tool for PDF forms in 2019.2, the steps below will show how easy it is to create your own PDF form filling macro with a single python tool.
USE CASE / BACKGROUND:
Our PDF form filling scenario will be a workforce analytics automation scenario that is all about eliminating those mundane and time-consuming processes associated with the creation of employee vesting statements. Here we will showcase how the Alteryx Analytic platform will help streamline and control the equity decisions within an organization's guidelines while delivering transparency and auditability.
TYPICAL BUSINESS ENVIRONMENT & OPERATIONAL CHALLENGES:
- Management directive to simplify and streamline the development and allocation of equity awards
- Deliver an enhanced view of equity programs with improved levels of transparency, control and auditability across the compensation planning process.
- Reduce the dependency on legacy mail merge approaches that have proved too time-consuming and lacked the requisite level of control.
The PDF Form - Employee Vesting Template:
In this scenario our HR / employee vesting template was created in Microsoft Word & Adobe Acrobat DC and adheres to the 1.7 PDF standard.
Preparing the Form:
An important step is making sure that the Alteryx Field/Column name *IS EQUIVALENT* to the form field name in the Adobe PDF template.
Other notable items:
* Field order does not matter
* Visibility Flag is normally set at True for these types of forms.
* Read-only Flag is Optional. Our python code will set this flag in the PDF and will effectively eliminate the 'blue boxes'
Preparing the Workflow:
As mentioned above, Alteryx Field/Column name must match the form field name in the Adobe PDF template. In the below/attached workflow we will alter the data stream to match our form, by creating a new field call fullName. Here are a couple of additional changes we made along the way:
* Added an Employee Password (dynamic)
* Added an HR (owner) Password (static)
* Specified the fullpath to our PDF template form
* Specified directory of our output files
* Cleaned up the look (2 digit decimals)
* Converted all numeric fields to a string
THE PYTHON BASED BATCH MACRO:
This is a somewhat universal macro. It will take in any data stream and attempt to insert those values into the correct PDF fields. Then it will flatten and encrypt before it moves onto the next record (employee).
END RESULT:
Here is a copy of the end result:
For those that want to dive into the python code, just expand the spoiler below.
#################################
__author__ = 'Peter Anderson'
__copyright__ = 'Copyright 2019, Alteryx, Inc.'
__credits__ = ['Peter Anderson']
__version__ = '2019.7.1'
__maintainer__ = 'Peter Anderson'
__email__ = 'peter.anderson@alteryx.com'
__status__ = 'Dev'
from ayx import Alteryx, Package
# Installs the Python packages that are not included in the default installation.
#Package.installPackages(['PyPDF2'])
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject
#################################
''' NEED APPEARANCES FLAG
Setting the NeedAppearances flag to true specifies whether to construct
appearance streams and appearance dictionaries for all widget annotations
in the document. Default value: false. This function will change that value to
to "True."
See 12.7.2 and 7.7.2 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_need_appearances_writer(writer):
try:
catalog = writer._root_object
# get the AcroForm tree and add "/NeedAppearances attribute
if "/AcroForm" not in catalog:
writer._root_object.update({
NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
need_appearances = NameObject("/NeedAppearances")
writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
except Exception as e:
print('set_need_appearances_writer() catch : ', repr(e))
return writer
''' ANNOTATION FLAG
Setting to flag to "7" will not allow the annotations to interact with the
user. The annotation may be displayed or printed (depending on the settings
of the NoView and Print flags) but should not respond to mouse clicks
or change its appearance in response to mouse motions.
See 12.5.3 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_annotation_flag_writer(writer, field_dictionary):
try:
for j in range(0, len(writer.getPage(0)['/Annots'])):
writer_annot = writer.getPage(0)['/Annots'][j].getObject()
for field in field_dictionary:
if writer_annot.get('/T') == field:
writer_annot.update({
NameObject("/Ff"): NumberObject(7),
})
except Exception as e:
print('set_annotation_flag_writer() catch : ', repr(e))
return
''' FILLS THE PDF FORM
This functions performs 3 action to filling out a PDF form.
* Forces Adobe's NeedAppearances Flag to True (make values visible)
* Forces Adobe's Annotation Flag to "7" which makes the form read-only.
* Datastream values are placed as values in the PDF Form - Incoming
column names must match the ACROFORM field names.
'''
def AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD):
try:
field_dictionary = df.to_dict('records')[0]
inputStream = open(template, "rb")
pdf_reader = PdfFileReader(inputStream, strict=False)
pdf_writer = PdfFileWriter()
set_need_appearances_writer(pdf_writer)
pdf_writer.addPage(pdf_reader.getPage(0))
pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)
set_annotation_flag_writer(pdf_writer, field_dictionary)
outputStream = open(outfile, "wb")
pdf_writer.encrypt(userPWD, ownerPWD, use_128bit=True)
pdf_writer.write(outputStream)
inputStream.close()
outputStream.close()
except Exception as e:
print('AYX_PDF_form_fill() catch : ', repr(e))
return
#################################
df = Alteryx.read("#1") #Read in Datastream from Alteryx Designer Canvas
template = df.loc[0,'vestingStatementTemplate']
outfile = df.loc[0,'FileName']
ownerPWD = df.loc[0,'ownerPWD']
userPWD = df.loc[0,'userPWD']
AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD)
#################################
Hi @YankeePred. Glad the macro is working out for you. To answer your multi-page PDF page question, the answer is YES.
If you look at the code snippet below it was reading in the pages of the template file and creating/adding them to the target file object. Eventually we wrote the writer object’s contents to disk.
pdf_writer.addPage(pdf_reader.getPage(page))
So its this addPage function of the PyPDF2 package does just that - adds a page to a PDF. And if your situation requires different options you also have APPEND & MERGE functions. (Append is pretty much identical to the merge method, but assumes you want to place all pages at the end of the file instead of specifying a particular position.)
from PyPDF2 import PdfFileMerger, PdfFileReader
# ...
pdf_merger = PdfFileMerger()
pdf_merger.append(PdfFileReader(file(1st_PDF_file, 'rb')))
pdf_merger.append(PdfFileReader(file(2nd_PDF_file, 'rb')))
pdf_merger.write("my_ayx_document-output.pdf")
More detail can be found at : https://pythonhosted.org/PyPDF2/PdfFileMerger.html
Hey Peter,
Thanks for the code but was just wondering, how would you combine the checkbox and text field into one code? I tried but it was not working out the way I want it to. It was just checking the box and not filling out the text field like it use to.
Thanks!
Hi Peter,
Thank you for posting this tutorial. I'm running the workflow, but the PDF form isn't populating. Is there a way I can contact you for further assistance?
Thank you,
Ngoc-Han
Hi Peter,
Many thanks for this guide. I have a question regarding the speed. When I run your workflow and create 9 PDF-forms it takes 5-6 minutes to process, i.e. not even 2 forms per minute. Are there any tweaks available to speed up the process?
Thanks in advance.
Hi Peter, Thank you for providing this tutorial. I have a pdf that has both text fields and also choice fields. I was able to use your example to create my files but when the template populates the choice fields the data doesn't flow. The text data fields flow fine. The teal fields had choice data that was supposed to flow but didn't. Those cells previously look like the other light blue choice fields that I didn't map data to yet.
Peter, your guide is great and very in-depth!
One thing though is that it doesn't seem to actually generate any output files? I can see the file paths created in the browse tool, but those files don't exist and I can see in the results pane that no files are created. What am I doing wrong, or what needs to be configured to get the files to be created?
Thanks!
This is great but I The entry point is invalid when I try to run your demo.
So amazing!! Thank you so much for the design! But we have another question - do you have any idea to update/add pdf field name using Alteryx instead of manually? In some pdfs, there are special characters in the default pdf field name, e.g. "[,]", also some pdf field name is too long. Therefore, we are wondering if any better idea to solve this problem. Also, if we want to add more pdf field name, how to do it? If we have better idea on these, we can use this tool more broadly!
Hi Peter,
This is one of my favorite workflows and it was working for me previously. I am now receiving error messages however. Do you think this has to do with the Python updates? Or is this an issue with rights my company has placed on the tool? I had someone with admin rights reinstall python a couple months ago so it should be installed on my computer
Alexandra,
Were you ever able to find the answer to this issue? I have the same problem; the workflow runs, no errors reported, but no files to be found either.
Thanks!