Filling out PDF forms is often a necessary step in the analytic processes. Here's is a sample use-case and practical guide that will help you get your analytics data into your PDF forms. While there currently is no out-of-the-box / pre-built tool for PDF forms in 2019.2, the steps below will show how easy it is to create your own PDF form filling macro with a single python tool.
USE CASE / BACKGROUND:
Our PDF form filling scenario will be a workforce analytics automation scenario that is all about eliminating those mundane and time-consuming processes associated with the creation of employee vesting statements. Here we will showcase how the Alteryx Analytic platform will help streamline and control the equity decisions within an organization's guidelines while delivering transparency and auditability.
TYPICAL BUSINESS ENVIRONMENT & OPERATIONAL CHALLENGES:
- Management directive to simplify and streamline the development and allocation of equity awards
- Deliver an enhanced view of equity programs with improved levels of transparency, control and auditability across the compensation planning process.
- Reduce the dependency on legacy mail merge approaches that have proved too time-consuming and lacked the requisite level of control.
The PDF Form - Employee Vesting Template:
In this scenario our HR / employee vesting template was created in Microsoft Word & Adobe Acrobat DC and adheres to the 1.7 PDF standard.
Preparing the Form:
An important step is making sure that the Alteryx Field/Column name *IS EQUIVALENT* to the form field name in the Adobe PDF template.
Other notable items:
* Field order does not matter
* Visibility Flag is normally set at True for these types of forms.
* Read-only Flag is Optional. Our python code will set this flag in the PDF and will effectively eliminate the 'blue boxes'
Preparing the Workflow:
As mentioned above, Alteryx Field/Column name must match the form field name in the Adobe PDF template. In the below/attached workflow we will alter the data stream to match our form, by creating a new field call fullName. Here are a couple of additional changes we made along the way:
* Added an Employee Password (dynamic)
* Added an HR (owner) Password (static)
* Specified the fullpath to our PDF template form
* Specified directory of our output files
* Cleaned up the look (2 digit decimals)
* Converted all numeric fields to a string
THE PYTHON BASED BATCH MACRO:
This is a somewhat universal macro. It will take in any data stream and attempt to insert those values into the correct PDF fields. Then it will flatten and encrypt before it moves onto the next record (employee).
END RESULT:
Here is a copy of the end result:
For those that want to dive into the python code, just expand the spoiler below.
#################################
__author__ = 'Peter Anderson'
__copyright__ = 'Copyright 2019, Alteryx, Inc.'
__credits__ = ['Peter Anderson']
__version__ = '2019.7.1'
__maintainer__ = 'Peter Anderson'
__email__ = 'peter.anderson@alteryx.com'
__status__ = 'Dev'
from ayx import Alteryx, Package
# Installs the Python packages that are not included in the default installation.
#Package.installPackages(['PyPDF2'])
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject
#################################
''' NEED APPEARANCES FLAG
Setting the NeedAppearances flag to true specifies whether to construct
appearance streams and appearance dictionaries for all widget annotations
in the document. Default value: false. This function will change that value to
to "True."
See 12.7.2 and 7.7.2 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_need_appearances_writer(writer):
try:
catalog = writer._root_object
# get the AcroForm tree and add "/NeedAppearances attribute
if "/AcroForm" not in catalog:
writer._root_object.update({
NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
need_appearances = NameObject("/NeedAppearances")
writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
except Exception as e:
print('set_need_appearances_writer() catch : ', repr(e))
return writer
''' ANNOTATION FLAG
Setting to flag to "7" will not allow the annotations to interact with the
user. The annotation may be displayed or printed (depending on the settings
of the NoView and Print flags) but should not respond to mouse clicks
or change its appearance in response to mouse motions.
See 12.5.3 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_annotation_flag_writer(writer, field_dictionary):
try:
for j in range(0, len(writer.getPage(0)['/Annots'])):
writer_annot = writer.getPage(0)['/Annots'][j].getObject()
for field in field_dictionary:
if writer_annot.get('/T') == field:
writer_annot.update({
NameObject("/Ff"): NumberObject(7),
})
except Exception as e:
print('set_annotation_flag_writer() catch : ', repr(e))
return
''' FILLS THE PDF FORM
This functions performs 3 action to filling out a PDF form.
* Forces Adobe's NeedAppearances Flag to True (make values visible)
* Forces Adobe's Annotation Flag to "7" which makes the form read-only.
* Datastream values are placed as values in the PDF Form - Incoming
column names must match the ACROFORM field names.
'''
def AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD):
try:
field_dictionary = df.to_dict('records')[0]
inputStream = open(template, "rb")
pdf_reader = PdfFileReader(inputStream, strict=False)
pdf_writer = PdfFileWriter()
set_need_appearances_writer(pdf_writer)
pdf_writer.addPage(pdf_reader.getPage(0))
pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)
set_annotation_flag_writer(pdf_writer, field_dictionary)
outputStream = open(outfile, "wb")
pdf_writer.encrypt(userPWD, ownerPWD, use_128bit=True)
pdf_writer.write(outputStream)
inputStream.close()
outputStream.close()
except Exception as e:
print('AYX_PDF_form_fill() catch : ', repr(e))
return
#################################
df = Alteryx.read("#1") #Read in Datastream from Alteryx Designer Canvas
template = df.loc[0,'vestingStatementTemplate']
outfile = df.loc[0,'FileName']
ownerPWD = df.loc[0,'ownerPWD']
userPWD = df.loc[0,'userPWD']
AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD)
#################################
Thank you for this tutorial!
Two quick questions:
1. I am getting the following error:
--------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call last) <ipython-input-1-a986115e7ebe> in <module> 12 #Package.installPackages(['PyPDF2']) 13 ---> 14 from PyPDF2 import PdfFileWriter, PdfFileReader 15 from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject ModuleNotFoundError: No module named 'PyPDF2'
I did open Designer "Run as Admin" before running the workflow. I don't have much python experience so I'm unsure how to proceed.
2. Do I need to configure the Marco Input tool that's bundled in the macro? It's current Input Name is "vesting_data".
Thanks for your time!
Hi @jonblev ,
Thanks for the quick note. It looks like you do not have the python package PyPDF2 installed. This is a required step and something I should have called that out in the original post. To install this python package you have 3 options.
Option 1:
Run Alteryx Designer in elevated mode / as an admin
then from Python tool:
from ayx import Alteryx
Package.installPackages(['PyPDF2'])
Option 2:
You'll also see this install command on Line 12 - just commented out. You could also remove the '#' and run this workflow in elevated mode (at least once) to install that package.
Option 3:
For Designer admin:
cd "%PROGRAMFILES%\Alteryx\bin\Miniconda3\PythonTool_venv\Scripts"
pip install PyPDF2
For Designer non-admin:
cd "%LOCALAPPDATA%\Alteryx\bin\Miniconda3\PythonTool_venv\Scripts"
pip install PyPDF2
Thank you, I did option 1 and it appears to have worked. However I now have the following error:
Error: vesting_statement_helper_macro (7): Record #1: Tool #5: ---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'FileName'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-3-8715d294d885> in <module>
2
3 template = df.loc[0,'vestingStatementTemplate']
----> 4 outfile = df.loc[0,'FileName']
5 ownerPWD = df.loc[0,'ownerPWD']
6 userPWD = df.loc[0,'userPWD']
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1492 except (KeyError, IndexError, AttributeError):
1493 pass
-> 1494 return self._getitem_tuple(key)
1495 else:
1496 # we by definition only have the 0th axis
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
866 def _getitem_tuple(self, tup):
867 try:
--> 868 return self._getitem_lowerdim(tup)
869 except IndexingError:
870 pass
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_lowerdim(self, tup)
1015 return section
1016 # This is an elided recursive call to iloc/loc/etc'
-> 1017 return getattr(section, self.name)[new_key]
1018
1019 raise IndexingError('not applicable')
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1498
1499 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1500 return self._getitem_axis(maybe_callable, axis=axis)
1501
1502 def _is_scalar_access(self, key):
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
1911 # fall thru to straight lookup
1912 self._validate_key(key, axis)
-> 1913 return self._get_label(key, axis=axis)
1914
1915
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
135 # but will fail when the index is not present
136 # see GH5667
--> 137 return self.obj._xs(label, axis=axis)
138 elif isinstance(label, tuple) and isinstance(label[axis], slice):
139 raise IndexingError('no slices here, handle elsewhere')
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
3583 drop_level=drop_level)
3584 else:
-> 3585 loc = self.index.get_loc(key)
3586
3587 if isinstance(loc, np.ndarray):
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'FileName'
Thanks again for you help on this!
Thank you! Option 1 appears to have worked. However I now get the following error:
Error: vesting_statement_helper_macro (7): Record #1: Tool #5: ---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'FileName'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-3-8715d294d885> in <module>
2
3 template = df.loc[0,'vestingStatementTemplate']
----> 4 outfile = df.loc[0,'FileName']
5 ownerPWD = df.loc[0,'ownerPWD']
6 userPWD = df.loc[0,'userPWD']
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1492 except (KeyError, IndexError, AttributeError):
1493 pass
-> 1494 return self._getitem_tuple(key)
1495 else:
1496 # we by definition only have the 0th axis
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
866 def _getitem_tuple(self, tup):
867 try:
--> 868 return self._getitem_lowerdim(tup)
869 except IndexingError:
870 pass
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_lowerdim(self, tup)
1015 return section
1016 # This is an elided recursive call to iloc/loc/etc'
-> 1017 return getattr(section, self.name)[new_key]
1018
1019 raise IndexingError('not applicable')
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1498
1499 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1500 return self._getitem_axis(maybe_callable, axis=axis)
1501
1502 def _is_scalar_access(self, key):
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
1911 # fall thru to straight lookup
1912 self._validate_key(key, axis)
-> 1913 return self._get_label(key, axis=axis)
1914
1915
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
135 # but will fail when the index is not present
136 # see GH5667
--> 137 return self.obj._xs(label, axis=axis)
138 elif isinstance(label, tuple) and isinstance(label[axis], slice):
139 raise IndexingError('no slices here, handle elsewhere')
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
3583 drop_level=drop_level)
3584 else:
-> 3585 loc = self.index.get_loc(key)
3586
3587 if isinstance(loc, np.ndarray):
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'FileName'
Thanks again for your help with this!
Hi @jonblev
Good Catch. Must have uploaded a previous version by accident. The error you see is from Alteryx Designer passing a field name labeled "filename" and where the Python Tool is expecting a field name of "FileName". Its a case sensitivity issue. A simple update in the Python Tool (see below) to force lowercase will fix the issue, but I will attach an update to this post for future community users.
Thank you! This worked. Sorry for the double post above, didn't intend to do that.
One other question, is this limited to 1 page PDFs? I just tried to run my workflow with 3 page PDF template and it only created the 1st page.
Thanks!
Yes. It can work for multiple pages, merging multiple PDF documents into one or splitting them up. You can explore more of the pypdf2 documentation online at https://pythonhosted.org/PyPDF2/
In the code posted above we were just focused on a single page template
pdf_writer.addPage(pdf_reader.getPage(0))
pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)
but for something a bit more universal you first find the number of pages in your template and then loop through adding them to your target PDF file. Something like the following:
for page in range(pdf_reader.getNumPages()):
pdf_writer.addPage(pdf_reader.getPage(page))
pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(page), field_dictionary)
PDF Checkboxes
Had an interesting DM / inquiry about filling out check boxes in PDF forms. Interestingly there are some details in the Adobe PDF Spec that outlines what would need to be done. See the 1.7 Spec at https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
here is some sample code that works
...
if writer_annot.get('/T') == field:
writer_annot.update({
NameObject("/V"): NameObject(field_dictionary[field]),
NameObject("/AS"): NameObject(field_dictionary[field]),
...
Now the important part in your workflow is that to set that checkbox you will need to specify the string value as "/Yes"
Hope this helps.
Thanks so much for posting this macro. This can be an amazing tool for several published state forms are organization is required to complete. We have been able to successfully use this macro to build a proof of concept with one of these state forms.
With that said, would there be a possibility of using the same Python functionality but complete multiple records not as separate files but as one multi-page PDF?
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
6 |