Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

A quick tutorial for getting Alteryx data written to an Adobe Acrobat / PDF form

PeterA
Alteryx Alumni (Retired)

 

Filling out PDF forms is often a necessary step in the analytic processes.  Here's is a sample use-case and practical guide that will help you get your analytics data into your PDF forms.  While there currently is no out-of-the-box / pre-built tool for PDF forms in 2019.2, the steps below will show how easy it is to create your own PDF form filling macro with a single python tool.

 

USE CASE / BACKGROUND:

Our PDF form filling scenario will be a workforce analytics automation scenario that is all about eliminating those mundane and time-consuming processes associated with the creation of employee vesting statements. Here we will showcase how the Alteryx Analytic platform will help streamline and control the equity decisions within an organization's guidelines while delivering transparency and auditability.

 

TYPICAL BUSINESS ENVIRONMENT & OPERATIONAL CHALLENGES:
 - Management directive to simplify and streamline the development and allocation of equity awards
 - Deliver an enhanced view of equity programs with improved levels of transparency, control and auditability across the compensation planning process.
 - Reduce the dependency on legacy mail merge approaches that have proved too time-consuming and lacked the requisite level of control.

 

 

The PDF Form - Employee Vesting Template:

In this scenario our HR / employee vesting template was created in Microsoft Word & Adobe Acrobat DC and adheres to the 1.7 PDF standard.

PDF Form / Template for Vesting StatementsPDF Form / Template for Vesting Statements

 

 

Preparing the Form:

An important step is making sure that the Alteryx Field/Column name *IS EQUIVALENT*  to the form field name in the Adobe PDF template.

Other notable items:

  * Field order does not matter

  * Visibility Flag is normally set at True for these types of forms.

  * Read-only Flag is Optional.  Our python code will set this flag in the PDF and will effectively eliminate the 'blue boxes'

PDF Form Field PropertiesPDF Form Field Properties

 

 

Preparing the Workflow:

As mentioned above, Alteryx Field/Column name must match the form field name in the Adobe PDF template.  In the below/attached workflow we will alter the data stream to match our form, by creating a new field call fullName.  Here are a couple of additional changes we made along the way:

  * Added an Employee Password (dynamic)

  * Added an HR (owner) Password (static)

  * Specified the fullpath to our PDF template form

  * Specified directory of our output files

  * Cleaned up the look (2 digit decimals)

  * Converted all numeric fields to a string

Main Workflow - Workforce AnalyticsMain Workflow - Workforce Analytics

 

 

 

THE PYTHON BASED BATCH MACRO: 

This is a somewhat universal macro.  It will take in any data stream and attempt to insert those values into the correct PDF fields.  Then it will flatten and encrypt before it moves onto the next record (employee).

Python based batch macro for filling PDF formsPython based batch macro for filling PDF forms

END RESULT:

Here is a copy of the end result:

 

PDF Form filled by AlteryxPDF Form filled by Alteryx

 

 

For those that want to dive into the python code, just expand the spoiler below.

Spoiler
#################################
__author__ = 'Peter Anderson'
__copyright__ = 'Copyright 2019, Alteryx, Inc.'
__credits__ = ['Peter Anderson']
__version__ = '2019.7.1'
__maintainer__ = 'Peter Anderson'
__email__ = 'peter.anderson@alteryx.com'
__status__ = 'Dev'


from ayx import Alteryx, Package
# Installs the Python packages that are not included in the default installation.  
#Package.installPackages(['PyPDF2']) 

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject


#################################
''' NEED APPEARANCES FLAG
Setting the NeedAppearances flag to true specifies whether to construct 
appearance streams and appearance dictionaries for all widget annotations
in the document. Default value: false.  This function will change that value  to
to "True."

See 12.7.2 and 7.7.2 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_need_appearances_writer(writer):
    try:
        catalog = writer._root_object
        # get the AcroForm tree and add "/NeedAppearances attribute
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer
    
    
''' ANNOTATION FLAG
Setting to flag to "7" will not allow the annotations to interact with the
user.  The annotation may be displayed or printed (depending on the settings
of the NoView and Print flags) but should not respond to mouse clicks
or change its appearance in response to mouse motions. 

See 12.5.3 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_annotation_flag_writer(writer, field_dictionary):
    try:
        for j in range(0, len(writer.getPage(0)['/Annots'])):
            writer_annot = writer.getPage(0)['/Annots'][j].getObject()
            for field in field_dictionary:
                if writer_annot.get('/T') == field:
                    writer_annot.update({
                        NameObject("/Ff"): NumberObject(7),
                    })
    except Exception as e:
        print('set_annotation_flag_writer() catch : ', repr(e))
        return


''' FILLS THE PDF FORM
This functions performs 3 action to filling out a PDF form.
* Forces Adobe's NeedAppearances Flag to True (make values visible)
* Forces Adobe's Annotation Flag to "7" which makes the form read-only.
* Datastream values are placed as values in the PDF Form - Incoming 
  column names must match the ACROFORM field names.
'''    
def AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD):
    try:
                
        field_dictionary = df.to_dict('records')[0]
        inputStream = open(template, "rb")
        pdf_reader = PdfFileReader(inputStream, strict=False)

        pdf_writer = PdfFileWriter()
        set_need_appearances_writer(pdf_writer)

        pdf_writer.addPage(pdf_reader.getPage(0))
        pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)

        set_annotation_flag_writer(pdf_writer, field_dictionary)
        
        outputStream = open(outfile, "wb")
        pdf_writer.encrypt(userPWD, ownerPWD, use_128bit=True)
        pdf_writer.write(outputStream)

        inputStream.close()
        outputStream.close()
        
    except Exception as e:
        print('AYX_PDF_form_fill() catch : ', repr(e))
        return
    
    




#################################
df = Alteryx.read("#1")        #Read in Datastream from Alteryx Designer Canvas

template = df.loc[0,'vestingStatementTemplate']
outfile = df.loc[0,'FileName']
ownerPWD = df.loc[0,'ownerPWD']
userPWD = df.loc[0,'userPWD']

AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD)



#################################

 

29 REPLIES 29
jonblev
6 - Meteoroid

Thank you for this tutorial!

 

Two quick questions:

1. I am getting the following error:

 

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-a986115e7ebe> in <module>
     12 #Package.installPackages(['PyPDF2'])
     13 
---> 14 from PyPDF2 import PdfFileWriter, PdfFileReader
     15 from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject

ModuleNotFoundError: No module named 'PyPDF2'

 I did open Designer "Run as Admin" before running the workflow. I don't have much python experience so I'm unsure how to proceed.

 

2. Do I need to configure the Marco Input tool that's bundled in the macro? It's current Input Name is "vesting_data".

 

Thanks for your time!

PeterA
Alteryx Alumni (Retired)

Hi @jonblev ,

Thanks for the quick note.  It looks like you do not have the python package PyPDF2 installed.  This is a required step and something I should have called that out in the original post.  To install this python package you have 3 options.

 

Option 1:

Run Alteryx Designer in elevated mode / as an admin

then from Python tool:

 

from ayx import Alteryx
Package.installPackages(['PyPDF2'])

 

 

 

Option 2:

You'll also see this install command on Line 12 - just commented out.  You could also remove the '#' and run this workflow in elevated mode (at least once) to install that package.

Source: https://community.alteryx.com/t5/Alteryx-Knowledge-Base/How-To-Use-Alteryx-installPackages-in-Python...

 

 

Option 3:

 

  1. Open Command Prompt as administrator (not necessary for Alteryx Designer non-admin)
  2. Enter the following:

For Designer admin:

 

cd "%PROGRAMFILES%\Alteryx\bin\Miniconda3\PythonTool_venv\Scripts"
pip install PyPDF2

 

 

For Designer non-admin:

 

cd "%LOCALAPPDATA%\Alteryx\bin\Miniconda3\PythonTool_venv\Scripts"
pip install PyPDF2

 

jonblev
6 - Meteoroid

Thank you, I did option 1 and it appears to have worked. However I now have the following error:

 

Error: vesting_statement_helper_macro (7): Record #1: Tool #5: ---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'FileName'

During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-3-8715d294d885> in <module>
2
3 template = df.loc[0,'vestingStatementTemplate']
----> 4 outfile = df.loc[0,'FileName']
5 ownerPWD = df.loc[0,'ownerPWD']
6 userPWD = df.loc[0,'userPWD']
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1492 except (KeyError, IndexError, AttributeError):
1493 pass
-> 1494 return self._getitem_tuple(key)
1495 else:
1496 # we by definition only have the 0th axis
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
866 def _getitem_tuple(self, tup):
867 try:
--> 868 return self._getitem_lowerdim(tup)
869 except IndexingError:
870 pass
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_lowerdim(self, tup)
1015 return section
1016 # This is an elided recursive call to iloc/loc/etc'
-> 1017 return getattr(section, self.name)[new_key]
1018
1019 raise IndexingError('not applicable')
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1498
1499 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1500 return self._getitem_axis(maybe_callable, axis=axis)
1501
1502 def _is_scalar_access(self, key):
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
1911 # fall thru to straight lookup
1912 self._validate_key(key, axis)
-> 1913 return self._get_label(key, axis=axis)
1914
1915
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
135 # but will fail when the index is not present
136 # see GH5667
--> 137 return self.obj._xs(label, axis=axis)
138 elif isinstance(label, tuple) and isinstance(label[axis], slice):
139 raise IndexingError('no slices here, handle elsewhere')
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
3583 drop_level=drop_level)
3584 else:
-> 3585 loc = self.index.get_loc(key)
3586
3587 if isinstance(loc, np.ndarray):
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'FileName'

 

Thanks again for you help on this!

jonblev
6 - Meteoroid

Thank you! Option 1 appears to have worked. However I now get the following error:

 

Error: vesting_statement_helper_macro (7): Record #1: Tool #5: ---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'FileName'

During handling of the above exception, another exception occurred:
KeyError                                  Traceback (most recent call last)
<ipython-input-3-8715d294d885> in <module>
      2 
      3 template = df.loc[0,'vestingStatementTemplate']
----> 4 outfile = df.loc[0,'FileName']
      5 ownerPWD = df.loc[0,'ownerPWD']
      6 userPWD = df.loc[0,'userPWD']
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1492             except (KeyError, IndexError, AttributeError):
   1493                 pass
-> 1494             return self._getitem_tuple(key)
   1495         else:
   1496             # we by definition only have the 0th axis
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
    866     def _getitem_tuple(self, tup):
    867         try:
--> 868             return self._getitem_lowerdim(tup)
    869         except IndexingError:
    870             pass
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_lowerdim(self, tup)
   1015                     return section
   1016                 # This is an elided recursive call to iloc/loc/etc'
-> 1017                 return getattr(section, self.name)[new_key]
   1018 
   1019         raise IndexingError('not applicable')
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1498 
   1499             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1500             return self._getitem_axis(maybe_callable, axis=axis)
   1501 
   1502     def _is_scalar_access(self, key):
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1911         # fall thru to straight lookup
   1912         self._validate_key(key, axis)
-> 1913         return self._get_label(key, axis=axis)
   1914 
   1915 
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
    135             # but will fail when the index is not present
    136             # see GH5667
--> 137             return self.obj._xs(label, axis=axis)
    138         elif isinstance(label, tuple) and isinstance(label[axis], slice):
    139             raise IndexingError('no slices here, handle elsewhere')
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
   3583                                                       drop_level=drop_level)
   3584         else:
-> 3585             loc = self.index.get_loc(key)
   3586 
   3587             if isinstance(loc, np.ndarray):
c:\program files\alteryx\bin\miniconda3\pythontool_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'FileName'

 

Thanks again for your help with this!

PeterA
Alteryx Alumni (Retired)

Hi @jonblev 

Good Catch.  Must have uploaded a previous version by accident. The error you see is from Alteryx Designer passing a field name labeled "filename" and where the Python Tool is expecting a field name of "FileName".  Its a case sensitivity issue.  A simple update in the Python Tool (see below) to force lowercase will fix the issue, but I will attach an update to this post for future community users.

 

FileName to filenameFileName to filename

jonblev
6 - Meteoroid

Thank you! This worked. Sorry for the double post above, didn't intend to do that. 

 

One other question, is this limited to 1 page PDFs? I just tried to run my workflow with 3 page PDF template and it only created the 1st page.

 

Thanks!

PeterA
Alteryx Alumni (Retired)

Yes.  It can work for multiple pages, merging multiple PDF documents into one or splitting them up.  You can explore more of the pypdf2 documentation online at https://pythonhosted.org/PyPDF2/

 

In the code posted above we were just focused on a single page template

        pdf_writer.addPage(pdf_reader.getPage(0))
        pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)

 

but for something a bit more universal you first find the number of pages in your template and then loop through adding them to your target PDF file.  Something like the following:

        for page in range(pdf_reader.getNumPages()):
            pdf_writer.addPage(pdf_reader.getPage(page))
            pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(page), field_dictionary)

 

PeterA
Alteryx Alumni (Retired)

PDF Checkboxes

 

Had an interesting DM / inquiry about filling out check boxes in PDF forms.  Interestingly there are some details in the Adobe PDF Spec that outlines what would need to be done. See the 1.7 Spec at https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

 

Snippet from PDF 1.7 SpecSnippet from PDF 1.7 Spec

 

here is some sample code that works

...
                if writer_annot.get('/T') == field:
                    writer_annot.update({
                        NameObject("/V"): NameObject(field_dictionary[field]),
                        NameObject("/AS"): NameObject(field_dictionary[field]),
                        ...

 

Now the important part in your workflow is that to set that checkbox you will need to specify the string value as "/Yes"

 

checkbox set to CHECKEDcheckbox set to CHECKED

 

Hope this helps.

YankeePred
5 - Atom

Thanks so much for posting this macro.  This can be an amazing tool for several published state forms are organization is required to complete.  We have been able to successfully use this macro to build a proof of concept with one of these state forms.  

 

With that said, would there be a possibility of using the same Python functionality but complete multiple records not as separate files but as one multi-page PDF?  

Labels