Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

A quick tutorial for getting Alteryx data written to an Adobe Acrobat / PDF form

PeterA
Alteryx Alumni (Retired)

 

Filling out PDF forms is often a necessary step in the analytic processes.  Here's is a sample use-case and practical guide that will help you get your analytics data into your PDF forms.  While there currently is no out-of-the-box / pre-built tool for PDF forms in 2019.2, the steps below will show how easy it is to create your own PDF form filling macro with a single python tool.

 

USE CASE / BACKGROUND:

Our PDF form filling scenario will be a workforce analytics automation scenario that is all about eliminating those mundane and time-consuming processes associated with the creation of employee vesting statements. Here we will showcase how the Alteryx Analytic platform will help streamline and control the equity decisions within an organization's guidelines while delivering transparency and auditability.

 

TYPICAL BUSINESS ENVIRONMENT & OPERATIONAL CHALLENGES:
 - Management directive to simplify and streamline the development and allocation of equity awards
 - Deliver an enhanced view of equity programs with improved levels of transparency, control and auditability across the compensation planning process.
 - Reduce the dependency on legacy mail merge approaches that have proved too time-consuming and lacked the requisite level of control.

 

 

The PDF Form - Employee Vesting Template:

In this scenario our HR / employee vesting template was created in Microsoft Word & Adobe Acrobat DC and adheres to the 1.7 PDF standard.

PDF Form / Template for Vesting StatementsPDF Form / Template for Vesting Statements

 

 

Preparing the Form:

An important step is making sure that the Alteryx Field/Column name *IS EQUIVALENT*  to the form field name in the Adobe PDF template.

Other notable items:

  * Field order does not matter

  * Visibility Flag is normally set at True for these types of forms.

  * Read-only Flag is Optional.  Our python code will set this flag in the PDF and will effectively eliminate the 'blue boxes'

PDF Form Field PropertiesPDF Form Field Properties

 

 

Preparing the Workflow:

As mentioned above, Alteryx Field/Column name must match the form field name in the Adobe PDF template.  In the below/attached workflow we will alter the data stream to match our form, by creating a new field call fullName.  Here are a couple of additional changes we made along the way:

  * Added an Employee Password (dynamic)

  * Added an HR (owner) Password (static)

  * Specified the fullpath to our PDF template form

  * Specified directory of our output files

  * Cleaned up the look (2 digit decimals)

  * Converted all numeric fields to a string

Main Workflow - Workforce AnalyticsMain Workflow - Workforce Analytics

 

 

 

THE PYTHON BASED BATCH MACRO: 

This is a somewhat universal macro.  It will take in any data stream and attempt to insert those values into the correct PDF fields.  Then it will flatten and encrypt before it moves onto the next record (employee).

Python based batch macro for filling PDF formsPython based batch macro for filling PDF forms

END RESULT:

Here is a copy of the end result:

 

PDF Form filled by AlteryxPDF Form filled by Alteryx

 

 

For those that want to dive into the python code, just expand the spoiler below.

Spoiler
#################################
__author__ = 'Peter Anderson'
__copyright__ = 'Copyright 2019, Alteryx, Inc.'
__credits__ = ['Peter Anderson']
__version__ = '2019.7.1'
__maintainer__ = 'Peter Anderson'
__email__ = 'peter.anderson@alteryx.com'
__status__ = 'Dev'


from ayx import Alteryx, Package
# Installs the Python packages that are not included in the default installation.  
#Package.installPackages(['PyPDF2']) 

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject


#################################
''' NEED APPEARANCES FLAG
Setting the NeedAppearances flag to true specifies whether to construct 
appearance streams and appearance dictionaries for all widget annotations
in the document. Default value: false.  This function will change that value  to
to "True."

See 12.7.2 and 7.7.2 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_need_appearances_writer(writer):
    try:
        catalog = writer._root_object
        # get the AcroForm tree and add "/NeedAppearances attribute
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer
    
    
''' ANNOTATION FLAG
Setting to flag to "7" will not allow the annotations to interact with the
user.  The annotation may be displayed or printed (depending on the settings
of the NoView and Print flags) but should not respond to mouse clicks
or change its appearance in response to mouse motions. 

See 12.5.3 for more information:
<a href="<a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>>" target="_blank"><a href="<a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a>" target="_blank"><a href="http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a" target="_blank">http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf</a</a</a>>>
'''
def set_annotation_flag_writer(writer, field_dictionary):
    try:
        for j in range(0, len(writer.getPage(0)['/Annots'])):
            writer_annot = writer.getPage(0)['/Annots'][j].getObject()
            for field in field_dictionary:
                if writer_annot.get('/T') == field:
                    writer_annot.update({
                        NameObject("/Ff"): NumberObject(7),
                    })
    except Exception as e:
        print('set_annotation_flag_writer() catch : ', repr(e))
        return


''' FILLS THE PDF FORM
This functions performs 3 action to filling out a PDF form.
* Forces Adobe's NeedAppearances Flag to True (make values visible)
* Forces Adobe's Annotation Flag to "7" which makes the form read-only.
* Datastream values are placed as values in the PDF Form - Incoming 
  column names must match the ACROFORM field names.
'''    
def AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD):
    try:
                
        field_dictionary = df.to_dict('records')[0]
        inputStream = open(template, "rb")
        pdf_reader = PdfFileReader(inputStream, strict=False)

        pdf_writer = PdfFileWriter()
        set_need_appearances_writer(pdf_writer)

        pdf_writer.addPage(pdf_reader.getPage(0))
        pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)

        set_annotation_flag_writer(pdf_writer, field_dictionary)
        
        outputStream = open(outfile, "wb")
        pdf_writer.encrypt(userPWD, ownerPWD, use_128bit=True)
        pdf_writer.write(outputStream)

        inputStream.close()
        outputStream.close()
        
    except Exception as e:
        print('AYX_PDF_form_fill() catch : ', repr(e))
        return
    
    




#################################
df = Alteryx.read("#1")        #Read in Datastream from Alteryx Designer Canvas

template = df.loc[0,'vestingStatementTemplate']
outfile = df.loc[0,'FileName']
ownerPWD = df.loc[0,'ownerPWD']
userPWD = df.loc[0,'userPWD']

AYX_PDF_form_fill(template, outfile, ownerPWD, userPWD)



#################################

 

36 REPLIES 36
fakbarli
5 - Atom

@hcook you have to flatten them, in other words you have to print pdfs to pdf again.

hcook
6 - Meteoroid

@fakbarli I tried this and it unfortunately still doesn't work. The only thing that gets the data to show up on the PDF is if I open it from Microsoft Edge's PDF viewer and not Adobe's.

CloudyPez
5 - Atom

@hcook what worked for me on Designer 2023.2 was commenting out the function call “set_need_appearances_writer(pdf_writer)” and replacing it with the below. 

 

trailer = pdf_reader.trailer["/Root"]["/AcroForm"]

        pdf_writer._root_object.update({

                NameObject('/AcroForm'): trailer

            })

hcook
6 - Meteoroid

@CloudyPez Thanks! What Python package are you using? PyPDF?

CloudyPez
5 - Atom

@hcook PyPDF2 the one used in this example. Thx

hcook
6 - Meteoroid

@CloudyPez I thought that package was now deprecated. Awhile back, I had to use some other PDF package of some sorts. I'll have to get back into the weeds of the workflow I made and see if I can get it to work.

CloudyPez
5 - Atom

@hcook Ah yes I had to also update the deprecated object names to the newly supported ones. I believe when you run it it tells you which ones are deprecated. i.e. PdfFileReader is now PdfReader, etc.

Labels
Top Solution Authors