Engine Works

BlytheE · 07-31-2018

Have you ever wondered how new tools placed on the canvas know what fields are being passed to them? Are they psychic or something? Not at all! It's metadata! Each field's metadata (Name, Type, Size, Source, and Description) is passed from tool to tool before you even run the workflow. When working with the Python SDK, you are in charge of building that information and passing it along. This blog post explains what that entails and how to implement it in your Python-driven plugin.

Like in most relationships, the key to passing metadata to downstream tools is communication. You must be open and honest with your output anchor. It should be made aware of any expectations so that it is prepared when other tools approach it with requests. The Python SDK provides the OutputAnchor and RecordInfo classes to make sure that you, your tool, and the tools around it are all on the same page...or canvas, as the case may be.

The first step is to assign the output anchor you specified in your Config.xml file to a variable in pi_init, which is where the tool and output anchors are initialized. You will need this later in order to pass the metadata structure to it. If your tool has multiple output anchors, it helps to give them descriptive names to reduce confusion later on.

In Generic_Tool_Name_v1Config.xml -

<OutputConnections>
<Connection Name="Output" AllowMultiple="False" Optional="False" Type="Connection" Label="O"/>
</OutputConnections>

In pi_init, the value of the Connection element's Name attribute ('Output') is passed to get_output_anchor, and the anchor it returns is assigned to self.output_anchor -

self.output_anchor = self.output_anchor_mgr.get_output_anchor('Output')
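Because get_output_anchor looks the anchor up by its Name value, the string in pi_init and the attribute in Config.xml have to match exactly. A small standard-library sketch like the one below can list the output anchor names declared in a config file so you can compare them with your pi_init code. The CONFIG_XML excerpt (including the surrounding AlteryxJavaScriptPlugin and GuiSettings elements) and the list_output_anchor_names helper are hypothetical illustrations, not part of the SDK.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of Generic_Tool_Name_v1Config.xml. The exact nesting of
# OutputConnections is an assumption; iter() below finds it at any depth.
CONFIG_XML = """
<AlteryxJavaScriptPlugin>
  <GuiSettings>
    <OutputConnections>
      <Connection Name="Output" AllowMultiple="False" Optional="False" Type="Connection" Label="O"/>
    </OutputConnections>
  </GuiSettings>
</AlteryxJavaScriptPlugin>
"""


def list_output_anchor_names(config_xml):
    """Return the Name attribute of every output Connection, in document order."""
    root = ET.fromstring(config_xml)
    names = []
    # iter() walks the element and all of its descendants.
    for output_connections in root.iter('OutputConnections'):
        for connection in output_connections.findall('Connection'):
            names.append(connection.get('Name'))
    return names


print(list_output_anchor_names(CONFIG_XML))  # ['Output']
```

Each returned name is the string you would pass to self.output_anchor_mgr.get_output_anchor(...). With several anchors, descriptive names such as 'Valid' and 'Invalid' make the matching self.valid_anchor / self.invalid_anchor assignments easier to keep straight.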

Metadata can be handled differently depending on where your tool falls in the workflow, but every approach relies on methods of the RecordInfo class. For intermediary or output tools that do not add fields to the data stream, you can use the clone method to create an exact copy of the record info flowing into the tool, as seen in the Python - Multiple Outputs example tool. If you are adding fields to an existing data stream, you can append them to the cloned object with add_field, as in the Python SDK Example. You can also combine the init_from_xml and get_record_xml_data methods, as shown in the Python - Multiple Inputs example tool. Input tools (tools without an incoming anchor), however, need to build the metadata from scratch. To build the record_info_out object, I use a helper function that looks like this -
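For an intermediary tool, the clone-then-extend pattern looks roughly like this. Since AlteryxPythonSDK only exists inside Alteryx, this sketch uses a hypothetical StandInRecordInfo class that imitates just clone() and add_field(). In a real plugin you would call those methods on the Sdk.RecordInfo object you receive for the incoming connection (for example in ii_init) and pass Sdk.FieldType members instead of the placeholder type strings. The 'Status' field is made up for illustration.

```python
import copy
from collections import namedtuple

# Hypothetical stand-in for the field object that add_field returns.
StandInField = namedtuple('StandInField', ['name', 'type', 'size', 'scale', 'source', 'description'])


class StandInRecordInfo:
    """Hypothetical, pure-Python stand-in for Sdk.RecordInfo.

    It mimics only clone() and add_field() so the pattern runs outside Alteryx.
    """

    def __init__(self):
        self.fields = []

    def add_field(self, field_name, field_type, size=0, scale=0, source='', description=''):
        # Same parameter order as the add_field signature quoted in this post.
        field = StandInField(field_name, field_type, size, scale, source, description)
        self.fields.append(field)
        return field

    def clone(self):
        # Return an independent copy so new fields leave the incoming layout untouched.
        return copy.deepcopy(self)


# Incoming layout, as an upstream tool might describe it.
# 'v_wstring' is a placeholder string for Sdk.FieldType.v_wstring.
record_info_in = StandInRecordInfo()
record_info_in.add_field('Id', 'v_wstring', 18, 0, '', 'Account ID')

# Intermediary tool: copy the incoming layout, then append one new field.
record_info_out = record_info_in.clone()
record_info_out.add_field('Status', 'v_wstring', 20, 0, 'Generic Tool', 'Added by this tool')

print([f.name for f in record_info_in.fields])   # ['Id']
print([f.name for f in record_info_out.fields])  # ['Id', 'Status']
```

A tool that adds nothing would stop after clone() and hand that copy straight to its output anchor.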

def build_record_info_out(self, response_object):
    """
    A non-interface helper for pi_push_all_records() responsible for creating the outgoing record layout.
    :param response_object: A list of field definitions, each in the form
        [name, FieldType, size, scale, description].
    :return: The outgoing record layout (an Sdk.RecordInfo object).
    """
    # Assumes the SDK is imported at module level as: import AlteryxPythonSDK as Sdk
    record_info_out = Sdk.RecordInfo(self.alteryx_engine)  # A fresh record info object for outgoing records.
    for field in response_object:
        # add_field method documentation
        # add_field((str)field_name, (FieldType)field_type, (int)size=0, (int)scale=0, (str)source='', (str)description='') -> Field
        # The fifth argument is the source, so a fixed source string goes there; field[4] supplies the description.
        record_info_out.add_field(field[0], field[1], field[2], field[3], 'Static Source', field[4])

    return record_info_out

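An example of a response_object where the fields being read into Alteryx are Id, IsDeleted, and MasterRecordId looks like this. Each inner list holds a field's name, type, size, scale, and description, in the order build_record_info_out reads them -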

[
    ['Id', Sdk.FieldType.v_wstring, 18, 0, 'Account ID'],
    ['IsDeleted', Sdk.FieldType.bool, 1, 0, 'Deleted'],
    ['MasterRecordId', Sdk.FieldType.v_wstring, 18, 0, 'Master Record ID']
]

In the example above, I separated the metadata from the actual data that will be pushed later. Depending on the size of your data and how much metadata you want to specify, you can instead combine your metadata and records, as long as you reference each field correctly. You can also use a list of dictionaries and look up each piece of metadata by its key. Either way, I recommend wrapping the field definitions in a list to preserve their order, which should match the field order of your records.
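Here is one way the list-of-dictionaries variant might look, sketched under a few assumptions: the key names ('name', 'type', and so on), the helper functions, the 'Example Source' string, and the sample record are all hypothetical, and the type strings stand in for Sdk.FieldType members. The outer list carries the field order, and each dictionary is converted into add_field's positional order.

```python
# Hypothetical field metadata as a list of dictionaries. The list keeps the
# fields in the same order as the values in each record; the type strings are
# placeholders for Sdk.FieldType members.
FIELD_METADATA = [
    {'name': 'Id', 'type': 'v_wstring', 'size': 18, 'scale': 0, 'description': 'Account ID'},
    {'name': 'IsDeleted', 'type': 'bool', 'size': 1, 'scale': 0, 'description': 'Deleted'},
    {'name': 'MasterRecordId', 'type': 'v_wstring', 'size': 18, 'scale': 0, 'description': 'Master Record ID'},
]


def add_field_args(metadata, source=''):
    """Return one tuple per field, in add_field's positional order:
    (field_name, field_type, size, scale, source, description)."""
    return [
        (field['name'], field['type'], field['size'], field['scale'], source, field['description'])
        for field in metadata
    ]


def label_record(metadata, record):
    """Pair each record value with its field name, relying on matching list order."""
    if len(record) != len(metadata):
        raise ValueError('record has %d values but the metadata describes %d fields'
                         % (len(record), len(metadata)))
    return {field['name']: value for field, value in zip(metadata, record)}


# In a plugin, each tuple could be unpacked as record_info_out.add_field(*args).
for args in add_field_args(FIELD_METADATA, source='Example Source'):
    print(args)

# A hypothetical record whose values follow the same field order.
print(label_record(FIELD_METADATA, ['001A', False, '']))
# {'Id': '001A', 'IsDeleted': False, 'MasterRecordId': ''}
```

The length check in label_record is a cheap guard against the metadata and the records drifting out of step.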

In input tools, build_record_info_out is called in pi_push_all_records. The init method of the OutputAnchor class then takes the returned record_info_out object as its argument and notifies downstream tools of the outgoing record metadata -

self.output_anchor.init(record_info_out)

In some tools, it is useful to include a flag that indicates whether the tool is being configured or is actually being run. Besides pi_init, which is called any time the tool configuration changes, parts of other methods are also executed before the tool runs. An API connector, for example, likely makes API calls that should only happen when the tool is actually running; otherwise, the tool might error due to missing data, or simply clicking on and off the tool would trigger a lengthy API call and cause performance issues. The flag looks something like this -

if self.alteryx_engine.get_init_var(self.n_tool_id, 'UpdateOnly') == 'True':
    return False

and is placed in pi_push_all_records for input tools. get_init_var is a method of the AlteryxEngine class that returns the value of the specified global init var from the Engine, 'UpdateOnly' in this instance. The value comes back as a string: 'True' means the tool is in configuration mode, and 'False' means it is actually being run. The placement of this check is important. If you want to be able to build a workflow without running the tool each time, you need to build your metadata and pass it to the output anchor before this check, because in configuration mode the method returns here and nothing after it executes.
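The ordering inside pi_push_all_records can be sketched with hypothetical stand-ins. StandInEngine, StandInOutputAnchor, InputToolSketch, and fetch_records are illustrations only, not SDK classes, and the stand-in anchor accepts plain lists, whereas a real tool builds its records with the SDK's record-creation classes before pushing them. What the sketch shows is the sequence: metadata goes to the anchor first, the UpdateOnly check comes next, and the expensive data fetch happens only after that check passes.

```python
class StandInEngine:
    """Hypothetical stand-in for AlteryxEngine; only get_init_var is imitated."""

    def __init__(self, update_only):
        # Stored as strings, matching the 'True' comparison used in the plugin code.
        self._init_vars = {'UpdateOnly': 'True' if update_only else 'False'}

    def get_init_var(self, tool_id, var_name):
        return self._init_vars[var_name]


class StandInOutputAnchor:
    """Hypothetical stand-in for OutputAnchor that records what it receives."""

    def __init__(self):
        self.record_info = None
        self.pushed = []

    def init(self, record_info):
        self.record_info = record_info

    def push_record(self, record):
        self.pushed.append(record)


class InputToolSketch:
    def __init__(self, engine, anchor):
        self.alteryx_engine = engine
        self.output_anchor = anchor
        self.n_tool_id = 1

    def build_record_info_out(self):
        # Placeholder layout; a real tool would return an Sdk.RecordInfo.
        return ['Id', 'IsDeleted', 'MasterRecordId']

    def fetch_records(self):
        # Hypothetical stand-in for the lengthy API call.
        return [['001A', False, '']]

    def pi_push_all_records(self, n_record_limit):
        # 1. Announce the metadata first so downstream tools see it while configuring.
        self.output_anchor.init(self.build_record_info_out())
        # 2. In configuration mode, stop before doing any real work.
        if self.alteryx_engine.get_init_var(self.n_tool_id, 'UpdateOnly') == 'True':
            return False
        # 3. Only an actual run reaches the data fetch.
        for record in self.fetch_records():
            self.output_anchor.push_record(record)
        return True


config_run = InputToolSketch(StandInEngine(update_only=True), StandInOutputAnchor())
config_run.pi_push_all_records(0)
print(config_run.output_anchor.record_info)  # ['Id', 'IsDeleted', 'MasterRecordId']
print(config_run.output_anchor.pushed)       # []
```

Moving step 1 below step 2 would leave downstream tools without any field list until the workflow actually runs, which is exactly what the placement advice above guards against.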

Knowing which classes to use in order to create and manipulate your metadata ensures that your data stays accurate and consistent from run to run. There are many ways to define your metadata depending on where your tool falls in the workflow. Hopefully, this has given you a better sense of which methods to use for your custom Python tool.

Managing Metadata Made Easy with the Python SDK