Upgrading from Python Engine SDK to Python SDK V2
04-17-2024 01:31 PM - edited 10-29-2024 11:16 AM
Introduction
Welcome to the Python SDK v2 migration guide! You will find this guide useful if you are considering upgrading to the latest Python SDK (v2 at the time of writing). As you begin your upgrade, here are a few high-level points to keep in mind:
- As part of the continued development of Python SDK v2, we treat the development experience as much of a priority as the features it provides. Given this, you should find the migration not only straightforward but exciting as you discover more about Python SDK v2.
- This migration can (and should) result in far less boilerplate code, a smoother development experience, easier-to-maintain packaging, and more (as we describe below).
- We expect feature completeness relative to prior SDKs. As long as a feature has not been deprecated in a Designer general availability (GA) release (see: DCM deprecation and DCME release in GA), you can expect all the same functionality as the Python Engine SDK. If you feel something is missing, check the docs or ask in the Alteryx IO Community!
Given this, approach this migration as a refactor to a Pythonic implementation of a Python Engine SDK tool, as that is the intended development experience of the Python SDK v2 developer. You might find you no longer need to do a large portion of what was previously required to package, ship, or develop your tool. Python SDK v2 lets you focus on implementation details while providing escape hatches where necessary. Additionally, as more features are added you will have access to current and continuously updated documentation. This also enables your tool for release on the Alteryx Marketplace!
Through this guide, we advise the best approach to migration—the most common steps, how to take them, and the resources you might need during the migration. We are excited to see what you can build with our enhanced functionalities.
Setting Up The Development Environment
As experienced Python Engine SDK readers might recall, Python Engine SDK development requires extensive manual packaging, path updates, and more. In our experience assisting with tool migrations, we have found this resulted in Python Engine SDK tools with disparate build processes between teams and developers.
Python SDK v2 automates the process of YXI creation. Your existing front-end assets remain detached from the back end, and your tool code now behaves as a Python module run by the embedded Python shipped with Alteryx Designer (pathing for execution is now managed under the hood). What does this mean?
This means that preparing your environment to use the SDK is as simple as shown in our Quick Start Guide.
- [ ] Set Up a Python Virtual Environment
- [ ] Install Python SDK V2
- [ ] Initialize Your Workspace
Now you have what will compose the root of your YXI, corresponding to what the Python Engine SDK process had you assemble manually. You should have a backend folder and other locations that the Python SDK v2 create-yxi and designer-install commands eventually bundle into a YXI for you automatically.
Then, rather than manually copying example code as in the Python Engine SDK workflow, you can automate this step as well with another Python SDK v2 command: create-plugin, which is part of the ayx_plugin_cli available in your Python development environment.
Next, pick the tool type that matches your existing tool. You might want to review the Tool Config File to inform this decision! Note that you will find a similar tool config file that is generated for your new tool module. This follows the same schema as other Designer tool config XML.
Warning: We optimize and carefully consider the configuration in this file. As a result, we don’t recommend manual edits to the file.
To summarize, follow our quick start guide and use the CLI to generate the appropriate template for your plugins!
Migrating Dependencies
To migrate your dependencies, run pip freeze > requirement-migration.txt in your Python Engine SDK development environment. Ensure the output includes no reference to the Python Engine SDK itself.
Then, deactivate the current Python Engine SDK environment using the appropriate command for your Python virtualenv manager. This guide uses Python’s own venv module, so we use the deactivate command. (Go to Virtual Environments and Packages for more details on Python’s virtual environments.)
Next, activate your Python SDK V2 environment and run pip install -r requirement-migration.txt. This installs your previous requirements into your current environment for development use and (automated) YXI creation!
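The check for leftover Python Engine SDK references can be scripted. The following is a minimal sketch rather than part of either SDK: the helper names and the excluded package name (alteryxpythonsdk) are assumptions for illustration, so substitute whatever name your legacy environment actually lists.

```python
from typing import Iterable, List, Set

# Hypothetical distribution name to strip; replace it with whatever your
# legacy environment actually lists for the Python Engine SDK.
EXCLUDED_PACKAGES = {"alteryxpythonsdk"}


def requirement_name(line: str) -> str:
    """Return the normalized package name of one `pip freeze` line."""
    name = line.strip()
    # Cut at the first version, URL, marker, or extras separator.
    for separator in ("==", ">=", "<=", "~=", "!=", " @ ", ";", "["):
        name = name.split(separator, 1)[0]
    return name.strip().lower().replace("_", "-")


def filter_requirements(
    lines: Iterable[str], excluded: Set[str] = EXCLUDED_PACKAGES
) -> List[str]:
    """Drop blank lines, comments, and any excluded package."""
    normalized = {name.lower().replace("_", "-") for name in excluded}
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if requirement_name(stripped) in normalized:
            continue
        kept.append(stripped)
    return kept


frozen = ["requests==2.31.0", "AlteryxPythonSDK==1.0.0", "", "numpy==1.26.4"]
print(filter_requirements(frozen))  # ['requests==2.31.0', 'numpy==1.26.4']
```

In practice you would read the lines from requirement-migration.txt, filter them, and write the result back before running pip install -r.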
Migrating Shared Code (Python Modules)
To migrate your shared code or other Python modules, include them alongside other packages in backend. Then, reference them as you would any other Python module in your environment. Be sure to remove or redefine references to Python Engine SDK dependencies or classes!
Refactoring/Updating for the Python SDK V2 API
Now that you can generate templated tool source code in a Python virtual environment with an ayx_plugin_cli workspace, we consider approaches to implementing features and functionality with the API that Python SDK v2 provides.
Where the Python Engine SDK required significant boilerplate, Python SDK v2 requires only a few API method implementations. The SDK v2 generates default values for connections and data. It handles the majority of the boilerplate for connection and Record setup that you had to write manually in the Python Engine SDK. That being said, you can get under-the-hood access when you need it with additional utilities and features within the ayx_python_sdk codebase.
Remember that though our API methods have changed, Python SDK v2 tools follow the general lifecycle of a tool in the workflow. Knowing this allows us to simplify code as we migrate by focusing on the Designer tool lifecycle as our common point of reference. We then describe how you can adapt the Python Engine SDK API patterns and usage to Python SDK v2’s simplified tool lifecycle.
Initialization
Previously, when developing for the Python Engine SDK, you had 2 points of initialization: the Python object itself in the magic method __init__, and the configuration XML initialization in the method pi_init.
Now, there is 1 point of initialization that runs for any UpdateMode: the __init__ magic method. Here you can do the same setup you previously did in either method (pi_init and __init__) of the Python Engine SDK, including accessing UI values derived from a tool’s XML config and other useful engine constants, such as the current execution’s update_mode. Find these values under self.provider.environment.
Next, the Python Engine SDK requires you to instantiate connections at runtime during initialization and to implement the related boilerplate that defines the anchors and their connections.
In Python SDK v2, connections do not need to be defined or instantiated at runtime. You can still access the underlying connection and anchor data, because their configuration is stored under self.outgoing_anchors and self.incoming_anchors. Both provide dictionary forms of the anchor configuration and the respective values. You also have access to metadata, which we cover in its own section.
Key Takeaways
Python SDK v2 greatly reduces the boilerplate previously required, such as implementing both pi_init and __init__ or manually coding connection configuration for use at runtime. It also provides escape hatches to fine-tune metadata, configuration, UpdateMode logic, and other pre-work tasks.
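To make the single point of initialization concrete, here is a minimal sketch that runs outside Designer. The provider is a stand-in built with SimpleNamespace, the class name ExampleTool is hypothetical, and the update_mode values ("" and "Quick") are placeholders rather than confirmed SDK constants. Only the self.provider.environment.update_mode access mirrors what the guide describes.

```python
from types import SimpleNamespace


class ExampleTool:
    """One __init__ replaces the legacy pi_init + __init__ pair."""

    def __init__(self, provider):
        self.provider = provider
        # Engine constants are read from provider.environment.
        self.update_mode = provider.environment.update_mode
        # Placeholder rule: treat an empty update mode as a full workflow run
        # and anything else as a configuration-only refresh.
        self.is_full_run = self.update_mode == ""
        # State that legacy tools often initialized in pi_init.
        self.rows_seen = 0


# Stand-in provider so the sketch runs without Designer or the SDK installed.
stub_provider = SimpleNamespace(environment=SimpleNamespace(update_mode=""))
tool = ExampleTool(stub_provider)
print(tool.is_full_run)  # True
```

Branching on the update mode inside __init__ is one way to skip expensive setup when Designer is only refreshing the tool's configuration.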
Data I/O: Record Creation, Reading, and Writing
Once initialization and connections are handled, you likely need to handle some sort of input and/or output data in record format. In the Python Engine SDK, you were required to construct records and call methods like ii_it and pi_push_all_records to flush queues filled with Record objects created by a RecordCreator using RecordInfo, all while managing the queues along the way. You can review the documentation for specific process nuances as well.
While Python SDK v2 enables direct management of metadata (covered in its own section) and aspects of your records, it provides this functionality in the form of a pyarrow RecordBatch. That is, you receive and send records in pyarrow format. pyarrow is a robust open-source library and data format that allows for stable serialization, efficient memory usage, and performant columnar computing. The latest versions of the SDK manage data serialization under the hood (again with escape hatches) using this format, letting you focus on handling the records and their data rather than on serialization and typing boilerplate.
Important: You can work interchangeably with pandas if you prefer. Use pyarrow’s built-in method record_batch.to_pandas() to get a DataFrame. Then you can convert it back to a writable format using pa.RecordBatch.from_pandas(df).
Data types are derived by default, but you can also set them manually using the set_metadata methods. Now, you can declare and create a record set, then write it out to an (automatically defined) output anchor using a couple of lines of code!
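    # Runs inside a plugin method; assumes "import pyarrow as pa" at module level.
    # Column types are inferred from the Python values.
    data_rb = pa.RecordBatch.from_pydict(
        {
            "x": [1, 2, 3],
            "y": ["hello", "world", "from ayx_python_sdk!"],
            "z": [self.config_value, self.config_value, self.config_value],
        }
    )
    # Send the batch to the output anchor named "Output".
    self.provider.write_to_anchor("Output", data_rb)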
Additionally, any reading or end-of-stream management is handled by the SDK. Connections close when the plugin ends, or you can close them manually via an optional self.provider method. That is, the SDK sets up and cleans up your record streams, allowing you to focus on handling the data itself.
Key Takeaways
Handling and creating data or records is as simple as working in pyarrow (or even pandas) to create RecordBatches. If you have work to do with metadata or simply wish to preserve it, check the metadata section. Then, use write_to_anchor to send your batches downstream as you generate them. There is no need to manage the queue, unless you want to notify Designer early that it’s done. Finally, you read in records via the on_record_batch method. Refer to the docs for a full description, but you handle each anchor’s record batches in pyarrow format as well!
Designer I/O: DCM and Designer Messages
Using the Python Engine SDK, at some point in design you might have needed the AlteryxEngine class to make a DCM-related call or to send an output_message.
In Python SDK v2, the provider gives you access to these calls in 2 ways:
- First, for DCM, it's worth noting that the functions exposed by the Python Engine SDK are now deprecated in favor of a newer version for all use cases. Python SDK v2 provides access to both, with a strong preference for the newer DCM calls. You can access these under self.provider.dcm. Please refer to the DCM documentation for assistance regarding DCM usage, as the SDK merely wraps the DCM implementation so that you can call it from the SDK!
  You might notice when you use an IDE that both self.provider.dcm and self.provider.io shadow self.provider.__ctrl_io for readability. We recommend that you use the appropriate reference for the appropriate actions, even though it behaves the same no matter which reference you use.
- Second, for messages sent using output_message(...) and AlteryxEngine.xmsg(), you can use the respective provider methods, for example, self.provider.io.translate_msg, self.provider.io.info(), and others. Here, self.provider.io.info() and self.provider.io.warn() are the output_message counterparts, with the boilerplate done for you under the hood!
Key Takeaways
Don’t use deprecated DCM calls from the Python Engine SDK unless you absolutely must! You can access the new DCM API via self.provider.dcm. For output_message(), use self.provider.io and the appropriate messaging/logger level function!
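As a small illustration of choosing the appropriate message level, the sketch below routes a status message to info() or warn(). StubIO is a stand-in for self.provider.io that records messages instead of sending them to Designer, and report_row_count is a hypothetical helper. Only the info and warn method names come from this guide.

```python
class StubIO:
    """Stand-in for provider.io that records messages for inspection."""

    def __init__(self):
        self.messages = []

    def info(self, text):
        self.messages.append(("info", text))

    def warn(self, text):
        self.messages.append(("warn", text))


def report_row_count(io, row_count, expected_minimum=1):
    """Send one status message at the level that matches the outcome."""
    if row_count < expected_minimum:
        io.warn(f"Only {row_count} rows received; expected at least {expected_minimum}")
    else:
        io.info(f"Processed {row_count} rows")


io = StubIO()
report_row_count(io, 0)
report_row_count(io, 120)
print(io.messages)
```

Inside a real tool you would pass self.provider.io instead of the stub, which is why the helper takes the io object as a parameter.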
Teardown/Deconstructing
The final part of the tool lifecycle is teardown. In the Python Engine SDK, most of your shutdown code likely lives in pi_close(). The Python SDK v2 uses a similar method called on_complete. Like its Python Engine SDK counterpart, it's called last to handle cleanup. Unlike in the Python Engine SDK, however, there is no manual cleanup you must do here for the connections or records you handled (unless these are non-SDK resources, of course). All record queues, connections, etc. flush and close themselves on the back end as part of the final shutdown.
This means that in on_complete you clean up any resources of your own (those not already handled by context managers) that you created during the tool’s runtime, do any remaining work not done in the other methods, or write any final records downstream (in no particular order). For this section, the body is the key takeaway. But even though the lifecycle is done, we still have one more important piece to cover.
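A teardown that releases a non-SDK resource could be sketched as follows. The SQLite connection stands in for any resource the tool opened itself, QueryTool is a hypothetical class, and the provider is a stub that exposes only io.info. In a real tool the SDK calls on_complete for you.

```python
import sqlite3
from types import SimpleNamespace


class StubIO:
    """Stand-in for provider.io that records messages for inspection."""

    def __init__(self):
        self.messages = []

    def info(self, text):
        self.messages.append(("info", text))


class QueryTool:
    def __init__(self, provider):
        self.provider = provider
        # A resource the SDK knows nothing about, so the tool must close it.
        self.connection = sqlite3.connect(":memory:")
        self.rows_seen = 0

    def on_complete(self):
        try:
            # Remaining work first: report what happened during the run.
            self.provider.io.info(f"Processed {self.rows_seen} rows")
        finally:
            # Always release our own resource, even if reporting fails.
            self.connection.close()
            self.connection = None


provider = SimpleNamespace(io=StubIO())
tool = QueryTool(provider)
tool.rows_seen = 3
tool.on_complete()
print(provider.io.messages)  # [('info', 'Processed 3 rows')]
```

The try/finally mirrors what a context manager would do for you. It is only needed for resources that must stay open across several lifecycle methods.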
Metadata
In the Python Engine SDK, the Field class was the entry point for manipulating metadata, as in this code from a legacy plugin that uses the Python Engine SDK:
    # A fresh record info object for outgoing records.
    record_info_out = Sdk.RecordInfo(self.alteryx_engine)
    try:  # Add metadata info that is passed to tools downstream.
        for row in cursor.description:
            record_info_out.add_field(
                row[0], Sdk.FieldType.v_wstring, 1073741823, 0, "", ""
            )
    # (The legacy plugin's except clause is not part of this excerpt.)
The Python SDK v2 still allows programmatic access to metadata. Because of the conversion to Arrow for serialization and transport, metadata is now managed within the Arrow schema. For any Alteryx Designer-specific values or types, use the metadata functions under core.utils, for example, set_metadata(...), get_metadata, create_schema, and more. We strongly recommend that you use these wrappers, as they ensure correctness when transferring complex metadata and maintain an accurate separation between Arrow-only and AYX-only formats.
Packaging
With all code refactored, the only part left is to package your code. In the Python Engine SDK, this often involved a lot of manual processes and bundling to produce the zip archive layout that a YXI requires.
Now, this is automated, and you no longer need a "build your tool" checklist!
To bundle your plugin, navigate to the root of your workspace and run create-yxi. Or, if you are testing locally, use designer-install with the optional --dev flag to hot-load your development workspace and current virtual environment into Designer. Either process results in a ready-to-install-and-use YXI that contains all your tool dependencies and is run by Designer's own embedded Python. Compared with create-yxi, designer-install merely adds the --dev option (no magic here: this is simply a pip-supported function that we wrap for ease of use) and, for convenience, avoids having to double-click the YXI in the Designer UI to start the install.
Should you require any extra packages, add them to the third-party-requirements.txt file in your workspace. The most reliable way to ensure you bundle all current dependencies is to run pip freeze from the virtual environment you set up earlier in the guide. This ensures that all dependencies currently installed in your active dev environment are accounted for and packaged for use when your tool is installed.
Summary
Now, you should have all the resources you need to confidently migrate from Python Engine SDK to Python SDK v2. We strive to continue to support critical functionality, while simultaneously improving the development experience, reducing boilerplate, and improving testing. As development continues, we continue to add more functionality to the latest SDKs. If you feel something is missing, please post an update request to the Alteryx IO Discussion Forum!
Extras
I want to... in the Python SDK v2
Create a new Record, or Handle Received Ones
To create a valid Record, create a pyarrow RecordBatch or a pandas DataFrame.
You can handle a record in its incoming RecordBatch format, or use one of pyarrow’s many helper functions to convert it to native Python data structures or to pandas.
For more info on RecordBatch usage, go to pyarrow.RecordBatch.
Add Metadata
First, set up a pyarrow schema that contains the metadata:
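    # Inside the plugin's __init__(self, provider).
    # create_schema is one of the SDK's core.utils metadata helpers;
    # FT is assumed here to be an alias for the SDK's field-type enum.
    self.name = "metadata"
    self.provider = provider
    self.outputschema = create_schema({
        # Fields that carry extra metadata use a dictionary...
        "volts": {
            "type": FT.int16,
            "size": 2,
            "source": "Nuclear power",
            "description": "Uranium and plutonium",
        },
        "year": {
            "type": FT.string,
            "size": 4,
            "source": "Year of work",
            "description": "For all people in the world",
        },
        # ...while fields that only need a type use the type directly.
        "bool": FT.bool,
        "byte": FT.byte,
        "int16": FT.int16,
        "int32": FT.int32,
        "int64": FT.int64,
        "fixeddecimal": {"type": FT.fixeddecimal, "size": 128, "scale": 120},
        "float": FT.float,
        "double": FT.double,
        "string": FT.string,
        "wstring": FT.wstring,
        "v_string": FT.v_string,
        "v_wstring": FT.v_wstring,
        "date": FT.date,
        "time": FT.time,
        "datetime": FT.datetime,
        "spatialobj": FT.spatialobj,
    })
    # Announce the outgoing schema for the "Output" anchor.
    provider.push_outgoing_metadata("Output", self.outputschema)
    self.provider.io.info(f"{self.name} tool started")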
Next, for any records to which you want to apply this metadata (and that match the shape of the metadata), you can use set_metadata:
    # Assumes module-level imports: import pandas as pd, import pyarrow as pa.
    df = pd.DataFrame(
        {
            "volts": [1, 32000, 32100],
            "year": ["2022", "202235", "testtest"],
            "bool": [True, False, True],
            # ... snip ...
            "spatialobj": [
                "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
                "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
                "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
            ],
        }
    )
    batch = pa.Table.from_pandas(df)
    # Apply the schema built with create_schema to the matching columns.
    batch = set_metadata(batch, schema=self.outputschema)
To extract metadata for use, we provide get_metadata and other utilities. You can print or log the results of get_metadata to get a better feel for how the data is manipulated and serialized.