Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Data Science

Machine learning & data science for beginners and experts alike.
EmilyVA
Alteryx
Alteryx

We are excited to announce the new Workflow Summary Tool, leveraging the power of OpenAI’s ChatGPT to automatically provide concise summaries of a workflow’s purpose, inputs, outputs, and key logic steps. Just pull the Workflow Summary tool into a single workflow or point it at a whole directory of them, add a DCM connection with your OpenAI API key, and presto! The tool outputs a few-word topic, a headline, and a paragraph-length summary for each workflow! It gives you the choice of storing this summary in the Workflow Info Description field (accessible by Alteryx Server when a workflow is deployed there) and/or sending the summary downstream from the tool for export to a file or further analysis.

 

Installation

 

Download the WorkflowSummary.yxi installer, double click the yxi file, and then click the “Install” button.

 

Tool Installer screenshot.png

 

Once you have installed the tool, please restart Designer Desktop. (This is required for the new DCM Connection schema to be recognized.) You should then see the Workflow Summary tool in your Laboratory toolbar! (If you do not see the Laboratory toolbar in your Designer Desktop, click the settings icon on the right side of the Tool Palette and check the box next to Laboratory.)

 

Setting up the DCM Connection for your OpenAI API key

 

1. Using the Workflow Summary tool requires an OpenAI API key. When you first create an OpenAI account you usually receive starter credits that will let you test using the Workflow Summary tool for free. OpenAI API keys can be created through OpenAI Account management.

 

2. Tell Designer to allow SDK tools (such as Workflow Summary) to use the Data Connection Manager (you should only have to do this once):

a. After restarting Designer Desktop, first go into Options → User Settings → Edit User Settings:

 

image-20230419-154700 (3).png

 

b. Select the DCM tab along the top of the User Settings window, and Enable DCM with SDK Access Mode set to AllowAll: 

 

image-20230419-154602 (2).png

 

3. Set up a DCM Connection using your OpenAI API key (you should also only have to do this once, unless your key changes):

a. or choose the Save/Update API Key in DCM button in the Workflow Summary configuration panel:

 

image-20230519-002633 (1).png

 

b. Click the “New” button (in versions of Designer before 23.1, this was called “Add Data Source”):

 image-20230725-003514.png

 

For Designer before 23.1:

 

image-20230419-155210 (2).png

 

c. Give your Data Source a name you’ll find helpful in identifying it, and then click Save:

 
image-20230419-155705 (2).png

 

d. Click the “+ Connect Credential” link:

 

image-20230419-155740 (3).png

 

e. Either choose a pre-existing API token or choose “Create New Credential” and enter your OpenAI API key in the API TOKEN drop down. Then click the Link button.
 
image-20230419-163305 (2).png

 

f. Return to the Data Sources page (link on the left of the window above) to confirm that your Data Source has been created:
 image-20230419-173105 (2).png
 
4. Now any time you pull a Workflow Summary tool onto the Designer Desktop canvas, when you click the “Save / Update API Key in DCM” button, you’ll be able to choose the connection you made:
 

image-20230420-222150 (2).png

 

and then click “Connect” to tell the tool to use that OpenAI API key for your Workflow Summary queries to OpenAI:

 

image-20230420-222240 (2).png

 

Your tool should remember this connection unless you re-open the “Save/Update API Key in DCM” button and change it, or unless you delete the tool.

 

How does it work?

 

The files that define many workflows are too long for direct input to the ChatGPT (gpt-3.5-turbo) model in the OpenAI APIs. So we developed a set of strategies to convert each workflow file into text of a length that the ChatGPT model can accept. These include:

  1. Extracting key configuration options for each tool and connections between each tool, while leaving out configuration options that don’t really help ChatGPT understand the workflow. For example, we keep the text in a Comment tool, but not the position of the box on the canvas, the font, or the background color.

  2. If there are any tools that are especially long by themselves, for example a Formula tool with a very long formula or an R or Python tool with a long set of code in it, these may be individually summarized by ChatGPT first before being combined back in with the rest of the tools.

  3. If the workflow text is still too long for ChatGPT, we look for containers of tools that can be summarized first before combining that summary with the rest of the workflow.

  4. Finally, if the workflow is still too long, we work our way through the flow of the tools and summarize chunks of tools; then we “summarize the summaries” to get an overview of the whole workflow.

 

We also spent some time determining effective prompts to get the ChatGPT models to usefully summarize our workflow text. (It sometimes wanted to default to “this is an Alteryx workflow that analyzes data” - which is almost always true and almost never helpful!) We also worked with the prompts to reduce the model’s likelihood of “hallucinating” - making up plausible sounding summaries because there isn’t enough information in the distilled workflow text to constrain the model’s output to an accurate summary.

 

When we combined all these techniques, we found that we were able to come up with amazing summaries for even the longest and most complex workflows!

 

Considerations

 

Alteryx Requirement

 

The Workflow Summary tool was created using the Alteryx Python SDK v2 which is compatible with all Alteryx Designer Desktop releases starting with 2021.4.

 

OpenAI Access and Costs

 

  • Using the Workflow Summary tool requires you to provide your own OpenAI API key. When you first create an OpenAI account you usually receive starter credits that will let you test using the Workflow Summary tool for free. After that, your OpenAI usage will be billed through your OpenAI account. In general we have found that (once your free credits are exhausted) it costs on the order of <$0.01 to $0.05 to summarize a workflow, depending on its length and complexity. (This could change if the OpenAI pricing structure changes.) Most of the workflows we’ve tested cost fractions of a penny, but a few of the longer and more complex ones have cost a bit more. The workflow results log and the “Tokens Used” results column shows you how many tokens were used in your OpenAI account for each workflow summary. To estimate the cost for each workflow summary, look up the current OpenAI pricing per thousand tokens used for the Chat gpt-3.5-turbo (4k context) model. Then use the formula:

    [Tokens Used] / 1000 * [Price per thousand tokens]

  • The OpenAI API “chat” rate limits apply to your calls through the Workflow Summary tool. If the Workflow Summary tool starts hitting the rate limits, it will increase the time between its call to OpenAI until they are no longer triggering the OpenAI rate limit error. The rate limits OpenAI applies to your account vary based on the type and age of the account (currently only 3 requests per minute for free users), which means that upgrading your OpenAI account from free to paid may substantially speed up the Workflow Summary tool.

  • In order to run, the Workflow Summary tool will need to be able to use your OpenAI API key to call the OpenAI API chat completion endpoint through your organization’s firewall.

 

ChatGPT Hallucinations

 

We have tested the Workflow Summary tool extensively on internal workflows within Alteryx. We have found that in general, the summaries can be very accurate and helpful. We have also put in guard rails against the phenomenon of hallucinations (when Large Language Models such as ChatGPT create text that seems plausible but is incorrect or inaccurate). Sometimes the Workflow Summary tool will produce output such as the following example, which indicates that the ChatGPT model did not feel confident that it could provide a good summary of your workflow, often because the workflow was too short to give it sufficient context:

 

image-20230425-170820.png

 

While these guard rails are helpful, they may not be infallible. Therefore, the Workflow Summary tool can be a powerful aid in documenting your workflows, but its results should undergo human verification before being used to make any critical decisions.

 

Data Privacy

 

At Alteryx, data privacy is of paramount importance to us. We believe in maintaining transparency and clarity when it comes to the information that is shared between Alteryx and OpenAI in order to generate workflow summaries.  

 

As a third-party provider, OpenAI has its own policies governing the handling of data received from Alteryx. We encourage you to review OpenAI’s policies for a more comprehensive understanding of their practices.

 

In order to create more meaningful and detailed workflow summaries using ChatGPT, Alteryx provides instructions to OpenAI on how the workflow processes the data. This allows ChatGPT to generate insightful summaries that go beyond generic descriptions. It’s important to emphasize that while the workflow instructions and metadata are shared with OpenAI, no raw data is shared with OpenAI. The primary objective is to provide substantial and meaningful input to generate insightful workflow summaries. For your reference, the table below presents an overview of the data types that are sent and those that are not sent to OpenAI.  

 

What the Workflow Summary tool sends to OpenAI

What the Workflow Summary tool does NOT send to OpenAI

  • Workflow File name (e.g. Myworkflow.yxdb)

  • Tool ID numbers and names (e.g. Tool 3: Formula)

  • Key configuration options for each tool (e.g. Filter the data on Field1 for values XYZ, Summarize Field2 by taking the average, etc.)

  • Connections between tools (e.g. the R output anchor from Tool 19 Join connects to the I input anchor of Tool 20 Union)

  • Data metadata: Field names and data types

  • Workflow File path (e.g. C:\myname\mycoolproject\)

  • Data embedded in Text Input tools

  • Any data that would flow through the workflow

  • Workflow file metadata (owner, creation date, etc.)

 

Conclusion

 

We hope the Workflow Summary Tool supports your efficiency, governance, and team collaboration! Please download it here, give it a try, and let us know what you think in the comments! Go forth and Summarize!

 

Source: GIPHY

Comments