
EmilyVA
Alteryx

We are excited to announce the new Workflow Summary Tool, leveraging the power of OpenAI’s ChatGPT to automatically provide concise summaries of a workflow’s purpose, inputs, outputs, and key logic steps. Just pull the Workflow Summary tool into a single workflow or point it at a whole directory of them, add a DCM connection with your OpenAI API key, and presto! The tool outputs a few-word topic, a headline, and a paragraph-length summary for each workflow! It gives you the choice of storing this summary in the Workflow Info Description field (accessible by Alteryx Server when a workflow is deployed there) and/or sending the summary downstream from the tool for export to a file or further analysis.

 

Installation

 

Download the WorkflowSummary.yxi installer, double click the yxi file, and then click the “Install” button.

 

Tool Installer screenshot.png

 

Once you have installed the tool, please restart Designer Desktop. (This is required for the new DCM Connection schema to be recognized.) You should then see the Workflow Summary tool in your Laboratory toolbar! (If you do not see the Laboratory toolbar in your Designer Desktop, click the settings icon on the right side of the Tool Palette and check the box next to Laboratory.)

 

Setting up the DCM Connection for your OpenAI API key

 

1. Using the Workflow Summary tool requires an OpenAI API key. When you first create an OpenAI account you usually receive starter credits that will let you test the Workflow Summary tool for free. OpenAI API keys can be created through OpenAI Account management.

 

2. Tell Designer to allow SDK tools (such as Workflow Summary) to use the Data Connection Manager (you should only have to do this once):

a. After restarting Designer Desktop, first go into Options → User Settings → Edit User Settings:

 

image-20230419-154700 (3).png

 

b. Select the DCM tab along the top of the User Settings window, and Enable DCM with SDK Access Mode set to AllowAll: 

 

image-20230419-154602 (2).png

 

3. Set up a DCM Connection using your OpenAI API key (you should also only have to do this once, unless your key changes):

a. Open the Data Connection Manager (File → Manage Connections), or choose the “Save/Update API Key in DCM” button in the Workflow Summary configuration panel:

 

image-20230519-002633 (1).png

 

b. Click the “New” button (in versions of Designer before 23.1, this was called “Add Data Source”):

 image-20230725-003514.png

 

For Designer before 23.1:

 

image-20230419-155210 (2).png

 

c. Give your Data Source a name you’ll find helpful in identifying it, and then click Save:

 
image-20230419-155705 (2).png

 

d. Click the “+ Connect Credential” link:

 

image-20230419-155740 (3).png

 

e. Either choose a pre-existing API token or choose “Create New Credential” and enter your OpenAI API key in the API TOKEN drop down. Then click the Link button.
 
image-20230419-163305 (2).png

 

f. Return to the Data Sources page (link on the left of the window above) to confirm that your Data Source has been created:
 image-20230419-173105 (2).png
 
4. Now any time you pull a Workflow Summary tool onto the Designer Desktop canvas, when you click the “Save / Update API Key in DCM” button, you’ll be able to choose the connection you made:
 

image-20230420-222150 (2).png

 

and then click “Connect” to tell the tool to use that OpenAI API key for your Workflow Summary queries to OpenAI:

 

image-20230420-222240 (2).png

 

Your tool should remember this connection unless you click the “Save/Update API Key in DCM” button again and change it, or unless you delete the tool.

 

How does it work?

 

The files that define many workflows are too long for direct input to the ChatGPT (gpt-3.5-turbo) model in the OpenAI APIs. So we developed a set of strategies to convert each workflow file into text of a length that the ChatGPT model can accept. These include:

  1. Extracting key configuration options for each tool and connections between each tool, while leaving out configuration options that don’t really help ChatGPT understand the workflow. For example, we keep the text in a Comment tool, but not the position of the box on the canvas, the font, or the background color.

  2. If there are any tools that are especially long by themselves, for example a Formula tool with a very long formula or an R or Python tool with a long set of code in it, these may be individually summarized by ChatGPT first before being combined back in with the rest of the tools.

  3. If the workflow text is still too long for ChatGPT, we look for containers of tools that can be summarized first before combining that summary with the rest of the workflow.

  4. Finally, if the workflow is still too long, we work our way through the flow of the tools and summarize chunks of tools; then we “summarize the summaries” to get an overview of the whole workflow.
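
The last strategy above, summarizing chunks and then “summarizing the summaries”, can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: `summarize` stands in for a call to the chat model, `count_tokens` is a crude word count rather than a real tokenizer, and all function names and the `max_tokens` budget are hypothetical.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_by_budget(pieces: List[str], max_tokens: int) -> List[str]:
    """Group consecutive pieces into chunks that stay within max_tokens.

    A single piece that exceeds the budget becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for piece in pieces:
        size = count_tokens(piece)
        if current and used + size > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(piece)
        used += size
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_workflow(
    tool_texts: List[str],
    summarize: Callable[[str], str],
    max_tokens: int,
) -> str:
    """Summarize directly if the text fits, otherwise summarize the summaries."""
    text = "\n".join(tool_texts)
    if count_tokens(text) <= max_tokens:
        return summarize(text)
    partial = [summarize(chunk) for chunk in chunk_by_budget(tool_texts, max_tokens)]
    if len(partial) == len(tool_texts):
        # Grouping made no progress, so stop recursing and combine once.
        return summarize("\n".join(partial))
    return summarize_workflow(partial, summarize, max_tokens)
```

Passing `summarize` in as a parameter keeps the chunking logic testable without calling any external service. In this sketch the pieces are grouped in the order given, so feeding them in workflow order keeps neighbouring tools together.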

 

We also spent some time determining effective prompts to get the ChatGPT models to usefully summarize our workflow text. (It sometimes wanted to default to “this is an Alteryx workflow that analyzes data” - which is almost always true and almost never helpful!) We also worked with the prompts to reduce the model’s likelihood of “hallucinating” - making up plausible sounding summaries because there isn’t enough information in the distilled workflow text to constrain the model’s output to an accurate summary.

 

When we combined all these techniques, we found that we were able to come up with amazing summaries for even the longest and most complex workflows!

 

Considerations

 

Alteryx Requirement

 

The Workflow Summary tool was created using the Alteryx Python SDK v2, which is compatible with all Alteryx Designer Desktop releases starting with 2021.4.

 

OpenAI Access and Costs

 

  • Using the Workflow Summary tool requires you to provide your own OpenAI API key. When you first create an OpenAI account you usually receive starter credits that will let you test the Workflow Summary tool for free. After that, your OpenAI usage will be billed through your OpenAI account. In general we have found that (once your free credits are exhausted) summarizing a workflow costs anywhere from less than $0.01 to about $0.05, depending on its length and complexity. (This could change if the OpenAI pricing structure changes.) Most of the workflows we’ve tested cost fractions of a penny, but a few of the longer and more complex ones have cost a bit more. The workflow results log and the “Tokens Used” results column show you how many tokens were used in your OpenAI account for each workflow summary. To estimate the cost for each workflow summary, look up the current OpenAI pricing per thousand tokens used for the Chat gpt-3.5-turbo (4k context) model. Then use the formula:

    [Tokens Used] / 1000 * [Price per thousand tokens]

  • The OpenAI API “chat” rate limits apply to your calls through the Workflow Summary tool. If the Workflow Summary tool starts hitting the rate limits, it will increase the time between its calls to OpenAI until they no longer trigger the OpenAI rate limit error. The rate limits OpenAI applies to your account vary based on the type and age of the account (currently only 3 requests per minute for free users), which means that upgrading your OpenAI account from free to paid may substantially speed up the Workflow Summary tool.

  • In order to run, the Workflow Summary tool will need to be able to use your OpenAI API key to call the OpenAI API chat completion endpoint through your organization’s firewall.
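
The cost formula and the rate-limit behaviour described above can be illustrated with a short sketch. It makes several assumptions: the price used in the usage note is a placeholder (look up the current OpenAI pricing), `RateLimited` is a hypothetical stand-in for the API's rate limit error, and the doubling delay is only one plausible way to "increase the time between calls", not necessarily what the tool does.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def estimate_cost(tokens_used: int, price_per_thousand: float) -> float:
    """Apply [Tokens Used] / 1000 * [Price per thousand tokens]."""
    return tokens_used / 1000 * price_per_thousand


class RateLimited(Exception):
    """Hypothetical stand-in for the API's rate limit error."""


def call_with_backoff(
    call: Callable[[], T],
    initial_delay: float = 1.0,
    factor: float = 2.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call`, waiting longer after each rate limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(delay)
            delay *= factor  # wait longer before the next call
    raise AssertionError("unreachable")
```

For example, `estimate_cost(2500, 0.002)` applies the formula to a summary that used 2,500 tokens at a placeholder price of $0.002 per thousand tokens. The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.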

 

ChatGPT Hallucinations

 

We have tested the Workflow Summary tool extensively on internal workflows within Alteryx. We have found that in general, the summaries can be very accurate and helpful. We have also put in guard rails against the phenomenon of hallucinations (when Large Language Models such as ChatGPT create text that seems plausible but is incorrect or inaccurate). Sometimes the Workflow Summary tool will produce output such as the following example, which indicates that the ChatGPT model did not feel confident that it could provide a good summary of your workflow, often because the workflow was too short to give it sufficient context:

 

image-20230425-170820.png

 

While these guard rails are helpful, they may not be infallible. Therefore, the Workflow Summary tool can be a powerful aid in documenting your workflows, but its results should undergo human verification before being used to make any critical decisions.

 

Data Privacy

 

At Alteryx, data privacy is of paramount importance to us. We believe in maintaining transparency and clarity when it comes to the information that is shared between Alteryx and OpenAI in order to generate workflow summaries.  

 

As a third-party provider, OpenAI has its own policies governing the handling of data received from Alteryx. We encourage you to review OpenAI’s policies for a more comprehensive understanding of their practices.

 

In order to create more meaningful and detailed workflow summaries using ChatGPT, Alteryx provides instructions to OpenAI on how the workflow processes the data. This allows ChatGPT to generate insightful summaries that go beyond generic descriptions. It’s important to emphasize that while the workflow instructions and metadata are shared with OpenAI, no raw data is shared with OpenAI. The primary objective is to provide substantial and meaningful input to generate insightful workflow summaries. For your reference, the table below presents an overview of the data types that are sent and those that are not sent to OpenAI.  

 

What the Workflow Summary tool sends to OpenAI:

  • Workflow File name (e.g. Myworkflow.yxmd)

  • Tool ID numbers and names (e.g. Tool 3: Formula)

  • Key configuration options for each tool (e.g. Filter the data on Field1 for values XYZ, Summarize Field2 by taking the average, etc.)

  • Connections between tools (e.g. the R output anchor from Tool 19 Join connects to the I input anchor of Tool 20 Union)

  • Data metadata: Field names and data types

What the Workflow Summary tool does NOT send to OpenAI:

  • Workflow File path (e.g. C:\myname\mycoolproject\)

  • Data embedded in Text Input tools

  • Any data that would flow through the workflow

  • Workflow file metadata (owner, creation date, etc.)
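
To make the distinction above concrete, here is a minimal sketch of distilling a workflow file into text that keeps tool IDs, tool names, configuration text, and connections while dropping canvas positions. It is not the tool's actual implementation: the sample XML is a simplified, hypothetical fragment loosely modelled on a workflow file, the `distill` function is an invented name, and only element text (not attributes) is read from each configuration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Simplified, hypothetical workflow fragment used only for illustration.
SAMPLE = """<AlteryxDocument>
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter">
        <Position x="100" y="200" />
      </GuiSettings>
      <Properties>
        <Configuration>
          <Expression>[Field1] = "XYZ"</Expression>
        </Configuration>
      </Properties>
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Summarize.Summarize">
        <Position x="220" y="200" />
      </GuiSettings>
      <Properties>
        <Configuration>
          <Action>Avg Field2</Action>
        </Configuration>
      </Properties>
    </Node>
  </Nodes>
  <Connections>
    <Connection>
      <Origin ToolID="1" Connection="True" />
      <Destination ToolID="2" Connection="Input" />
    </Connection>
  </Connections>
</AlteryxDocument>"""


def distill(xml_text: str) -> List[str]:
    """Return one line per tool and per connection, without layout details."""
    root = ET.fromstring(xml_text)
    lines: List[str] = []
    for node in root.iter("Node"):
        gui = node.find("GuiSettings")
        plugin = gui.get("Plugin", "") if gui is not None else ""
        name = plugin.split(".")[-1] or "Unknown"
        options = []
        config = node.find("Properties/Configuration")
        if config is not None:
            for elem in config.iter():
                text = (elem.text or "").strip()
                if elem is not config and text:
                    options.append(f"{elem.tag}={text}")
        lines.append(f"Tool {node.get('ToolID')}: {name} [{'; '.join(options)}]")
    for conn in root.iter("Connection"):
        origin, dest = conn.find("Origin"), conn.find("Destination")
        if origin is None or dest is None:
            continue  # skip malformed connections
        lines.append(
            f"Tool {origin.get('ToolID')} ({origin.get('Connection')}) -> "
            f"Tool {dest.get('ToolID')} ({dest.get('Connection')})"
        )
    return lines
```

Calling `distill(SAMPLE)` yields one line per tool and one per connection. The `Position` elements never appear in the output because only the `Configuration` subtree and the connection endpoints are read.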

 

Conclusion

 

We hope the Workflow Summary Tool supports your efficiency, governance, and team collaboration! Please download it here, give it a try, and let us know what you think in the comments! Go forth and Summarize!

 

Source: GIPHY

Comments
JohnOrton1990
6 - Meteoroid

This is awesome.  I've downloaded and installed.  The OpenAI data source is not selectable.  Any help would be appreciated.

 

Good news.  I solved it.

Hayden_McHan
7 - Meteor

Anyone getting these errors? 

Workflow Summary (14) Internal error: Failed to read port assignment
Workflow Summary (14) Internal Error - Deadlock detected (@1)

 

EmilyVA
Alteryx

@Hayden_McHan I've on occasion gotten port errors like that from the Python SDK... usually retrying or restarting Designer has fixed it.  What version of Designer Desktop are you running?

Hayden_McHan
7 - Meteor

@EmilyVA I've restarted a few times and same thing...Just updated 23.1.1.123. I haven't used the Python SDK before or seen this error before

EmilyVA
Alteryx

@Hayden_McHan  thanks so much for trying to make this work!  Could I ask you to try deleting the tool (either from C:\Users\YOURUSERNAME\AppData\Roaming\Alteryx\Tools or from C:\ProgramData\Alteryx\Tools , depending on whether you installed for yourself or for everyone) and then re-running the yxi installer?  

Hayden_McHan
7 - Meteor

@EmilyVA I had it installed for just me but deleted it and then reinstalled as an admin for everyone and now it works! Just have to buy some API credits now. Thank you!!!

EmilyVA
Alteryx

@Hayden_McHan Excellent!  I'm so happy to hear it's working!  Would love to hear more about your experience with the results, once you get them!

cjaneczko
13 - Pulsar

What are you supposed to select as the Technology in the Connection Manager? It won't let me create a new connection without selecting a Technology, and I don't see anything related to OpenAI or ChatGPT as an option.

 

 

image.png

EmilyVA
Alteryx

Hi @cjaneczko - the way the Data Connection Manager handles technologies for Python SDK tools (which this is) changed a bit in the most recent versions of Designer Desktop.  At this point, if you go in through the File menu --> Manage Connections --> +New then if you scroll down in the technologies options, you'll find an "SDK Tool" box.  If you select that, then in the Technology drop down on the next screen, "OpenAI" is one of the options.

 

A simpler approach is to put a Workflow Summary tool on canvas and click the "Save/Update API key in DCM" button at the top of the configuration window for that tool.  This will automatically filter to the OpenAI Technology in both your current connections and the creation of a new one.

MvdB_F
5 - Atom

This will make documenting our automations much simpler!  Is it possible to connect this tool to an internal openAI source using Windows Login Credentials as the authentication?

EmilyVA
Alteryx

Hi @MvdB_F - glad to hear it might be useful!  Connecting to an internal Azure OpenAI instance is on our list of potential future improvements, but is not currently supported.

dgarmor
7 - Meteor

Can you set up this tool to use a local AI framework such as a server on LM Studio?  LM Studio says " server can be used as a drop-in replacement to OpenAI API.".  

EmilyVA
Alteryx

That's an interesting suggestion, @dgarmor!  The tool doesn't do this currently, but we'll add it to our list of ideas for future features!

MvdB_F
5 - Atom

Hi @EmilyVA, have there been any updates on these features?

networkmike42
8 - Asteroid

Hello @EmilyVA ,

 

I am on Designer 2023.2.1.89, Patch 2

 

I configured the setup as described in:

https://community.alteryx.com/t5/Data-Science/How-To-Use-the-Workflow-Summary-Tool/ba-p/1122280

(most importantly the DCM configuration in user settings)

 

When I follow these instructions:

 

"the way the Data Connection Manager handles technologies for Python SDK tools (which this is) changed a bit in the most recent versions of Designer Desktop. At this point, if you go in through the File menu --> Manage Connections --> +New then if you scroll down in the technologies options, you'll find an "SDK Tool" box. If you select that, then in the Technology drop down on the next screen, "OpenAI" is one of the options.

 

A simpler approach is to put a Workflow Summary tool on canvas and click the "Save/Update API key in DCM" button at the top of the configuration window for that tool. This will automatically filter to the OpenAI Technology in both your current connections and the creation of a new one."

 

There is no data source for OpenAI in either location.

 

Thanks,

 

Mike

 

 

gwlea
5 - Atom

Hello.  I recently upgraded to a new computer and re-installed Alteryx.  I downloaded the Workflow Summary tool and when I clicked to install it, the InstallAware Wizard opened.  I did not get the option to install the Workflow Summary Tool.  The wizard tried to install Designer 2022.1.  I am currently on 2023.1.  I also tried a previous download of the Workflow Summary Tool and received the same result when trying to install it.

 

UPDATE:  after several re-boots and re-installation attempts, I was able to install the tool.

EmilyVA
Alteryx

@gwlea Glad it worked for you eventually!  Sorry it was a bit painful!

kc20
5 - Atom

Hey All,

 

I'm facing the same issue as @networkmike42.

 

I'm running Version: 2023.1.1.361 Patch: 6. When I try to add the API key by clicking "Save/Update API key in DCM" on the Workflow Summary tool, I click "+New" on the next screen, but after that it does not give me any option to save that information. How do I fix this?