Current State: Currently, once we add a new tool to a workflow and want to see its output, we have to run the complete workflow.
Challenge: This takes a lot of time if the input data files are huge, with millions of rows, and multiple tools are operating on them.
Suggestion/Idea: We could have a "freeze" function, either built in or in the tool palette, to freeze a portion of the workflow or an entire tool container that does not depend on the output of the newly added tool. This would save time when running the workflow.
Example: In the attached workflow snapshot, freezing Tool Container 1 while adding tools in Tool Container 2, and then running only those tools, would take less processing time.
I often encounter situations where I need to apply the same formula to several columns. Doing this requires copying and pasting the formula several times and then updating the variable names in the formula for each output column. I wish there were a built-in "Current Output Column" variable so that I could build one formula and use it for each column.
I use a lot of the same input files for the processes I run. It would be a huge time-saver if the Input tool's Data connections window had a "Favorites" section at the top of the Recent page, where the user could pin specific files.
Another, maybe easier, option would be to move a file back to the top of the Recent list whenever it is selected. Currently, no matter how many times a file is used, it keeps moving down the Recent list even if I have just opened it, so even though I use the file daily, it disappears as soon as I open 8 other files.
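To make the requested "move to top on select" behavior concrete, here is a minimal sketch of a most-recently-used list in Python. It only illustrates the idea and is not how Designer stores its Recent list; the function name `touch` and the capacity of 8 are assumptions made for the example.

```python
def touch(recent, path, capacity=8):
    """Return a new Recent list with `path` moved to the top.

    `capacity` is an assumed list size; the oldest entries fall off the end.
    """
    # Put the selected file first, then every other file in its previous order.
    updated = [path] + [p for p in recent if p != path]
    return updated[:capacity]


# Hypothetical file names, selected in this order.
recent = []
for name in ["daily.xlsx", "a.csv", "b.csv", "daily.xlsx", "c.csv"]:
    recent = touch(recent, name)
```

With this rule, a file that is opened every day stays near the top instead of aging out of the list.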
It would be helpful to be able to embed a macro within my workflows so that in the end I have one single file.
Similar to how an Excel file becomes a macro-enabled file, it would be great if the actual macro could be contained in the workflow. As it stands now, a macro that I insert into a workflow is like a linked cell in MS Excel that points to another file. If the macro is moved, the workflow breaks. I often work on a larger workflow that I save locally while developing. Once it's complete, I save the workflow to a network drive and have to delete the macros and reinsert them. It also makes it challenging to send a workflow to someone else: I have to give them instructions on which macros to insert and where. Similar to a container, embedded macros could be minimized to their normal icon, expanded/opened if any edits were needed, and then collapsed when done.
Currently, if a workflow updates the same Excel file more than once, even on different sheets within the file, it will error out if the save operations overlap. And in some cases the Block Until Done tool will not help because there are two data streams (for example, if you have a Filter tool and are saving the data from its two outputs to the same file).
It would be great if we could output to the same Excel file more than once in the same workflow.
When the Python Tool runs, it seems to always ingest all the data before processing any of it (i.e. no batch processing). Python can handle this kind of functionality with generators. Can we update the tool so that it does some preprocessing (like imports and data prep) and then allows a defined generator function to be called repeatedly from a separate input handle, providing batch data frames on output for more parallel-like processing of data?
The Python Tool could be updated as such:
Multi-Input - Same functionality as now, and also allow this data to be used for preprocessing and setting up the Python functions and a single batch function.
Data Input - Ingests data in batches (as most other tools operate) where each batch passes in a dataframe (in this case, a subset of processed entries) into an existing Python function (with a name that is in globals()), and returns another dataframe with that desired output. This can give the option of adding/removing rows as necessary to a subset of the data.
Data Output - Partial set of data after data processing to allow tools further in the chain to process in parallel.
"On Complete" Multi-Outputs - Same functionality as now, to pass process-complete data to the next tool once all data ingested has been processed. Perhaps give the option to pass the complete set from Data Output.
A simple use-case, if a user wanted to use only the Python Tool:
Let's say a user wants to get all URLs from every post in a thread (containing millions of posts) that are in blacklisted domains.
Data prep sends the list of blacklisted domains into the Python Tool's Multi-Input handle, and that data is transformed and stored in a set within the Python Tool once.
A series of posts (strings) are sent in batches (let's say ~10000) to the Data Input of the Python Tool. The tool calls a defined Python function that extracts all the URLs, and filters those in the blacklist.
That data is then transformed into a DataFrame which is then sent to the Data Output of the Python Tool, and only contains results corresponding to the small batch of posts that were ingested. Alteryx can also use this to track progress during execution.
Once all posts have been processed, one of the Python Tool's Multi-Outputs can return a total count of URLs found that were NOT in the blacklist (sure this can be a part of the Data Output, but just for the sake of this example). Could also be used to trigger "on-complete events."
I know I used the term "generators" above, and the design could probably be simplified to use actual Python generators, for instance by calling an Alteryx Python function that yields while it awaits input from the next batch. However, I feel my initial approach could be thought of as a simpler process, since generators are more of an intermediate feature.
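The use case above could be sketched in plain Python as follows. This is only an illustration under stated assumptions: it uses no Alteryx API, lists of dicts stand in for DataFrames, and every name (`prepare_blacklist`, `make_batch_function`, `batches`) is hypothetical. The URL pattern is deliberately naive and the batch size is tiny so the example stays readable.

```python
import re
from urllib.parse import urlparse

# Naive URL matcher (assumption): anything from http(s):// up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def prepare_blacklist(domains):
    """One-time setup, standing in for the proposed Multi-Input step."""
    return {d.strip().lower() for d in domains if d.strip()}


def make_batch_function(blacklist):
    """Build the per-batch function plus a running total for the 'on complete' output."""
    totals = {"not_blacklisted": 0}

    def process_batch(posts):
        # Stand-in for the proposed Data Input -> Data Output step:
        # takes one batch of posts, returns rows only for that batch.
        rows = []
        for post in posts:
            for url in URL_PATTERN.findall(post):
                domain = (urlparse(url).hostname or "").lower()
                if domain in blacklist:
                    rows.append({"url": url, "domain": domain})
                else:
                    totals["not_blacklisted"] += 1
        return rows

    return process_batch, totals


def batches(items, size):
    """Generator yielding successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


blacklist = prepare_blacklist(["bad.example", "Spam.example "])
process_batch, totals = make_batch_function(blacklist)

posts = [
    "see http://bad.example/page and https://good.example/x",
    "nothing here",
    "visit https://spam.example/offer now",
]

results = []
for batch in batches(posts, 2):  # the post above suggests ~10000 per batch
    results.extend(process_batch(batch))
```

Here `results` plays the role of the partial Data Output rows, and `totals` is what a Multi-Output could emit once every batch has been processed.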
I hope this makes sense and is elaborate enough to pursue. Thanks for the consideration!
Even though Alteryx Designer supports many input formats, such as Alteryx database (.yxdb) and Calgary database (.cydb) files, it cannot read a .tde file as an input for analysis. It would be great if this were supported, because if something goes wrong in Tableau reporting, it would be easier to take the .tde file and analyse it in Alteryx than to check it in Tableau.
It would be useful to be able to select a single container (containing a data input) or multiple containers using Shift, and run those and only those.
When building a new element of a larger workflow, I often add a new Input in a new container. The ability to run just that container, without having to turn off all my other containers, would really speed up the start of joining things together.
R has a very large number of useful packages and examples. Often, we only need a few lines of R code. However, integrating that with the data flow in Alteryx can be complex. It would be ideal if there was a tool where you could drop in R code, and have the tool create named inputs and outputs for each variable in the R code, and create blank text documents or YXDBs with the correct column names and variable types. This seems like it could be automated, and would eliminate a lot of trial and error in using small pieces of R code for specialty tasks.
It would be very helpful if there were a master variable list for the entire workflow, with one column of that list being the number of the first tool where the variable appeared. For example, when using Join with many fields, it is pretty easy to get duplicate fields. It is also common to have fields that are only slightly different, for example, "Variable_1" and "Variable 1". If there were a master list of variables with hotlinks to the first tool where each variable appeared, it would be easy to fix duplicate or near-duplicate variables. Other useful fields for a variable summary would include the variable type (integer, double, etc.), whether the field has any nulls, and whether the field has any text.
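As a rough sketch of what the near-duplicate check in such a list could look like, the Python below groups field names that differ only in case, spacing, or punctuation and keeps the first tool where each spelling appeared. The input format (pairs of tool number and field name) and the function names are assumptions for illustration; nothing here reads a real workflow file.

```python
import re
from collections import defaultdict


def normalize(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def near_duplicates(fields):
    """Group near-duplicate field names.

    `fields` is an iterable of (tool_id, field_name) pairs in workflow order
    (a hypothetical format). Returns {normalized_name: [(first_tool_id, name), ...]}
    for every normalized name that has more than one distinct spelling.
    """
    first_seen = {}
    for tool_id, name in fields:
        first_seen.setdefault(name, tool_id)  # keep only the first tool per spelling

    groups = defaultdict(list)
    for name, tool_id in first_seen.items():
        groups[normalize(name)].append((tool_id, name))

    return {key: items for key, items in groups.items() if len(items) > 1}


fields = [(1, "Variable_1"), (3, "Variable 1"), (3, "Amount"), (5, "Variable_1")]
report = near_duplicates(fields)
```

In this example `report` flags "Variable_1" (first seen at tool 1) and "Variable 1" (first seen at tool 3) as one group, which is the kind of pairing the proposed list would surface.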
The #bandofsolvers community has come up with many creative ways to determine if an 'output' action is complete before proceeding with next steps. However, what we really need is an optional output anchor added to (all?) tools in this class.
For example, currently if we need to Output Data to the same file 3 times, we have to put logic in place to make sure that the 3 updates happen in the correct sequence and do not interfere with each other. Or if we need to Render a file and perform additional modifications or file actions on that new file (e.g. ACL using icacls), we have to put checks in place to wait for the render to complete and make sure the file is freed by the write step.
However, if we could have, at minimum, an optional output anchor that passes a Boolean flag indicating that the 'output' class tool is complete, that would help tremendously! Even more helpful would be an XML/JSON object containing the tool configuration. Additionally, data/metadata 'pass-through' could be helpful in some situations as well.
I understand that this simple request could be a significant change to the structure of the program, but I'm throwing it out there for the 'Idea' space! 🙂
As well as using keyboard shortcuts, many of us use a mouse or keyboard with program-specific assignable shortcut buttons. It is a serious boost to productivity. The ability to instantly enable / disable would be a great tool for large, complex workflows. In general, it would be great to expand the keyboard shortcuts to offer more Alteryx-specific advanced functions.
Add a new feature to develop your own customized decision tree with Insight. Instead of using a tree generated by the Decision Tree tool, a user could build a tree with custom splits and save the splitting rules as a model to score a new dataset later. This would give users the ability to enhance a tree with business knowledge.
The add to / remove from container behavior needs to be modified. I have frequently had the application completely rearrange my workflow because of it. I was just deleting a handful of closed containers when the application removed all my tools from their individual containers and wrapped everything in one big container completely screwing up my entire workflow. This happens a lot. Now I have to reorganize the workflow. This is one of my biggest frustrations with the application.
It would be very useful for me if I could combine two different inputs in the same output: (1) the whole output flow and (2) a summary of that output. That would save some of the time spent on pivot table analysis, for instance.
It would be great to make workflow execution results visible to other users in the same subscription.
As of now, only schedules are visible to all users in a subscription; the results of a workflow executed by one user are not visible to other users in the same subscription.
This would avoid duplicate executions of the same workflow by multiple users in a team, because users could check whether someone else has already run the workflow, and see the results, before running it themselves.