
Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

1 Review

Our submission guidelines & status definitions before getting started

2 Search

The community for a solution or existing idea before posting

3 Vote

By clicking the like in the top left corner of an idea you support

4 Submit

A new idea to suggest a product enhancement or new feature

Suggest an idea
It would be great to see the R tool updated with the same interface as the Python tool to reduce the need to rerun the workflow when testing.

Unsupervised learning method to detect topics in a text document.


Helpful for users interested in text mining.

When working with R code and errors occur, the application needs to show which line the error happened on.

  • Predictive

I think the Nearest Neighbor Algorithm is one of the least used, and most powerful algorithms I know of.  It allows me to connect data points with other data points that are similar.  When something is unpredictable, or I simply don't have enough data, this allows me to compare one data point with its nearest neighbors.


So, last night I was at school, taking a graduate-level Econ course.  We were discussing various distance measures for a nearest neighbor algorithm.  Our prof discussed one called the Mahalanobis distance.  It uses some fancy matrix algebra.  Essentially, it filters out the noise and matches only on differences that are truly significant.  It takes into account the correlation that may exist among variables, and effectively reduces correlated variables down to one.


I use Nearest Neighbor when other things aren't working for me: when my data sets are weak, sparse, or otherwise not predictable.  Sometimes I don't know that particular variables are correlated.  This is a powerful distance measure that could be added to the Nearest Neighbor tool to allow for matches that might not otherwise be found, and to allow matches on only the variables that really matter.
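
To make the idea above concrete, here is a minimal sketch of the Mahalanobis distance in plain Python. It is restricted to two variables so the 2x2 covariance matrix can be inverted by hand; the function name `mahalanobis_2d` and the sample data are made up for illustration and are not part of any Alteryx tool.

```python
import math

def mahalanobis_2d(p, q, points):
    """Mahalanobis distance between p and q, using the covariance of `points`.

    Two variables only (an assumption made for brevity); a general version
    would invert the full covariance matrix with a linear algebra library.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = p[0] - q[0], p[1] - q[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for the 2x2 case
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical, strongly correlated sample data
history = [(1, 1.0), (2, 2.1), (3, 2.9), (4, 4.2), (5, 4.8)]

# Both candidates are the same Euclidean distance from (3, 3) ...
along = mahalanobis_2d((3, 3), (4, 4), history)   # moves with the correlation
across = mahalanobis_2d((3, 3), (4, 2), history)  # moves against it
print(along, across)
```

In this sketch the point that moves against the correlation comes out much farther away than the one that moves along it, even though plain Euclidean distance treats them as equally close.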

A lot of popular machine learning systems use a computer's GPU to speed up some of the math to a huge degree. The header on this article on Medium shows a 15x difference from a high-end CPU vs a high-end GPU. It could also create an improvement in the spatial tools. Perhaps Alteryx should add this functionality in order to speed up these tools, which I can imagine are currently some of the slowest.

XGBoost regression is now the benchmark for every Kaggle competition and seems to consistently outperform random forest, spline regression, and all of the more basic models. For those of us using predictive modeling on a regular basis in our actual work, this tool would allow for a quick improvement in our model accuracy. And I think, from a marketing standpoint, having a core group of users competing in Kaggle using Alteryx would be a great way to show off Alteryx's power.


It is readily available as an R package:

I checked out the "Boosted" model and see that it basically wraps the "gbm" model in R.  I would like to request a similar wrapping for the newer xgb (or xgboost) -- eXtreme Gradient Boosting, which is very fast and accurate, and is winning Kaggle competitions left and right.  It would be a great addition and is something SAS probably won't have for another 10 years, if ever.


  • Predictive

Designer should support statistical testing tools that ignore data distribution and support Statistical Learning methods.


Alteryx already supports resampling for predictive modeling with Cross-Validation.


Resampling tools for bootstrap and permutation tests (with or without replacement) would let analysts and data scientists alike assess random variability in a statistic without worrying about the restrictions of the data's distribution, as is the case with many parametric tests, most commonly supported in Alteryx by the t-test tool. With modern computing power, the need for hundred-year-old statistical sampling tests is fading: sampling a data set thousands of times to compare results to random chance is much easier today.


The tool's results could include, like R, outputs of not only the results histogram but the associated Q-Q plot that visualizes the distribution of the data for the analyst. This would duplicate the Distribution Analysis tool somewhat, but the Q-Q plot is, to me, a major missing element in the simplest visualization of data. This tool could be very valuable in terms of feeding the A/B Test tools.
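
The resampling tests described in this idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the choice of the mean as the statistic, the number of resamples, and the sample data are all hypothetical, and it does not describe how any existing Alteryx tool works.

```python
import random

def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and the share of shuffles whose
    absolute difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign group labels (sampling without replacement)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = sum(new_a) / len(new_a) - sum(new_b) / len(new_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimate away from exactly zero
    return observed, (extreme + 1) / (n_resamples + 1)

def bootstrap_ci(values, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# Hypothetical A/B data
control = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1]
variant = [11.0, 11.3, 10.9, 11.2, 11.1, 10.8, 11.4, 11.0]

observed_diff, p_value = permutation_test(control, variant)
ci_low, ci_high = bootstrap_ci(control)
print(observed_diff, p_value, ci_low, ci_high)
```

Neither function assumes anything about the shape of the data's distribution, which is the point the idea makes against relying only on the t-test.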

I would like to suggest adding a widget that encapsulates an R script to perform outlier detection, similar to what Netflix did:


Thank you.





The capability to input/output R Datasets via the input/output tools, together with all the other data formats as well (like csv, Excel, SAS, SPSS, etc).

This request is largely based on the implementation found on AzureML; (take their free trial and check out the Deep Convolutional and Pooling NN example from their gallery).  This allows you to specify custom convolutional and pooling layers in a deep neural network. This is an extremely powerful machine learning technique that could be tricky to implement, but could perhaps be (for example) a great initial macro wrapped around something in Python, where currently these are more easily implemented than in R.

  • Predictive
It would be great if we could output the coefficients of the regression equation to a table so that one can use them in the rest of the module. Currently, Alteryx can output the table/coefficients only in chart/report form, which is not reusable in the module.
The values of coefficients/Residuals/Errors would be very useful in building macros for techniques like Missing Value Analysis which can't be done in Alteryx as of now.
  • Predictive


Adding multiple imputation and maximum likelihood imputation to the Impute Values tool, for fields with values missing at random, would be very useful.



Missing data are a problem, and advanced techniques are complicated. One great idea in statistics is multiple imputation: filling the gaps in the data not with the average, median, mode, or user-defined static values, but with plausible values that take other fields into account.


SAS has the PROC MI procedure; here is a page detailing the usage with examples:

Also there is PROC CALIS for maximum likelihood here...


The same useful tool exists in SPSS as well.




I am not sure if this capability exists but I assume it does not.

We need to optimize a Linear Program (LP) model that consists of a system of equations and has both an objective function and a series of constraints. One of the capabilities that SAS offers and Alteryx currently lacks is this optimization capability.

If the capability is not currently available, is it on the product roadmap?


  • Predictive

Hello! Almost all statistical software allows the analyst to use either a pairwise or a listwise option when applying clustering techniques. This option affects only how the inner distance matrix is built; after that, whichever algorithm you choose is performed. However, in Alteryx, [K-Centroids] does listwise by default, classifying only those records where the selected variables have no nulls.


Please consider adding this option!


PS: the difference is that pairwise builds the distance between two variables from the records that have no nulls in both variables, while listwise builds the distance matrix only after keeping the records that are complete (non-null) in all variables of interest, rather than calculating distances one pair at a time.
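
As a rough illustration of the difference described in the PS, the sketch below shows which records each option would keep before any distance is computed. The field names and records are hypothetical, and `None` stands in for a null value.

```python
from itertools import combinations

# Hypothetical records; None marks a null
records = [
    {"age": 34, "income": 52000, "visits": 3},
    {"age": 41, "income": None, "visits": 5},
    {"age": None, "income": 61000, "visits": 2},
    {"age": 29, "income": 48000, "visits": None},
    {"age": 50, "income": 75000, "visits": 7},
]
fields = ["age", "income", "visits"]

def listwise(rows, cols):
    """Keep only records that are non-null in every selected variable."""
    return [r for r in rows if all(r[c] is not None for c in cols)]

def pairwise(rows, cols):
    """For each pair of variables, keep records that are non-null in both."""
    return {
        (a, b): [r for r in rows if r[a] is not None and r[b] is not None]
        for a, b in combinations(cols, 2)
    }

complete_cases = listwise(records, fields)
per_pair = pairwise(records, fields)

print(len(complete_cases))
print({pair: len(rows) for pair, rows in per_pair.items()})
```

With this sample, listwise keeps two of the five records, while each pair of variables still has three usable records under the pairwise option.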

I am trying to run batch regressions on a pretty sizable set of data: about 1M distinct groups, each with 30-500 x,y pairs.


A batch macro with a linear regression works OK, but it is really slow.  It started at about 2-3s per regression.  After stripping a bunch of reporting out of the macro, I am down to ~2s.  This still feels quite slow compared to something purpose-built.


Has anyone experimented with higher speed versions that just dump out m,b, & r2?
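
One way to get just m, b, and r2 quickly is to skip the model object entirely and use the closed-form simple regression formulas on running sums per group. The sketch below is a plain-Python illustration of that idea, not an Alteryx macro; the function name and the (group, x, y) row layout are assumptions.

```python
import math
from collections import defaultdict

def grouped_linear_fits(rows):
    """Fit y = m*x + b per group from (group, x, y) rows in a single pass."""
    # Per group: n, sum(x), sum(y), sum(x*x), sum(y*y), sum(x*y)
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])
    for group, x, y in rows:
        s = sums[group]
        s[0] += 1
        s[1] += x
        s[2] += y
        s[3] += x * x
        s[4] += y * y
        s[5] += x * y

    results = {}
    for group, (n, sx, sy, sxx, syy, sxy) in sums.items():
        var_x = n * sxx - sx * sx
        var_y = n * syy - sy * sy
        cov = n * sxy - sx * sy
        if n < 2 or var_x == 0:
            continue  # slope is undefined for this group
        m = cov / var_x
        b = (sy - m * sx) / n
        # r2 is undefined when y is constant, so report NaN in that case
        r2 = (cov * cov) / (var_x * var_y) if var_y != 0 else math.nan
        results[group] = (m, b, r2)
    return results

# Hypothetical data: group A follows y = 2x + 1, group B follows y = -x + 10
rows = [("A", x, 2 * x + 1) for x in range(5)] + [("B", x, 10 - x) for x in (1, 2, 3)]
fits = grouped_linear_fits(rows)
print(fits)
```

The sums-of-squares form can lose precision when x or y values are very large relative to their spread; centering each group first is a common remedy if that matters for your data.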

There is a web-hosted trial where anyone can get hands-on experience with Alteryx tutorials without even downloading the tool.

That's awesome...


It may be a nice idea to;


1) either start separate "Alteryx-Kaggle" instances with data sets specific to each Kaggle competition, so that anyone who wants to try it out can have a go with those well-known examples through the Alteryx site,

2) or, even better, partner with Kaggle so that anyone can have their own Alteryx trial per specific competition on the Kaggle website...


I'm sure this will draw a lot of attention...



You'll immediately have a greater reach in the Kaggle community: data hobbyists and CS people, i.e. students and academics (who will eventually end up doing lots of data blending when they are hired by top-notch firms)...


In forecasting and in commercial/SME risk scoring, there is a need to try a vast number of algebraic equations, which is a very cumbersome process. Let's add symbolic regression as a new competitive capability.



Summations, ratios, power transforms, and all combinations of the like need to be tested as new variables for a forecasting or prediction model. Doing this by hand is a tough and long business... And there is always a possibility of skipping a valuable combination.



Symbolic regression is a novel technique for automatically generating algebraic equations with the use of genetic programming.
In every generation, a variable is selected and the resulting equation is checked for how well it discriminates the target variable at hand. In each subsequent step, frequently observed variables are more likely to be selected.


SR comparison.jpg


Benefit for clients:

This method produces variables mainly with nonlinear relationships. It is a technique that will help in corporate/commercial/SME risk modelling, such that powerful risk models are generated from a short list of B/S and P/L based algebraic equations.
There are potential use cases in algorithmic trading as well...


There are 3 very interesting world problems solved with symbolic regression here.

A very relevant thesis by Sean Wouter is attached as a PDF document for your reading pleasure...


R side of things:

I've found Rgp package for genetic programming, here is a link.



I haven't seen anything similar in SAS or SPSS, but there is this:

Also there is Bruce Ratner's page



Some well known scoring methods use optimal binned variables for added robustness. Let's add this capability to Alteryx.



Here's a basic link on why to do that;


Current status in Alteryx, as far as I'm aware:

The Tile tool, or the Multi-Field Binning tool (which does the same task as the Tile tool on multiple fields), splits the variables by 5 methods:

  • Equal Records or Intervals or Sums

  • Smart Tile

  • Unique Value 

  • Manual

Unfortunately, "equal something" binnings are a bad idea, as the values are categorized "blindly", irrespective of the effect on the predictive power of the models.


What to do:

What's needed is to bin both numerical and categorical variables optimally, such that the Weight of Evidence (WoE) values present a monotone increasing or decreasing pattern, or at most a V- or U-shaped "convex" structure.
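
To show what the monotonicity check above would look at, here is a small sketch that computes WoE per bin and tests whether the sequence is monotone. It assumes the common convention WoE = ln(share of goods in the bin / share of bads in the bin); sign conventions vary between sources, and the bins and counts below are invented for illustration.

```python
import math

def weight_of_evidence(bins):
    """bins: list of (label, good_count, bad_count), ordered by the variable.

    Assumes every bin has at least one good and one bad record;
    otherwise the logarithm is undefined.
    """
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    return [
        (label, math.log((good / total_good) / (bad / total_bad)))
        for label, good, bad in bins
    ]

def is_monotone(values):
    """True if the values never decrease or never increase."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# Hypothetical age bins with good/bad counts
age_bins = [("<25", 100, 40), ("25-40", 300, 60), ("40-60", 400, 40), ("60+", 200, 10)]

woe = weight_of_evidence(age_bins)
woe_values = [value for _, value in woe]
print(woe)
print(is_monotone(woe_values))
```

An optimal binning tool would search over candidate cut points and keep a set of bins whose WoE sequence passes a check like this one.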


Quick win:

Without constraining ourselves to the monotone or convex cases, the easiest practice would be to run a C4.5 or CHAID tree algorithm (which produce multi-way splits rather than the binary splits of CART) on a single variable, with the target as the dependent variable; the resulting nodes are the bins we are looking for. Doing this for multiple variables at once is the key to the tool to be built.



This capability is sought by risk management departments building robust, stable, Basel-compliant models in the financial industry, especially by banks.

When scoring data, if you have values in predictor fields that were not seen in the data used to build the model, the Score tool will not score the record.  That makes sense, but it would be nice to know how impactful the issue is.  Please provide a count of records not scored for this reason, a count of records not scored because they exceed the limit in the configuration tab of the Score tool, and a count for any other reason a record is not scored, so we have a clear understanding of how many records were scored, how many were not, and why.
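
The kind of summary being requested could look something like the sketch below, which counts scoring records by whether they contain a categorical level the model never saw. It is a plain-Python illustration only: the function name, field names, and data are hypothetical, it covers only the unseen-level reason, and it does not reflect how the Score tool is implemented.

```python
from collections import Counter

def unscored_reasons(training_rows, scoring_rows, categorical_fields):
    """Count scoring records by whether they contain a level unseen in training."""
    seen = {f: {row[f] for row in training_rows} for f in categorical_fields}
    reasons = Counter()
    for row in scoring_rows:
        unseen = [f for f in categorical_fields if row[f] not in seen[f]]
        if unseen:
            reasons["unseen level in " + ", ".join(unseen)] += 1
        else:
            reasons["scored"] += 1
    return reasons

# Hypothetical training and scoring data
training = [
    {"region": "East", "segment": "Retail"},
    {"region": "West", "segment": "Retail"},
    {"region": "East", "segment": "Wholesale"},
]
to_score = [
    {"region": "East", "segment": "Retail"},
    {"region": "North", "segment": "Retail"},
    {"region": "West", "segment": "Online"},
    {"region": "North", "segment": "Online"},
]

summary = unscored_reasons(training, to_score, ["region", "segment"])
print(dict(summary))
```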

  • Predictive