
Alteryx Designer Desktop Ideas


Featured Ideas

Many popular machine learning systems use a computer's GPU to speed up much of the underlying math to a huge degree. The header of this article on Medium shows a 15x difference between a high-end CPU and a high-end GPU. GPU support could also improve the spatial tools. Perhaps Alteryx should add this functionality to speed up these tools, which I imagine are currently some of the slowest.
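
A rough sketch of the kind of comparison at stake, using the torch R package (this assumes torch is installed and a CUDA-capable GPU is present; timings are only indicative, since GPU calls are asynchronous):

    library(torch)

    n <- 4000
    x_cpu <- torch_randn(n, n)

    # time a large matrix multiply on the CPU
    cpu_time <- system.time(torch_matmul(x_cpu, x_cpu))["elapsed"]

    if (cuda_is_available()) {
      x_gpu <- x_cpu$cuda()          # move the data to the GPU
      gpu_time <- system.time({
        torch_matmul(x_gpu, x_gpu)
        cuda_synchronize()           # wait for the GPU to finish before timing
      })["elapsed"]
      cat("CPU:", cpu_time, "s  GPU:", gpu_time, "s\n")
    }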

I would like to suggest adding a widget that encapsulates an R script able to perform outlier detection, similar to what Netflix did:
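
A minimal sketch of the kind of R logic such a widget could wrap; this is a simple IQR rule in base R, not Netflix's actual (more sophisticated, robust-PCA-based) approach:

    # flag values more than k interquartile ranges outside the quartiles
    flag_outliers <- function(x, k = 1.5) {
      q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
      iqr <- q[2] - q[1]
      x < q[1] - k * iqr | x > q[2] + k * iqr
    }

    set.seed(1)
    values <- c(rnorm(100), 8, -7)   # two injected outliers
    which(flag_outliers(values))     # the injected points (and any extreme draws) are flagged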

 

Thank you.

 

Regards,

Cristian

  

I would like to share some feedback regarding the Principal Component tool.

I selected the option "Scale each field to have unit variance" and 1 of the 4 PCA tools was displaying errors. However, the error message is not very intuitive and I couldn't use it to debug my workflow. The problem was that, for my type of data, scaling could not be applied: one field had a lot of 0 values, which left it with no variance to scale.

I couldn't find anything related to this, so I hope my feedback helps others.
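
For reference, the same failure is easy to reproduce in plain R (assuming the tool scales with prcomp or equivalent), and a defensive check along these lines could surface a much clearer message:

    # a constant (all-zero) column cannot be scaled to unit variance
    df <- data.frame(a = rnorm(10), b = rnorm(10), c = rep(0, 10))
    # prcomp(df, scale. = TRUE)
    # => Error: cannot rescale a constant/zero column to unit variance

    # defensive check with an actionable message
    zero_var <- names(df)[sapply(df, var) == 0]
    if (length(zero_var) > 0) {
      stop("These fields have zero variance and cannot be scaled: ",
           paste(zero_var, collapse = ", "))
    }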

 

Thanks!

PCA Error.png

I think the Nearest Neighbor algorithm is one of the least used and most powerful algorithms I know of.  It allows me to connect data points with other data points that are similar.  When something is unpredictable, or I simply don't have enough data, it lets me compare one data point with its nearest neighbors.

 

So, last night I was at school, taking a graduate-level Econ course.  We were discussing various distance measures for a nearest neighbor algorithm.  Our prof discussed one called the Mahalanobis distance.  It uses some fancy matrix algebra.  Essentially, it allows the algorithm to filter out the noise and match only on differences that are truly significant.  It takes into account the correlation that may exist among variables, effectively reducing a group of correlated variables down to one.

 

I use Nearest Neighbor when other things aren't working for me: when my data sets are weak, sparse, or otherwise not predictable.  Sometimes I don't know that particular variables are correlated.  Mahalanobis distance is a powerful option that could be added to the Nearest Neighbor tool, allowing matches that might not otherwise be found, and matching on only the variables that really matter, as sketched below.
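
For what it's worth, base R already ships the distance itself (stats::mahalanobis), so a sketch of the proposed matching is short; note that the query point matches itself first here:

    find_nearest <- function(point, data, k = 5) {
      S <- cov(data)                         # captures correlation between variables
      d <- mahalanobis(data, center = point, cov = S)
      order(d)[seq_len(k)]                   # row indices of the k nearest points
    }

    X <- as.matrix(iris[, 1:4])
    find_nearest(X[1, ], X, k = 3)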

Hi there,

 

Similar to @aselameab1, I was having trouble using the Linear Regression tool because it was giving error messages that were neither explanatory nor self-descriptive.

@chadanaber identified the issue: a specific field had only one unique value, which was causing the regression tool to fail. However, the error message provided gives no useful or helpful indication that this is the problem.  You can see that the error message below is pretty tough to understand.

 

Could we add an item to the development backlog to build defensive checks into the predictive analytics tools, testing for conditions that will cause them to fail, and to rework the error messaging? A sketch of such a check is below.
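
A sketch of the kind of defensive check the R-based tools could run before fitting (in plain R, a one-level factor otherwise surfaces as an opaque "contrasts can be applied only to factors with 2 or more levels" error):

    # flag predictors with fewer than two unique non-missing values
    check_predictors <- function(df, predictors) {
      constant <- predictors[sapply(df[predictors], function(col) {
        length(unique(col[!is.na(col)])) < 2
      })]
      if (length(constant) > 0) {
        stop("These fields have only one unique value and cannot be used ",
             "as predictors: ", paste(constant, collapse = ", "))
      }
      invisible(TRUE)
    }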

 

LinearRegressionError.PNG, Workflow.PNG

 

I've attached the workflow with the sample data that replicates this issue

 

Many thanks

Sean

It is important to be able to test for heteroscedasticity, so a tool for this test would be much appreciated.

 

In addition, I strongly believe the ability to calculate robust standard errors should be included as an option in existing regression tools, where applicable. This is a standard feature in most statistical analysis software packages.
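
For reference, both requests are standard one-liners in R via the CRAN packages lmtest and sandwich, which the regression tools could wrap:

    library(lmtest)
    library(sandwich)

    model <- lm(mpg ~ wt + hp, data = mtcars)

    # Breusch-Pagan test for heteroscedasticity
    bptest(model)

    # coefficient table with heteroscedasticity-robust (HC1) standard errors
    coeftest(model, vcov = vcovHC(model, type = "HC1"))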

 

Many thanks!

More and more applications in R are written with tidyverse code using tidy data principles.  According to rdocumentation.org, tidyverse packages are some of the most downloaded.  Adding this package to the default offering will make it easier to transfer existing R code to Alteryx!

 

Currently the R predictive tools are single-threaded, which means that to utilise multi-threading we need to separately download a third-party R distribution such as Microsoft R Client.

Given that this is the better option, should it not be used as the default upon installation?

 

The capability to input/output R datasets (.rds/.RData files) via the Input/Output tools, alongside all the other supported data formats (CSV, Excel, SAS, SPSS, etc.).
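
Reading and writing these files is trivial in base R, so the Input/Output tools would only need to wrap:

    saveRDS(mtcars, "cars.rds")    # write a single R dataset
    df <- readRDS("cars.rds")      # read it back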

This request is largely based on the implementation found in AzureML (take their free trial and check out the Deep Convolutional and Pooling NN example in their gallery). It allows you to specify custom convolutional and pooling layers in a deep neural network. This is an extremely powerful machine learning technique that could be tricky to implement, but it could perhaps start as a macro wrapped around something in Python, where these are currently more easily implemented than in R.
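
As a hedged illustration (not the AzureML implementation itself), specifying custom convolutional and pooling layers looks like this with the keras interface, which is also available from R:

    library(keras)

    model <- keras_model_sequential() %>%
      layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                    input_shape = c(28, 28, 1)) %>%   # custom convolutional layer
      layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # custom pooling layer
      layer_flatten() %>%
      layer_dense(units = 10, activation = "softmax")

    model %>% compile(optimizer = "adam",
                      loss = "categorical_crossentropy",
                      metrics = "accuracy")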

Idea:

Functionality added to the Impute Values tool for multiple imputation and maximum-likelihood imputation of fields that are missing at random would be very useful.

 

Rationale:

Missing data are a problem, and advanced techniques for handling them are complicated. One great idea in statistics is multiple imputation: filling the gaps in the data not with the average, median, mode, or user-defined static values, but instead with plausible values inferred from the other fields.

 

SAS has the PROC MI procedure; here is a page detailing its usage, with examples: http://www.ats.ucla.edu/stat/sas/seminars/missing_data/mi_new_1.htm

There is also PROC CALIS for maximum-likelihood imputation here...

 

A similarly useful tool exists in SPSS as well: http://www.appliedmissingdata.com/spss-multiple-imputation.pdf
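
In R, the mice package implements exactly this kind of multiple imputation; a minimal sketch of what an extended Impute Values tool could wrap:

    library(mice)

    # nhanes ships with mice and contains missing-at-random values
    imp <- mice(nhanes, m = 5, method = "pmm", seed = 123)  # 5 imputed datasets
    completed <- complete(imp, 1)   # first completed dataset, gaps filled with
                                    # plausible values drawn from the other fields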

 

Best

randomForest

 

 

Random forest doesn't handle missing values well and produces cryptic errors for Alteryx users.

 

Here are two quick options that would be worth adding in a new version:

1) na.omit, which omits cases from your data where there are missing values... you lose some observations, though...

 

na.action=na.omit

 

2) na.roughfix, which replaces missing values with the mean for continuous variables and the mode for categorical variables

 

na.action=na.roughfix
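
Both options in context, as a sketch of how the tool could expose them:

    library(randomForest)

    df <- iris
    df[sample(nrow(df), 10), "Sepal.Width"] <- NA   # inject missing values

    # option 1: drop incomplete cases
    rf1 <- randomForest(Species ~ ., data = df, na.action = na.omit)

    # option 2: impute mean (continuous) / mode (categorical) before fitting
    rf2 <- randomForest(Species ~ ., data = df, na.action = na.roughfix)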

Best

 

Hello,

 

The randomForest package implementation in Alteryx works fine for smaller datasets but becomes very slow for large datasets with many features.

There is the open-source ranger package (https://arxiv.org/pdf/1508.04409.pdf) that could help with this, as sketched below.
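
ranger is close to a drop-in replacement, with multi-threading built in; a minimal example:

    library(ranger)

    fit <- ranger(Species ~ ., data = iris,
                  num.trees = 500,
                  num.threads = 4)    # runs in parallel, unlike randomForest
    fit$prediction.error              # out-of-bag error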

 

Along with XGBoost/LightGBM/CatBoost, it would be an extremely welcome addition to the predictive package!

The existing decision tree tool is fully automatic, but business users need to interact with the decision tree: pruning it and growing certain parts of it using their domain expertise...

 

SPSS has a nice facility for this, as you can see below... desperately looking forward to an Alteryx version...

 

 

tree_growtreeSPSS.gif
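
Under the hood this is already possible with rpart (which, as far as I know, powers the current Decision Tree tool); what's missing is an interactive front end for it:

    library(rpart)

    fit <- rpart(Species ~ ., data = iris,
                 control = rpart.control(cp = 0.001))  # grow a deliberately large tree
    printcp(fit)                     # inspect the candidate subtrees
    pruned <- prune(fit, cp = 0.05)  # manually cut the tree back at a chosen cp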


It feels like lately Alteryx has been focusing on integration rather than on adding more machine learning tools, which sadly are still not on par with many competing products...

Personally, I miss having XGBoost and multi-core random forest libraries like ranger (along with a more robust implementation of C5.0).

 

What about you guys? Which R/Python libraries are you missing in Alteryx?

It would be great if we could output the coefficients of a regression equation to a table so that they can be used in the rest of the module. Currently, Alteryx can only output the coefficients in chart/report form, which is not reusable as data within the module.
The values of the coefficients/residuals/errors would be very useful for building macros for techniques like missing value analysis, which can't be done in Alteryx as of now.
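
In R, extracting these values is trivial; the tool would just need to emit them as a data stream:

    model <- lm(mpg ~ wt + hp, data = mtcars)

    coef(model)        # named vector of coefficients
    broom::tidy(model) # data frame: term, estimate, std.error, statistic, p.value
    residuals(model)   # residuals, reusable downstream in macros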

Improve the Help documentation or the in-tool options for handling null values in statistical tools like Weighted Average or Linear Regression: for instance, a checkbox to remove records with null values, or at least a warning to users.

 

In the process of learning to perform linear regression in RStudio and Alteryx, I came across differing outputs depending on how null values were addressed. Take the Weighted Average tool, for example.

 

In R, the weighted.mean function can treat null values in the variable of interest as if they were not there, but only if the user specifies that null values exist (na.rm = TRUE); otherwise the result is NA. If any null values exist in the weight field, the result is NA.

 

Since I am more familiar with Alteryx, I originally did the data preparation—including calculating the weighted means—in Alteryx. When comparing these weighted means with those generated in R, I found that Alteryx treats the null values as zeros (i.e. includes them in the calculation). The user would have to know this is incorrect and first filter out the null values. See screenshot examples.

 

 

 

This is also the case within the Linear Regression tool. If null values are not omitted prior to regression, the results are wildly different. Perhaps this is known by more experienced users/statisticians, but this incorrect usage would have gone on unbeknownst to me had I not cross-checked with RStudio.

 

[Screenshots: Weighted Average in Alteryx; Weighted Mean in R]
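
The discrepancy is easy to demonstrate in plain R:

    x <- c(10, 20, NA)
    w <- c(1, 1, 1)

    weighted.mean(x, w)                # NA: nulls are not silently ignored
    weighted.mean(x, w, na.rm = TRUE)  # 15: nulls dropped, as intended

    # treating the null as zero, which is what the Alteryx tool appears to do:
    weighted.mean(ifelse(is.na(x), 0, x), w)  # 10: a quietly different answer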

Many features are in the form of categorical variables. It would be amazing to have a set of tools for clustering and dimensionality reduction on categorical variables.

 

The current tool set in Alteryx is fantastic for working with continuous variables (k-centroids, KNN, PCA), but falls short when working with categorical variables.

 

There are some ways to do dimensionality reduction on categorical variables (Multiple Correspondence Analysis, PCA with Gower's distance, etc.) and some ways to cluster categorical variables (k-modes; working with medoids instead of centroids, i.e. PAM; etc.).

 

Some key considerations on which algorithm to use are time complexity, validity of results, and whether the algorithm can work on variables that are only categorical, or both categorical and continuous.
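
As one illustration among the options above, Gower's distance plus PAM is only a few lines in R with the cluster package:

    library(cluster)

    df <- data.frame(
      colour = factor(c("red", "red", "blue", "blue", "green")),
      size   = factor(c("S", "M", "L", "L", "S"))
    )

    d   <- daisy(df, metric = "gower")  # dissimilarity for categorical/mixed data
    fit <- pam(d, k = 2)                # k-medoids clustering on that matrix
    fit$clustering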

 

- Michael Dyatchenko

 

I searched Alteryx Help for LDA (Linear Discriminant Analysis) and it returned zero results.

 

Altryx LDA.jpg

 

Idea: an LDA (Linear Discriminant Analysis) tool to be added to the predictive toolbox.

 

 

 

Rationale: We have PCA and MDS as tools which help a lot with "unsupervised" dimensionality reduction in predictive modelling.

But if we need a method that takes target values into consideration, we need a "supervised" tool instead...

 

Altryx LDA2.jpg

 

 

"LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data.[4] LDA explicitly attempts to model the difference between the classes of data. PCA on the other hand does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made."

Idea:

Some well-known scoring methods use optimally binned variables for added robustness. Let's add this capability to Alteryx.

 

Rationale:

Here's a basic link on why to do this: http://documents.software.dell.com/statistics/textbook/optimal-binning

 

Current status in Alteryx, as far as I'm aware:

The Tile tool (or the Multi-Field Binning tool, which performs the same task on multiple fields) splits variables using five methods:

  • Equal Records or Intervals or Sums

  • Smart Tile

  • Unique Value 

  • Manual

Unfortunately "equal something" binnings are bad idea, as the values are categorized "blindly" irrespective of the effects on the predictive power of the models. 

 

What to do:

What's needed is to bin both numerical and categorical variables optimally, such that the Weights of Evidence (WoE) present a monotonically increasing or decreasing pattern, or at most a V- or U-shaped "convex" structure.

 

Quick win:

Without constraining ourselves to monotonic or convex cases, the easiest practice would be to run a C4.5 or CHAID tree algorithm (these produce multiple splits, rather than CART's binary splits) on a single variable, with the target as the dependent variable; the resulting leaf nodes are the bins we are looking for (see the sketch below). Doing this for multiple variables at once is the key to the tool to be generated.
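
A hedged sketch of that quick win using rpart (CART with binary splits, since base R has no CHAID; the principle carries over):

    library(rpart)

    set.seed(42)
    df <- data.frame(age = sample(18:80, 1000, replace = TRUE))
    df$default <- rbinom(1000, 1, plogis((df$age - 45) / 15))  # synthetic target

    # a single-variable tree: its split points become the bin boundaries
    fit  <- rpart(default ~ age, data = df, method = "class",
                  control = rpart.control(cp = 0.005, minbucket = 50))
    cuts <- sort(unique(fit$splits[, "index"]))

    df$bin <- cut(df$age, breaks = c(-Inf, cuts, Inf))
    table(df$bin, df$default)   # per-bin counts, from which WoE follows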

 

Clients:

This capability is sought by risk management departments building robust, stable, Basel-compliant models in the financial industry, especially banks.
