
Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!

Featured Ideas

When errors occur in R code, the application should show which line the error happened on.
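As a stopgap, here is a user-side sketch (plain R, not an Alteryx feature) that surfaces the failing call before the error is raised:

options(keep.source = TRUE)

withCallingHandlers(
  {
    df <- data.frame(a = c(1, 2, 3))   # placeholder work
    log("not a number")                # this call errors
  },
  error = function(e) {
    message("Failing call: ", deparse(conditionCall(e))[1])
    message("Message: ", conditionMessage(e))
  }
)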

Designer should support statistical testing tools that make no assumptions about the data's distribution, and support Statistical Learning methods.

 

Alteryx already supports resampling for predictive modeling with Cross-Validation.

 

Resampling tools for bootstrap and permutation tests (with or without replacement) would let analysts and data scientists alike assess the random variability in a statistic without worrying about the restrictions of the data's distribution, as is the case with many parametric tests, most commonly represented in Alteryx by the t-Test tool. With modern computing power, the need for hundred-year-old parametric sampling tests is fading: sampling a data set thousands of times and comparing the results to random chance is much easier today.

 

The tool's results could include, as in R, not only a histogram of the results but also the associated Q-Q plot that visualizes the distribution of the data for the analyst. This would duplicate the Distribution Analysis tool somewhat, but the Q-Q plot is, to me, a major missing element in the simplest visualization of data. This tool could also be very valuable for feeding the A/B Test tools.
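A minimal sketch of the kind of resampling such a tool could wrap: a permutation test for a difference in group means, with no distributional assumptions (switching replace to TRUE gives bootstrap-style resampling):

set.seed(42)

group_a <- rnorm(50, mean = 10)
group_b <- rnorm(50, mean = 10.5)
observed_diff <- mean(group_a) - mean(group_b)

pooled <- c(group_a, group_b)
n_a <- length(group_a)
n_resamples <- 10000

perm_diffs <- replicate(n_resamples, {
  shuffled <- sample(pooled, replace = FALSE)   # permutation; TRUE for bootstrap-style resampling
  mean(shuffled[1:n_a]) - mean(shuffled[-(1:n_a)])
})

# Two-sided p-value: share of resampled differences at least as extreme as the observed one
p_value <- mean(abs(perm_diffs) >= abs(observed_diff))
hist(perm_diffs, main = "Permutation distribution of mean difference")
p_value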

I would like to suggest adding a widget that encapsulates an R script able to perform outlier detection, similar to what Netflix did:
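As a much simpler stand-in for what such a widget could wrap (Netflix's own approach, robust PCA, is more involved), here is a robust z-score sketch in base R:

detect_outliers <- function(x, threshold = 3.5) {
  # Modified z-score: robust to the outliers it is trying to find
  robust_z <- 0.6745 * (x - median(x, na.rm = TRUE)) / mad(x, constant = 1, na.rm = TRUE)
  abs(robust_z) > threshold
}

values <- c(rnorm(100), 12, -9)   # two injected outliers
which(detect_outliers(values))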

 

Thank you.

 

Regards,

Cristian

  

A lot of popular machine learning systems use a computer's GPU to speed up some of the math to a huge degree. The header of this article on Medium shows a 15x difference between a high-end CPU and a high-end GPU. GPU support could also improve the spatial tools. Perhaps Alteryx should add this functionality to speed up these tools, which I imagine are currently some of the slowest.

More and more R applications are written with tidyverse code using tidy data principles. According to rdocumentation.org, tidyverse packages are among the most downloaded. Adding this package to the default offering would make it easier to transfer existing R code to Alteryx!
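A small example of the style in question, runnable in the R Tool once dplyr ships by default:

library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n(), .groups = "drop")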

 

I would like to share some feedback regarding the Principal Component tool.

I selected the option "Scale each field to have unit variance" and one of the four PCA tools displayed errors. However, the error message is not very intuitive and I couldn't use it to debug my workflow. The problem was that, for my type of data, scaling could not be applied because it contained a lot of 0 values.

Couldn't find anything related to this, so hope my feedback helps others.
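For anyone hitting the same thing, a minimal sketch of the failure mode (assuming the error comes from a zero-variance column) and the defensive check the tool could run first:

df <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rep(0, 20))

# prcomp(df, scale. = TRUE) errors here: it cannot rescale a constant/zero column to unit variance

zero_var <- sapply(df, function(col) var(col, na.rm = TRUE) == 0)
prcomp(df[, !zero_var], scale. = TRUE)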

 

Thanks!

[Attachment: PCA Error.png]

Hi there,

 

Similar to @aselameab1, I was having trouble using the Linear Regression tool because it was giving error messages that were neither explanatory nor self-descriptive.

@chadanaber identified the issue: a specific field had only one unique value, which was causing the regression tool to fail. However, the error message provided gives no useful or helpful indication that this is the issue. You can see that the error message below is pretty tough to understand.

 

Could we add an item to the development backlog to add defensive checks to the predictive analytics tools to check for conditions that will cause them to fail, and rework the error messaging?
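A sketch of the kind of check being requested, in R (column names are made up for illustration):

df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rep(5, 20))

constant_fields <- names(df)[sapply(df, function(col) length(unique(col)) < 2)]
if (length(constant_fields) > 0) {
  stop("These fields have only one unique value and cannot be used as predictors: ",
       paste(constant_fields, collapse = ", "))
}
model <- lm(y ~ ., data = df)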

 

[Attachments: LinearRegressionError.PNG, Workflow.PNG]

 

I've attached the workflow with the sample data that replicates this issue

 

Many thanks

Sean

It is important to be able to test for heteroscedasticity, so a tool for this test would be much appreciated.

 

In addition, I strongly believe the ability to calculate robust standard errors should be included as an option in existing regression tools, where applicable. This is a standard feature in most statistical analysis software packages.
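A sketch of what such options could wrap, using the lmtest and sandwich packages:

library(lmtest)
library(sandwich)

model <- lm(mpg ~ wt + hp, data = mtcars)

bptest(model)                                        # Breusch-Pagan test for heteroscedasticity
coeftest(model, vcov = vcovHC(model, type = "HC1"))  # coefficients with robust standard errors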

 

Many thanks!

I think the Nearest Neighbor Algorithm is one of the least used and most powerful algorithms I know of. It allows me to connect data points with other data points that are similar. When something is unpredictable, or I simply don't have enough data, this allows me to compare one data point with its nearest neighbors.

 

So, last night I was at school, taking a graduate-level Econ course. We were discussing various distance measures for a nearest neighbor algorithm. Our prof discussed one called the Mahalanobis distance. It uses some fancy matrix algebra. Essentially, it filters out the noise and only matches on distances that are truly significant. It takes into account the correlation that may exist within variables, and reduces those correlated variables down to only one.

 

I use Nearest Neighbor when other things aren't working for me, when my data sets are weak, sparse, or otherwise not predictable. Sometimes I don't know that particular variables are correlated. This is a powerful distance measure that could be added to the Nearest Neighbor tool to allow for matches that might not otherwise be found, and to allow matches on only the variables that really matter.
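For reference, base R already has the distance itself; a sketch of a Mahalanobis nearest-neighbour lookup:

X <- as.matrix(mtcars[, c("mpg", "wt", "hp")])
query <- X[1, ]                                      # find neighbours of the first row

d2 <- mahalanobis(X, center = query, cov = cov(X))   # squared Mahalanobis distances

nearest <- order(d2)[2:6]                            # five nearest, skipping the query itself
rownames(mtcars)[nearest]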

Currently the R predictive tools are single-threaded, which means that to utilise multi-threading we need to separately download a third-party R distribution such as Microsoft R Client.

Given this is a better option, should this not be used as the default package upon installation?

 

Idea:

Functionality added to the Impute Values tool for multiple imputation and maximum-likelihood imputation of fields that are missing at random would be very useful.

 

Rationale:

Missing data are a common problem and advanced techniques are complicated. One great idea in statistics is multiple imputation: filling the gaps in the data not with the average, median, mode, or user-defined static values, but instead with plausible values that take the other fields into consideration.

 

SAS has the PROC MI procedure; here is a page detailing its usage with examples: http://www.ats.ucla.edu/stat/sas/seminars/missing_data/mi_new_1.htm

Also there is PROC CALIS for maximum likelihood here...

 

The same useful capability exists in SPSS as well: http://www.appliedmissingdata.com/spss-multiple-imputation.pdf
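A sketch of what the tool could wrap, using the mice package (a standard R implementation of multiple imputation):

library(mice)

data_with_gaps <- airquality                      # built-in dataset with NAs

imp <- mice(data_with_gaps, m = 5, method = "pmm", seed = 1, printFlag = FALSE)
completed <- complete(imp, action = 1)            # one of the five completed datasets

fit <- with(imp, lm(Ozone ~ Wind + Temp))         # fit the model on every imputed dataset
summary(pool(fit))                                # pool the results across imputations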

 

Best

The capability to input/output R datasets via the Input/Output tools, alongside all the other data formats already supported (CSV, Excel, SAS, SPSS, etc.).
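For reference, the formats in question, as they are read and written in R today:

saveRDS(mtcars, "cars.rds")               # single object
cars <- readRDS("cars.rds")

save(mtcars, iris, file = "data.RData")   # multiple objects
load("data.RData")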

randomForest

 

 

Random forest doesn't handle missing values well and produces gibberish errors for Alteryx users.

 

Here are two quick options that would be better to add in the new version:

1) na.omit, which omits cases from your data when there are missing values... you lose some observations, though...

 

na.action=na.omit

 

2) na.roughfix, which replaces missing values with the mean for continuous variables and the mode for categorical variables

 

na.action=na.roughfix
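Put together, a sketch of how the two options would be passed:

library(randomForest)

df <- iris
df$Sepal.Length[c(3, 50, 120)] <- NA          # inject some missing values

rf1 <- randomForest(Species ~ ., data = df, na.action = na.omit)      # drop incomplete rows
rf2 <- randomForest(Species ~ ., data = df, na.action = na.roughfix)  # median/mode fill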

Best

 

This request is largely based on the implementation found in AzureML (take their free trial and check out the Deep Convolutional and Pooling NN example from their gallery). It allows you to specify custom convolutional and pooling layers in a deep neural network. This is an extremely powerful machine learning technique that could be tricky to implement, but it could perhaps start as (for example) a macro wrapped around something in Python, where these techniques are currently more easily implemented than in R.

 

I searched for LDA (Linear Discriminant Analysis) in Alteryx Help and it returned "0" results.

 

[Attachment: Altryx LDA.jpg]

 

Idea: an LDA (Linear Discriminant Analysis) tool to be added to the predictive toolbox.

 

 

 

Rationale: We have PCA and MDS as tools that help a lot with "unsupervised" dimensionality reduction in predictive modelling. But if we need a method that takes target values into consideration, we need a "supervised" tool instead...
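A sketch of what such a tool could wrap, using MASS::lda (a standard R implementation):

library(MASS)

fit <- lda(Species ~ ., data = iris)
fit$scaling                                   # the discriminant coefficients

pred <- predict(fit, iris)
table(Predicted = pred$class, Actual = iris$Species)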

 

[Attachment: Altryx LDA2.jpg]

 

 

"LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data.[4] LDA explicitly attempts to model the difference between the classes of data. PCA on the other hand does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made."

The existing Decision Tree tool is automatic, but business users need to work with the decision tree directly: pruning it and growing certain parts of the tree using their domain expertise...

 

SPSS has a nice facility for this, as you can see below... Desperately looking forward to an Alteryx version...

 

 

[Attachment: tree_growtreeSPSS.gif]
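Until something interactive exists, a programmatic stand-in using rpart: grow a deliberately deep tree, inspect the complexity table, and prune at a point the analyst chooses:

library(rpart)

tree <- rpart(Species ~ ., data = iris, control = rpart.control(cp = 0.001))
printcp(tree)                       # complexity table to guide the decision

pruned <- prune(tree, cp = 0.05)    # analyst-chosen pruning point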

 

 

 

 

 

 

Hello,

 

The randomForest package implementation in Alteryx works fine for smaller datasets but becomes very slow for large datasets with many features.

There is the open-source ranger package (https://arxiv.org/pdf/1508.04409.pdf) that could help with this.

 

Along with XGBoost/LightGBM/CatBoost, it would be an extremely welcome addition to the predictive package!
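A sketch of the multi-threaded alternative the post points to:

library(ranger)

fit <- ranger(
  Species ~ ., data = iris,
  num.trees = 500,
  num.threads = 4                  # uses several cores, unlike the current randomForest-based tool
)
fit$prediction.error               # out-of-bag error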

It would be great if we could output the coefficients of the regression equation to a table so that they can be used in the rest of the module. Currently, Alteryx can output the table/coefficients in chart/report form, which is not reusable as such in the module.
The values of the coefficients/residuals/errors would be very useful for building macros for techniques like Missing Value Analysis, which can't be done in Alteryx as of now.
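A sketch of the table that could be exposed, pulled from a fitted model in R (the write.Alteryx calls show where it would leave the R Tool; the output anchor numbers are illustrative):

model <- lm(mpg ~ wt + hp, data = mtcars)

coef_table <- as.data.frame(coef(summary(model)))   # Estimate, Std. Error, t value, Pr(>|t|)
coef_table$term <- rownames(coef_table)

residual_table <- data.frame(fitted = fitted(model), residual = resid(model))

# Inside an Alteryx R Tool, these frames could then be written to output anchors:
# write.Alteryx(coef_table, 1)
# write.Alteryx(residual_table, 2)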

It feels that lately Alteryx has been focusing on integration rather than adding more machine learning tools, which sadly are still not on par with many competing products...

Personally, I miss having XGBoost and multi-core random forest libraries like ranger (along with a more robust implementation of C5.0).

 

What about you guys? Which R/Python libraries are you missing in Alteryx?

Improve the Help documentation or the in-tool options for handling null values in statistical tools like Weighted Average or Linear Regression. For instance, add a checkbox to remove null-value records, or at least warn users.

 

In the process of learning to perform linear regression in RStudio and Alteryx, I came across differing outputs depending on how null values were addressed. Take the Weighted Average tool, for example.

 

In R, the weighted.mean function can treat null values in the variable of interest as if they were not there, but only if the user specifies that nulls should be removed; otherwise, the result is NA. If any null values exist in the weight field, the result is NA.
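The behaviour described, shown directly:

x <- c(10, 20, NA, 40)
w <- c(1, 1, 1, 1)

weighted.mean(x, w)                         # NA: missing values propagate by default
weighted.mean(x, w, na.rm = TRUE)           # 23.33: the NA and its weight are dropped
weighted.mean(c(10, 20, 30), c(1, NA, 1))   # NA: a missing weight yields NA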

 

Since I am more familiar with Alteryx, I originally did the data preparation—including calculating the weighted means—in Alteryx. When comparing these weighted means with those generated in R, I found that Alteryx treats the null values as zeros (i.e. includes them in the calculation). The user would have to know this is incorrect and first filter out the null values. See screenshot examples.

 

 

 

This is also the case within the Linear Regression tool. If null values are not omitted prior to regression, the results are wildly different. Perhaps this is known to more experienced users/statisticians, but this incorrect usage would have gone on unbeknownst to me had I not cross-checked with RStudio.

 

[Screenshots: Weighted Average in Alteryx; Weighted Mean in R]
