community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

1 Review

Our submission guidelines & status definitions before getting started

2 Search

The community for a solution or existing idea before posting

3 Vote

By clicking the star in the top left corner of an idea you support

4 Submit

A new idea to suggest a product enhancement or new feature


Suggest an idea

Unsupervised learning method to detect topics in a text document.

 

Helpful for users interested in text mining.

I think the Nearest Neighbor Algorithm is one of the least used, and most powerful algorithms I know of.  It allows me to connect data points with other data points that are similar.  When something is unpredictable, or I simply don't have enough data, this allows me to compare one data point with its nearest neighbors.

 

So, last night I was at school, taking a graduate level Econ course.  We were discussing various distance algorithms for a nearest neighbor algorithm.  Our prof discussed one called the Mahalanobis distance.  It uses some fancy matrix algebra.  Essentially it allows it it to filter out the noise, and only match on distance algorithms that are truly significant.  It takes into account the correlation that may exists within variables, and reduces those variables down to only one.  

 

I use Nearest Neighbor when other things aren't working for me.  When my data sets are weak, sparse, or otherwise not predictable.  Sometimes I don't know that particular variables are correlated.  This is a powerful algorithm that could be added into the Nearest Neighbor, to allow for matches that might not otherwise be found.  And allow matches on only the variables that really matter.  

I think the Nearest Neighbor Algorithm is one of the least used, and most powerful algorithms I know of.  It allows me to connect data points with other data points that are similar.  When something is unpredictable, or I simply don't have enough data, this allows me to compare one data point with its nearest neighbors.

 

So, last night I was at school, taking a graduate level Econ course.  We were discussing various distance algorithms for a nearest neighbor algorithm.  Our prof discussed one called the Mahalanobis distance.  It uses some fancy matrix algebra.  Essentially it allows it it to filter out the noise, and only match on distance algorithms that are truly significant.  It takes into account the correlation that may exists within variables, and reduces those variables down to only one.  

 

I use Nearest Neighbor when other things aren't working for me.  When my data sets are weak, sparse, or otherwise not predictable.  Sometimes I don't know that particular variables are correlated.  This is a powerful algorithm that could be added into the Nearest Neighbor, to allow for matches that might not otherwise be found.  And allow matches on only the variables that really matter.  

A lot of popular machine learning systems use a computer's GPU to speed up some of the math to a huge degree. The header on this article on Medium shows a 15x difference from a high-end CPU vs a high-end GPU. It could also create an improvement in the spatial tools. Perhaps Alteryx should add this functionality in order to speed up these tools, which I can imagine are currently some of the slowest.

A lot of popular machine learning systems use a computer's GPU to speed up some of the math to a huge degree. The header on this article on Medium shows a 15x difference from a high-end CPU vs a high-end GPU. It could also create an improvement in the spatial tools. Perhaps Alteryx should add this functionality in order to speed up these tools, which I can imagine are currently some of the slowest.

XGboost regression is now the benchmark for every Kaggle competition and seems to consistently outperform random forest, spline regression, and all of the more basic models. For those of us using predictive modeling on a regular basis in our actual work, this tool would allow for a quick improvement in our model accuracy. And I think, from a marketing standpoint, having a core group of users competing in Kaggle using Alteryx would be a great way to show off Alteryx's power.

 

It is readily available as an R package: https://cran.r-project.org/web/packages/xgboost/index.html

I checked out the "Boosted" model and see that it basically wraps the "gbm" model in R.  I would like to request a similar wrapping for the newer xgb (or xgboost) -- eXtreme Gradient Boosting, which is very fast and accurate, and is winning Kaggle competitions left and right.  It would be a great addition and is something SAS probably won't have it for another 10 years, if ever.

 

  • Predictive

XGboost regression is now the benchmark for every Kaggle competition and seems to consistently outperform random forest, spline regression, and all of the more basic models. For those of us using predictive modeling on a regular basis in our actual work, this tool would allow for a quick improvement in our model accuracy. And I think, from a marketing standpoint, having a core group of users competing in Kaggle using Alteryx would be a great way to show off Alteryx's power.

 

It is readily available as an R package: https://cran.r-project.org/web/packages/xgboost/index.html

I checked out the "Boosted" model and see that it basically wraps the "gbm" model in R.  I would like to request a similar wrapping for the newer xgb (or xgboost) -- eXtreme Gradient Boosting, which is very fast and accurate, and is winning Kaggle competitions left and right.  It would be a great addition and is something SAS probably won't have it for another 10 years, if ever.

 

  • Predictive

The capability to input/output R Datasets via the input/output tools, together with all the other data formats as well (like csv, Excel, SAS, SPSS, etc).

It would be great if we could output the coefficients of regression equation to a table so that one can use them in rest of the module. Currently, Alteryx can output the table/coefficients in charts/reports form which is not re-usable as such in the module. 
The values of coefficients/Residuals/Errors would be very useful in building macros for techniques like Missing Value Analysis which can't be done in Alteryx as of now.
  • Predictive

This request is largely based on the implementation found on AzureML; (take their free trial and check out the Deep Convolutional and Pooling NN example from their gallery).  This allows you to specify custom convolutional and pooling layers in a deep neural network. This is an extremely powerful machine learning technique that could be tricky to implement, but could perhaps be (for example) a great initial macro wrapped around something in Python, where currently these are more easily implemented than in R.

  • Predictive
It would be great if we could output the coefficients of regression equation to a table so that one can use them in rest of the module. Currently, Alteryx can output the table/coefficients in charts/reports form which is not re-usable as such in the module. 
The values of coefficients/Residuals/Errors would be very useful in building macros for techniques like Missing Value Analysis which can't be done in Alteryx as of now.
  • Predictive

I would like to suggest to add a widget which encapsulate an R script able to perform outlier detection, something similar like netflix did:

 

Thank you.

 

Regards,

Cristian

  

I would like to suggest to add a widget which encapsulate an R script able to perform outlier detection, something similar like netflix did:

 

Thank you.

 

Regards,

Cristian

  

Idea:

A funcionality added to the Impute values tool for multiple imputation and maximum likelihood imputation of fields with missing at random will be very useful.

 

Rationale:

Missing data form a problem and advanced techniques are complicated. One great idea in statistics is multiple imputation,

filling the gaps in the data not with average, median, mode or user defined static values but instead with plausible values considering other fields.

 

SAS has PROC MI tool, here is a page detailing the usage with examples: http://www.ats.ucla.edu/stat/sas/seminars/missing_data/mi_new_1.htm

Also there is PROC CALIS for maximum likelihood here...

 

Same useful tool exists in spss as well http://www.appliedmissingdata.com/spss-multiple-imputation.pdf

 

Best

Hello, 

I am not sure if this capability exists but I assume it does not.

We have a need to optimize a Linear Program (LP) model that consists of a system of equations and has both:
An objective function and a series of constraints. One of the software capabilities that SAS offers that currently
Alteryx does not have is this optimization capability. 

I am wondering if the capability is currently not available, is this capability in the Product Roadmap?

Thanks,

Ricardo
  • Predictive

Idea:

A funcionality added to the Impute values tool for multiple imputation and maximum likelihood imputation of fields with missing at random will be very useful.

 

Rationale:

Missing data form a problem and advanced techniques are complicated. One great idea in statistics is multiple imputation,

filling the gaps in the data not with average, median, mode or user defined static values but instead with plausible values considering other fields.

 

SAS has PROC MI tool, here is a page detailing the usage with examples: http://www.ats.ucla.edu/stat/sas/seminars/missing_data/mi_new_1.htm

Also there is PROC CALIS for maximum likelihood here...

 

Same useful tool exists in spss as well http://www.appliedmissingdata.com/spss-multiple-imputation.pdf

 

Best

Hello, 

I am not sure if this capability exists but I assume it does not.

We have a need to optimize a Linear Program (LP) model that consists of a system of equations and has both:
An objective function and a series of constraints. One of the software capabilities that SAS offers that currently
Alteryx does not have is this optimization capability. 

I am wondering if the capability is currently not available, is this capability in the Product Roadmap?

Thanks,

Ricardo
  • Predictive

This request is largely based on the implementation found on AzureML; (take their free trial and check out the Deep Convolutional and Pooling NN example from their gallery).  This allows you to specify custom convolutional and pooling layers in a deep neural network. This is an extremely powerful machine learning technique that could be tricky to implement, but could perhaps be (for example) a great initial macro wrapped around something in Python, where currently these are more easily implemented than in R.

  • Predictive
Top Starred Authors