Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

1. Review our submission guidelines & status definitions before getting started
2. Search the community for a solution or existing idea before posting
3. Vote by clicking the star in the top left corner of an idea you support
4. Submit a new idea to suggest a product enhancement or new feature



Unsupervised learning method to detect topics in a text document.

 

Helpful for users interested in text mining.

I think the Nearest Neighbor Algorithm is one of the least used, and most powerful algorithms I know of.  It allows me to connect data points with other data points that are similar.  When something is unpredictable, or I simply don't have enough data, this allows me to compare one data point with its nearest neighbors.

 

So, last night I was at school, taking a graduate-level Econ course. We were discussing various distance measures for a nearest neighbor algorithm. Our prof discussed one called the Mahalanobis distance. It uses some fancy matrix algebra. Essentially, it allows the algorithm to filter out the noise and match only on differences that are truly significant. It takes into account the correlation that may exist among variables, so that strongly correlated variables effectively count as one rather than several.

 

I use Nearest Neighbor when other things aren't working for me: when my data sets are weak, sparse, or otherwise not predictable. Sometimes I don't know that particular variables are correlated. This is a powerful distance measure that could be added to the Nearest Neighbor tool to allow for matches that might not otherwise be found, and to allow matches on only the variables that really matter.
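
To make the idea concrete, here is a minimal sketch of a Mahalanobis-based nearest neighbor lookup. It is restricted to two variables so the covariance matrix can be inverted by hand, and the function names (`covariance_2d`, `mahalanobis_2d`, `nearest_neighbor`) are made up for illustration; nothing here reflects how an Alteryx tool is or would be implemented.

```python
import math

def covariance_2d(points):
    """Sample covariance of (x, y) points, returned as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_2d(a, b, cov):
    """Distance sqrt(d^T S^-1 d) between points a and b, with d = a - b."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular; variables are perfectly correlated")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # The inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

def nearest_neighbor(query, candidates, cov):
    """Return the candidate with the smallest Mahalanobis distance to query."""
    return min(candidates, key=lambda p: mahalanobis_2d(query, p, cov))
```

With uncorrelated, unit-variance variables this reduces to the ordinary Euclidean distance. With strongly correlated variables, a candidate that lies along the correlation direction can come out nearer than one that is closer in a straight line but runs against the correlation.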

Hello,

 

The randomForest package implementation in Alteryx works fine for smaller datasets but becomes very slow for large datasets with many features.

There is the open-source Ranger package https://arxiv.org/pdf/1508.04409.pdf that could help with this.

Along with XGBoost/LightGBM/CatBoost, it would be an extremely welcome addition to the predictive package!

A lot of popular machine learning systems use a computer's GPU to speed up some of the math to a huge degree. The header of this article on Medium shows a 15x difference between a high-end CPU and a high-end GPU. GPU acceleration could also improve the spatial tools. Perhaps Alteryx should add this functionality in order to speed up these tools, which I imagine are currently some of the slowest.

This idea arose recently when working specifically with the Association Analysis tool, but I have a feeling that other predictive tools could benefit as well.  I was trying to run an association analysis for a large number of variables, but when I was investigating the output using the new interactive tools, I was presented with something similar to this:

 

CorrelationPlot.PNG

 

While the correlation plot draws your eye to the high associations, the user is unable to read the field names, and the tooltip provides only the correlation value rather than the fields along with the value.  As such, I shifted my attention to the report output, which looked like this:

 

CorrelationTable.PNG

 

While I could now read everything, it made pulling out the insights much more difficult.  Wanting the best of both worlds, I decided to extract the correlation table from the R output and drop it into Tableau for a filterable, interactive version of the correlation matrix.  This turned out to be much easier said than done.  Because the R output comes in report form, I tried to use the report extract macros mentioned in this thread to pull out the actual values.  This was an issue due to the report formatting, so instead I cracked open the macro to extract the data directly from the R output.  To make a long story shorter, this ended up being problematic due to report formats, batch macro pathing, and an unidentifiable bug.  

 

In the end, it would be great if there were a “Data” output for reports from certain predictive tools that would benefit from further analysis. While the reports and interactive outputs are great for ingesting small model outputs, at times there is a need to extract the data itself for further analysis/visualization.  This is one example, as are the model coefficients from regression analyses that I have used in the past.  I know Dr. Dan created a model coefficients macro for the case of regression, but I have to imagine that there are other cases where the data is desired along with the report/interactive output.
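
As a rough illustration of what such a "Data" output could contain, the sketch below computes pairwise Pearson correlations and returns them in long form (one row per field pair), which is the shape that is easy to filter in a tool like Tableau. The function names and the input layout (a dict mapping field name to a list of values) are assumptions made for this example, not anything the Association Analysis tool exposes.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # Note: a constant column makes sxx or syy zero and this division fails.
    return sxy / math.sqrt(sxx * syy)

def correlation_rows(columns):
    """Return (field_a, field_b, r) tuples, strongest association first.

    columns: dict mapping a field name to its list of numeric values.
    """
    rows = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    rows.sort(key=lambda row: abs(row[2]), reverse=True)
    return rows
```

Each returned row carries both field names alongside the correlation value, which is exactly the information the tooltip in the interactive plot leaves out.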

 

It would be nice if this option took you to the download page that matches the version the user has installed. Currently, it always loads the download page for the current version, which is confusing for users at companies that still require an older version.

 

image.png 

Python pandas dataframes and data types (NumPy arrays, lists, dictionaries, etc.) are much more robust in general than their counterparts in R, and they play together much more easily as well. Moreover, only a handful of packages, such as scikit-learn, pandas, NumPy, and Seaborn, cover everything a data scientist would need, including graphing. After utilizing R, Python, and Alteryx, I'm still a big proponent of integrating with the Python language much like Alteryx has integrated with R. At the very least, I propose adding the ability to write custom code, such as a Python tool.

Would be extremely useful if the Summarize Tool had an option in the numeric menu to Standardize the data.  More often than not, data sets will not have the same count of variables which makes the comparison analysis meaningless.  Currently, there is no easy way to Standardize the data without using the K-Centroids Cluster Analysis tool or standardize_unit interval supporting macro. 
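
For reference, here is a minimal sketch of the two usual kinds of standardization such an option might offer: z-scores and rescaling to the unit interval. The function names are hypothetical, and this only shows the arithmetic, not how the Summarize Tool or the supporting macro works internally.

```python
import statistics

def z_score(values):
    """Center on the mean and scale by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / sd for v in values]

def unit_interval(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant column")
    return [(v - lo) / (hi - lo) for v in values]
```

Either transform puts fields measured on very different scales onto a comparable footing before they are compared or clustered.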

Up to version 10.0, I could open pretty much all analytics tools as macros, to tweak things in R or in the macro workflow and get the results in the way most useful to us.

But apparently, with Alteryx 11.0, the newer tools do not have that option. We can still access the older versions of those tools and open them as macros, but I don't understand why this is being removed in the newer versions (maybe because they have the interactive report option?).

Most of the newer versions have new features (Linear Regression now supports elastic net and cross-validation, for example), but I still want to be able to go into them to tweak them.

So - with Challenge 111 - many folk used the Optimization tool

https://community.alteryx.com/t5/Weekly-Challenge/Challenge-111-Make-a-Weekly-Challenge-Dream-Team/m...

… and Joe has done a great training on this here

https://community.alteryx.com/t5/Live-Training/Live-Training-Prescriptive-Optimization/m-p/44779

 

But it's still too hard to use.   It requires you to have prior knowledge of a bunch of parameters and different types of knowledge.

 

Can we improve the interface on this tool so that it can be used by folk who do not have a background in R - for example, take all the different inputs, and make them parameterized on drop-down boxes or input boxes on the tool?

 

Thank you all

S

 

CC: @JoeM

  • Category Predictive

It is important to be able to test for heteroscedasticity, so a tool for this test would be much appreciated.

 

In addition, I strongly believe the ability to calculate robust standard errors should be included as an option in existing regression tools, where applicable. This is a standard feature in most statistical analysis software packages.
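
To illustrate what a robust standard error option would compute, here is a minimal sketch for the simplest case: a regression with one predictor and an intercept, using White's heteroscedasticity-consistent (HC0) estimator for the slope. The function name is hypothetical, and statistical packages usually also offer small-sample variants (HC1 to HC3) that this sketch leaves out.

```python
import math

def ols_with_robust_se(xs, ys):
    """Fit y = intercept + slope * x and return
    (slope, intercept, classical_se, robust_se) for the slope."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Classical SE assumes one error variance for all observations.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    classical_se = math.sqrt(sigma2 / sxx)

    # HC0 weights each squared residual by its own (x - mean)^2,
    # so the error variance may differ from observation to observation.
    meat = sum(((x - mx) ** 2) * (e * e) for x, e in zip(xs, resid))
    robust_se = math.sqrt(meat) / sxx

    return slope, intercept, classical_se, robust_se
```

When the two standard errors differ noticeably, that is itself a hint that the constant-variance assumption may not hold, although a formal test would still be the proper check.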

 

Many thanks!

The existing decision tree tool is fully automatic, but business users need to interact with the decision tree: prune it and grow certain parts of it using their domain expertise.

 

SPSS has a nice facility for this, as you can see below. Desperately looking forward to an Alteryx version.

 

 

tree_growtreeSPSS.gif

 

 

 

 

 

 

  • Category Predictive

randomForest

 

 

Random forest doesn't cope well with missing values and produces cryptic errors for Alteryx users.

Here are two quick options that would be good to add in the new version:

1) na.omit, which omits cases from your data when there are missing values. You lose some observations, though.

na.action=na.omit

2) na.roughfix, which replaces missing values with the median for continuous variables and the mode for categorical variables.

 

na.action=na.roughfix
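
For readers who want to see what the two options do to the data, here is a small plain-Python sketch of the same two strategies. It is an illustration only, not the R implementation: the function names and the row layout (a list of dicts with `None` for missing values) are assumptions, and the median/mode rule follows what the randomForest documentation describes for na.roughfix.

```python
import statistics

def omit_missing(rows):
    """Drop every row that has at least one missing (None) value."""
    return [row for row in rows if all(v is not None for v in row.values())]

def rough_fix(rows):
    """Fill missing values: median for numeric columns, mode otherwise."""
    columns = list(rows[0].keys())
    fill = {}
    for col in columns:
        present = [row[col] for row in rows if row[col] is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fill[col] = statistics.median(present)
        else:
            fill[col] = statistics.mode(present)
    return [
        {col: fill[col] if row[col] is None else row[col] for col in columns}
        for row in rows
    ]
```

The trade-off is the one noted above: omitting rows shrinks the training set, while the rough fix keeps every row at the cost of a crude imputation.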

Best

 
