The Product Idea boards have gotten an update to better integrate them within our Product team's idea cycle! However, this update does have a few unique behaviors; if you have any questions about them, check out our FAQ.

Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!
Submitting an Idea?

Be sure to review our Idea Submission Guidelines for more information!

Submission Guidelines

Featured Ideas

Is there a reason why Alteryx does not include hierarchical clustering?

 

Well, it's sort of slow, especially with huge data sets (the computational effort increases cubically), but when you need to do two-step clustering, i.e. "creating more than enough k-means clusters and joining the cluster centers with hierarchical clustering", it seems to be a must...

 

P.S. KNIME, SPSS Modeler, SAS, and RapidMiner already have it...


There is a great functionality in Excel that lets users "seek" a value that makes whatever chain of formulas you might have work out to a given value. Here's what Microsoft explains about goal seek: https://support.office.com/en-us/article/Use-Goal-Seek-to-find-a-result-by-adjusting-an-input-value-...

 

My specific example was this:

 

In the Excel file (attached), all you have to do is click on the highlighted blue cell, select the “Data” tab up top, then “What-If Analysis,” and finally “Goal Seek.” Then you set the dialog box up to look like this:

Set cell: G9

To value: 330

By changing cell: J6

 

And hit “OK.” Excel then iteratively finds the value for cell J6 that makes cell G9 equal 330. Can I build a module that will do the same thing? I figure I wouldn’t have to do it iteratively if I could build the right series of formulas/commands. You can see what I’m trying to accomplish in the formulas I’ve built in Excel, but essentially I’m trying to build a model that tells me what the % adjustment rate should be for the other groups once I’ve picked the first adjustment rate, with the others changing proportionally to their contribution to the remaining volume.

 

There doesn't really seem to be a way to do this in Alteryx that I can see. I hate to think there is something that Excel can do that Alteryx can't!
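Outside of a native tool, the Goal Seek behavior can be emulated with a one-dimensional root finder. Here is a minimal Python sketch using SciPy, with a stand-in function for the J6-to-G9 spreadsheet logic:

```python
# A minimal sketch of Goal Seek-style solving: find the input that
# makes a chain of formulas hit a target value. The formula and the
# bracketing interval below are hypothetical placeholders.
from scipy.optimize import brentq

def chain_of_formulas(adjustment_rate):
    # stand-in for whatever spreadsheet logic turns J6 into G9
    return 1000 * adjustment_rate + 50

target = 330

# Solve chain_of_formulas(x) - target = 0 on a bracketing interval
solution = brentq(lambda x: chain_of_formulas(x) - target, a=-10, b=10)
print(solution)  # 0.28 for this stand-in formula
```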

This request is largely based on the implementation found in AzureML (take their free trial and check out the Deep Convolutional and Pooling NN example from their gallery). It allows you to specify custom convolutional and pooling layers in a deep neural network. This is an extremely powerful machine learning technique that could be tricky to implement, but it could perhaps be (for example) a great initial macro wrapped around something in Python, where these are currently more easily implemented than in R.
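As a rough sketch of what such a Python-based macro might wrap, here is a minimal Keras network with user-specified convolutional and pooling layers; the input shape and layer sizes are placeholders, not the AzureML example itself:

```python
# A minimal sketch of custom convolutional and pooling layers in Keras;
# every shape and size here is illustrative only.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                # e.g. grayscale images
    layers.Conv2D(32, (5, 5), activation="relu"),  # custom conv layer
    layers.MaxPooling2D(pool_size=(2, 2)),         # custom pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```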

I would like to suggest adding a tool that encapsulates an R script able to perform outlier detection, similar to what Netflix did:
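As a placeholder for what such a tool could wrap, here is a minimal Python sketch of one simple approach (median absolute deviation); Netflix's published method (robust PCA) is more sophisticated than this:

```python
# A minimal MAD-based outlier flag; purely illustrative, not the
# Netflix algorithm referenced above.
import numpy as np

def mad_outliers(values, threshold=3.5):
    """Flag points whose modified z-score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold

print(mad_outliers([10, 11, 12, 10, 11, 95]))  # only the last point flagged
```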

 

Thank you.

 

Regards,

Cristian

  

XGBoost regression is now the benchmark for every Kaggle competition and seems to consistently outperform random forest, spline regression, and all of the more basic models. For those of us using predictive modeling regularly in our actual work, this tool would allow a quick improvement in model accuracy. And I think, from a marketing standpoint, having a core group of users competing on Kaggle using Alteryx would be a great way to show off Alteryx's power.

 

It is readily available as an R package: https://cran.r-project.org/web/packages/xgboost/index.html
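For anyone who wants to try it before a native tool exists, here is a minimal sketch using the Python xgboost package (the R package linked above exposes much the same interface); the data and parameters are illustrative:

```python
# A minimal XGBoost regression fit; settings are illustrative only.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] * 2 + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)
print(model.predict(X[:5]))
```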

Hello! Almost all statistical software allows the analyst to use either a pairwise or a listwise option when applying clustering techniques. This option affects only how the inner distance matrix is built; after that, whichever algorithm you choose is performed. However, in Alteryx, K-Centroids does listwise by default, classifying only those records where the selected variables have no nulls.

 

Please consider adding this option!

 

PS: The difference is that pairwise builds the distance between two variables using only the records that have no nulls in both variables, while listwise builds the distance matrix only from records that are completely non-null across all variables of interest (rather than calculating distances one pair at a time).
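To make the distinction concrete, here is a minimal Python sketch of the two ways of handling nulls when building the distance matrix; the rescaling in the pairwise case is just one common convention:

```python
# A minimal sketch contrasting listwise vs pairwise null handling
# for a Euclidean distance matrix; purely illustrative.
import numpy as np

X = np.array([[1.0, 2.0, np.nan],
              [1.5, np.nan, 3.0],
              [1.2, 2.1, 2.9]])

def pairwise_distance(a, b):
    # use only the dimensions where BOTH records are non-null,
    # rescaling so distances stay comparable across pairs
    mask = ~np.isnan(a) & ~np.isnan(b)
    d = a[mask] - b[mask]
    return np.sqrt((d @ d) * len(a) / mask.sum())

# Listwise: keep only fully complete records, then measure distances
complete = X[~np.isnan(X).any(axis=1)]  # here, only the third row survives

# Pairwise: every pair of records contributes whatever fields it can
n = len(X)
D = np.array([[pairwise_distance(X[i], X[j]) for j in range(n)]
              for i in range(n)])
print(D)
```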

I am trying to run batch regressions on a pretty sizable set of data: about 1M distinct groups, each with 30-500 x,y pairs.

 

A batch macro with a linear regression works OK, but it is really slow. It started at about 2-3 s per regression. After stripping a bunch of reporting out of the macro, I am down to ~2 s. This still feels quite slow compared to something purpose-built.

 

Has anyone experimented with higher-speed versions that just dump out m, b, and r²?
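One purpose-built approach, shown as a minimal Python sketch below, is to reduce every group to its sufficient statistics in a single grouped aggregation and compute m, b, and r² in closed form; the column names are hypothetical:

```python
# A minimal sketch: one pass of grouped sums replaces ~1M separate
# regression calls; slope, intercept, and r^2 come from closed forms.
import numpy as np
import pandas as pd

def fast_group_fit(df, group="group", x="x", y="y"):
    d = df[[group, x, y]].copy()
    d["xx"] = d[x] * d[x]
    d["yy"] = d[y] * d[y]
    d["xy"] = d[x] * d[y]
    s = d.groupby(group).agg(n=(x, "count"), sx=(x, "sum"), sy=(y, "sum"),
                             sxx=("xx", "sum"), syy=("yy", "sum"),
                             sxy=("xy", "sum"))
    # closed-form least squares per group
    cov = s.sxy - s.sx * s.sy / s.n
    varx = s.sxx - s.sx ** 2 / s.n
    vary = s.syy - s.sy ** 2 / s.n
    out = pd.DataFrame({"m": cov / varx})
    out["b"] = (s.sy - out.m * s.sx) / s.n
    out["r2"] = cov ** 2 / (varx * vary)
    return out

df = pd.DataFrame({"group": [1, 1, 1, 2, 2, 2],
                   "x": [1, 2, 3, 1, 2, 3],
                   "y": [2, 4, 6.1, 1, 1, 1.2]})
print(fast_group_fit(df))
```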

Idea:

Functionality added to the Impute Values tool for multiple imputation and maximum-likelihood imputation of fields that are missing at random would be very useful.

 

Rationale:

Missing data are a problem, and advanced techniques are complicated. One great idea in statistics is multiple imputation: filling the gaps in the data not with the mean, median, mode, or user-defined static values, but with plausible values derived from the other fields.

 

SAS has the PROC MI procedure; here is a page detailing its usage with examples: http://www.ats.ucla.edu/stat/sas/seminars/missing_data/mi_new_1.htm

Also there is PROC CALIS for maximum likelihood here...

 

The same useful tool exists in SPSS as well: http://www.appliedmissingdata.com/spss-multiple-imputation.pdf
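For a sense of what this looks like in code, here is a minimal sketch using scikit-learn's IterativeImputer, a MICE-style, single-imputation stand-in for full multiple imputation; the data is illustrative:

```python
# A minimal sketch of model-based imputation: gaps are filled with
# plausible values predicted from the other fields, not static values.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0, 3.0],
              [2.0, np.nan, 6.0],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, np.nan]])

imputer = IterativeImputer(max_iter=10, random_state=0)
print(imputer.fit_transform(X))  # gaps filled from the other fields
```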

 

Best

There is a web-hosted trial where anyone can get hands-on experience with the Alteryx tutorials without even downloading the tool.

That's awesome... http://goo.gl/dpSoe2

 

It may be a nice idea to:

 

1) Either start separate "Alteryx-Kaggle" instances with data sets specific to each Kaggle competition, so that anyone wanting to try the tool can have a go with those well-known examples through the Alteryx site, 

2) Or, even better, have a partnership with Kaggle so that anyone can have their own Alteryx trial per specific competition on the Kaggle website...

 

I'm sure this will draw a lot of attention...

 

Rationale:

You'll immediately have a greater reach in the Kaggle community: data hobbyists, CS students, and academics (who will eventually end up doing lots of data blending when they are hired by top-notch firms)...

Idea:

In forecasting and in commercial/SME risk scoring, there is a need to try a vast number of algebraic equations, which is a very cumbersome process. Let's add symbolic regression as a new competitive capability.

 

Rationale:

Summations, ratios, power transforms, and all combinations of the like need to be tested as new variables for a forecasting or prediction model. Doing this by hand is a tough and long business, and there is always the possibility of skipping a valuable combination.

 

 

Symbolic regression is a novel technique for automatically generating algebraic equations using genetic programming. In each generation, a candidate equation is evaluated to check whether it discriminates the target variable at hand; in each subsequent step, frequently observed variables are more likely to be selected.
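As a minimal sketch of the technique, here is an example using the Python gplearn package (the rgp package, mentioned below, plays a similar role in R); the data and settings are illustrative:

```python
# A minimal symbolic-regression sketch: genetic programming evolves an
# algebraic expression that fits the target. Data is synthetic.
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] ** 2 - X[:, 1] + 0.5  # hidden "true" equation

est = SymbolicRegressor(population_size=1000, generations=20,
                        function_set=("add", "sub", "mul", "div"),
                        random_state=0)
est.fit(X, y)
print(est._program)  # the evolved algebraic expression
```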

 

[Image: SR comparison with linear regression, neural nets, and random forests]

 

Benefit for clients:

This method produces variables with mainly nonlinear relationships. It is a technique that will help in corporate/commercial/SME risk modelling, such that powerful risk models can be generated from a short list of balance-sheet and P&L-based algebraic equations.
There are potential use cases in algorithmic trading as well...

 

There are three very interesting real-world problems solved with symbolic regression here.

A very relevant thesis by Sean Wouter is attached as a PDF document for your reading pleasure...

 

R side of things:

I've found the rgp package for genetic programming; here is a link.

 

Competition:

I haven't seen anything similar in SAS or SPSS, but there is this: http://www.nutonian.com/products/eureqa/

There is also Bruce Ratner's page.

 

Idea:

Some well-known scoring methods use optimally binned variables for added robustness. Let's add this capability to Alteryx.

 

Rationale:

Here's a basic link on why to do this: http://documents.software.dell.com/statistics/textbook/optimal-binning

 

Current status in Alteryx, as far as I'm aware:

The Tile tool, or the Multi-Field Binning tool (which completes the same task as the Tile tool on multiple fields), splits the variables by five methods:

  • Equal Records or Intervals or Sums

  • Smart Tile

  • Unique Value 

  • Manual

Unfortunately "equal something" binnings are bad idea, as the values are categorized "blindly" irrespective of the effects on the predictive power of the models. 

 

What to do:

What's needed is to bin both numerical and categorical variables optimally, such that the Weights of Evidence (WoE) present a monotonically increasing or decreasing pattern, or at most a V- or U-shaped "convex" structure.

 

Quick win:

Without constraining ourselves to monotonicity or convexity, the easiest practice would be to run a C4.5 or CHAID tree algorithm (which produces multiple splits, rather than CART's binary splits) on a single variable, with the target as the dependent variable; the resulting leaf nodes are the bins we are looking for. Doing this for multiple variables at once is the key to the tool to be built.
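As a rough illustration of that quick win, here is a minimal Python sketch; it uses CART (scikit-learn) as a stand-in for the C4.5/CHAID multiway splits described above, and all names and settings are illustrative:

```python
# A minimal sketch: a decision tree's leaves on a single variable
# become the bins, and WoE = ln(%good / %bad) is computed per bin.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def tree_bins_with_woe(x, target, max_bins=5):
    x = np.asarray(x).reshape(-1, 1)
    tree = DecisionTreeClassifier(max_leaf_nodes=max_bins,
                                  min_samples_leaf=50).fit(x, target)
    bins = tree.apply(x)  # leaf id = bin id
    df = pd.DataFrame({"bin": bins, "target": target})
    grp = df.groupby("bin")["target"]
    goods = grp.apply(lambda s: (s == 0).sum())
    bads = grp.apply(lambda s: (s == 1).sum())
    woe = np.log((goods / goods.sum()) / (bads / bads.sum()))
    return woe  # one WoE value per bin, ready for monotonicity checks

rng = np.random.default_rng(0)
income = rng.normal(50, 15, 5000)
default = (rng.uniform(size=5000)
           < 1 / (1 + np.exp((income - 40) / 5))).astype(int)
print(tree_bins_with_woe(income, default))
```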

 

Clients:

This capability is sought by risk management departments building robust, stable, Basel-compliant models in the financial industry, especially banks.

The capability to read and write R datasets via the Input/Output tools, alongside all the other supported data formats (CSV, Excel, SAS, SPSS, etc.).

When scoring data, if you have values in predictor fields that were not seen in the data used to build the model, the Score tool will not score the record. That makes sense, but it would be nice to know how impactful the issue is. Please provide a count of records not scored for this reason, a count of records not scored because they exceed the limit set in the Score tool's configuration tab, and a count for any other reason a record is not scored, so we have a clear understanding of how many records were scored, how many were not, and why.


 
It would be great if we could output the coefficients of a regression equation to a table so that they can be used in the rest of the module. Currently, Alteryx can output the table/coefficients in chart/report form, which is not reusable as such in the module.
The values of the coefficients/residuals/errors would be very useful in building macros for techniques like missing value analysis, which can't be done in Alteryx as of now.
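To illustrate the kind of reusable output meant here, a minimal Python sketch with statsmodels; the data and column names are illustrative:

```python
# A minimal sketch: coefficients, standard errors, and residuals
# pulled into plain tables instead of a static report.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
y = 1.5 * X["x1"] - 2.0 * X["x2"] + rng.normal(scale=0.5, size=100)

fit = sm.OLS(y, sm.add_constant(X)).fit()

coef_table = pd.DataFrame({"coef": fit.params,   # reusable downstream
                           "std_err": fit.bse})
residuals = fit.resid                            # per-record residuals
print(coef_table)
```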
Hello, 

I am not sure if this capability exists, but I assume it does not.

We have a need to optimize a linear programming (LP) model that consists of a system of equations and has both an objective function and a series of constraints. This optimization capability is one that SAS currently offers and Alteryx does not.
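For concreteness, here is a minimal sketch of this kind of LP solved with SciPy's linprog; the objective and constraints are placeholders:

```python
# A minimal LP sketch: maximize an objective subject to constraints.
from scipy.optimize import linprog

# maximize 3x + 2y  ->  minimize -(3x + 2y)
c = [-3, -2]

# subject to: x + y <= 4,  x + 3y <= 6,  x, y >= 0
A_ub = [[1, 1], [1, 3]]
b_ub = [4, 6]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimal (x, y) = (4, 0), objective = 12
```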

If the capability is currently not available, is it on the product roadmap?

Thanks,

Ricardo

I have been using the outputs from Spline Regression to facilitate analysis of demographic data (specifically Department of Labor Quarterly Employment data). I have data from 1992Q1 to 2014Q1 and use Spline Regression to get fitted values for each quarter, with the predictors being the year/quarter, the year/quarter multiplied by a dummy variable for each of the four US Presidents, and a dummy variable for each president.
This lets me compare results across various geographic and other groupings, as well as the BLS aggregation level. I can analyze raw data or have the values to be fitted indexed to 1992Q1.
I use the default settings for Spline, and it builds the best fit, including where the knot periods for each spline section fall. To help interpret the results, though, I use the output to compare the actual vs. fitted values (e.g., employment level) and then look at the changes by quarter.
With the spline regression building the best model with optimal line segments, the results make it possible to see how employment progress or regression correlates with presidential terms of office, or with the specific impacts of economic recessions on employment data.
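For anyone wanting to replicate the dummy-variable setup outside Alteryx, here is a minimal sketch in Python using plain OLS with presidential dummies and interactions (not the Spline Regression macro itself); all column names and data are hypothetical stand-ins:

```python
# A minimal sketch of the regression setup described above: slope and
# intercept both shift by presidential term via an interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
quarterly_df = pd.DataFrame({
    "tq": np.arange(89),  # quarter index, 1992Q1..2014Q1
    "president": np.repeat(["P1", "P2", "P3", "P4"], [4, 32, 32, 21]),
})
quarterly_df["emp"] = 100 + 0.5 * quarterly_df["tq"] + rng.normal(size=89)

model = smf.ols("emp ~ tq * C(president)", data=quarterly_df).fit()
print(model.params)
```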

I can supply an example of the process, if anyone is interested.

I'd appreciate any comments and/or suggestions to improve the process or interpret the results.  
