Hello,
After using the new "Image Recognition Tool" for a few days, I think you could improve it:
> by listing the input dimension constraints next to each of the pre-trained models,
> by adding a proper tool to split the training data correctly (so that each label has an equivalent number of images),
> and finally, by allowing the tool to use black & white images (I wanted to test it on MNIST, but the tool tells me it requires RGB images).
Question: will you allow the user to choose between CPU and GPU usage in the future?
In any case, thank you again for this new tool. It is certainly perfectible, but it is very simple to use, and I sincerely think it will allow a greater number of people to understand the many use cases made possible by image recognition.
Thank you again
Kévin VANCAPPEL (France ;-))
When errors occur in R code, the application should show which line the error happened on.
Designer should support distribution-free statistical testing tools as well as Statistical Learning methods.
Alteryx already supports resampling for predictive modeling with Cross-Validation.
Resampling tools for bootstrap and permutation tests (supporting sampling with or without replacement) would let analysts and data scientists alike assess the random variability in a statistic without worrying about the restrictions on the data's distribution that apply to many parametric tests, most commonly the t-Test tool in Alteryx. With modern computing power, the need for hundred-year-old statistical sampling tests is fading: sampling a data set thousands of times to compare results to random chance is much easier today.
The tool's outputs could include, as in R, not only the results histogram but also the associated Q-Q plot, which visualizes the distribution of the data for the analyst. This would duplicate the Distribution Analysis tool somewhat, but to me the Q-Q plot is a major missing element in the simplest visualization of data. This tool could also be very valuable in terms of feeding the A/B Test tools.
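The permutation test described above can be sketched in a few lines of base R. The two samples and their values here are invented purely for illustration: shuffle the group labels many times and ask how often random labelling produces a difference as large as the observed one.

```r
set.seed(42)

# Two made-up samples whose means we want to compare
a <- c(5.1, 4.8, 6.0, 5.5, 5.9, 6.2)
b <- c(4.2, 4.6, 5.0, 4.4, 4.9, 4.1)
observed <- mean(a) - mean(b)

pooled <- c(a, b)
n_a <- length(a)

# Reassign group labels at random (sampling indices without replacement)
perm_diffs <- replicate(10000, {
  idx <- sample(length(pooled), n_a)
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Two-sided p-value: how often does random labelling beat the observed difference?
p_value <- mean(abs(perm_diffs) >= abs(observed))
```

No distributional assumption is needed, which is exactly the appeal over a parametric t-test.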
I would like to suggest adding a widget that encapsulates an R script able to perform outlier detection, similar to what Netflix did:
Thank you.
Regards,
Cristian
A lot of popular machine learning systems use a computer's GPU to speed up some of the math to a huge degree. The header of an article on Medium shows a 15x difference between a high-end CPU and a high-end GPU. GPU support could also improve the spatial tools. Perhaps Alteryx should add this functionality in order to speed up these tools, which I imagine are currently some of the slowest.
More and more applications in R are written with tidyverse code using tidy data principles. According to rdocumentation.org, tidyverse packages are some of the most downloaded. Adding this package to the default offering will make it easier to transfer existing R code to Alteryx!
I would like to share some feedback regarding the Principal Component tool.
I selected the option "Scale each field to have unit variance", and 1 of the 4 PCA tools displayed errors. However, the error message was not very intuitive and I couldn't use it to debug my workflow. The problem was that, for my type of data, scaling could not be applied because one field had a lot of 0 values.
I couldn't find anything related to this, so I hope my feedback helps others.
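A defensive check of the kind that would have helped here can be sketched in base R (the data frame and field names are invented for illustration): before scaling to unit variance, flag fields whose standard deviation is zero, since those cannot be scaled.

```r
# Made-up data: "flag" is constant (all zeros), which breaks unit-variance scaling
df <- data.frame(x    = c(1.2, 3.4, 2.2, 5.0),
                 y    = c(0.5, 0.7, 0.2, 0.9),
                 flag = c(0, 0, 0, 0))

sds <- vapply(df, sd, numeric(1))
bad <- names(sds)[sds == 0]
if (length(bad) > 0) {
  warning("Cannot scale zero-variance field(s): ", paste(bad, collapse = ", "))
}

# Run PCA only on the fields that can actually be scaled
pca <- prcomp(df[, sds > 0], center = TRUE, scale. = TRUE)
```

An error message naming the offending field, like the warning above, would have made the workflow easy to debug.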
Thanks!
Hi there,
Similar to @aselameab1, I was having trouble using the Linear Regression tool because it was giving error messages that were not explanatory or self-descriptive.
@chadanaber identified the issue: a specific field had only one unique value, which was causing the regression tool to fail. However, the error message provided gives no useful or helpful indication that this is the problem. You can see that the error message below is pretty tough to understand.
Could we add an item to the development backlog to add defensive checks to the predictive analytics tools to check for conditions that will cause them to fail, and rework the error messaging?
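A defensive check like the one requested could look something like this base-R sketch (the data and field names are invented for illustration): drop, or at least report, any field with a single unique value before fitting.

```r
# Made-up data with a constant predictor that would break a regression
df <- data.frame(y      = c(2.1, 3.9, 6.2, 7.8),
                 x1     = c(1, 2, 3, 4),
                 region = c("EMEA", "EMEA", "EMEA", "EMEA"))

# Fields with a single unique value carry no information for regression
constant <- names(df)[vapply(df, function(col) length(unique(col)) == 1,
                             logical(1))]

if (length(constant) > 0) {
  message("Dropping constant field(s) before modelling: ",
          paste(constant, collapse = ", "))
}

fit <- lm(y ~ ., data = df[, setdiff(names(df), constant)])
```

The point is the message: naming the offending field up front is far more helpful than the cryptic failure deep inside the model fit.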
I've attached the workflow with the sample data that replicates this issue.
Many thanks
Sean
It is important to be able to test for heteroscedasticity, so a tool for this test would be much appreciated.
In addition, I strongly believe the ability to calculate robust standard errors should be included as an option in existing regression tools, where applicable. This is a standard feature in most statistical analysis software packages.
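Both requests can be sketched in base R without any extra packages (the data here are simulated with heteroscedasticity built in): a Breusch-Pagan test computed as n times the R-squared of an auxiliary regression of the squared residuals on the predictors, and HC0 (White) robust standard errors via the sandwich formula.

```r
set.seed(1)

# Simulated data whose error variance grows with x (heteroscedastic by construction)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)

fit <- lm(y ~ x)

# Breusch-Pagan test: regress squared residuals on the predictors;
# the LM statistic n * R^2 is chi-squared with df = number of predictors
aux    <- lm(resid(fit)^2 ~ x)
lm_stat <- n * summary(aux)$r.squared
p_bp    <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

# HC0 (White) robust standard errors via the sandwich estimator
X     <- model.matrix(fit)
u     <- resid(fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * u)          # X' diag(u^2) X
robust_vcov <- bread %*% meat %*% bread
robust_se   <- sqrt(diag(robust_vcov))
```

A small p_bp here flags the heteroscedasticity, after which the robust standard errors are the appropriate ones to report.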
Many thanks!
I think the Nearest Neighbor Algorithm is one of the least used, and most powerful algorithms I know of. It allows me to connect data points with other data points that are similar. When something is unpredictable, or I simply don't have enough data, this allows me to compare one data point with its nearest neighbors.
So, last night I was at school, taking a graduate-level Econ course. We were discussing various distance measures for a nearest neighbor algorithm. Our prof discussed one called the Mahalanobis distance. It uses some fancy matrix algebra. Essentially, it filters out the noise and matches only on the dimensions that are truly significant. It takes into account the correlation that may exist among variables, effectively collapsing correlated variables into one.
I use Nearest Neighbor when other things aren't working for me: when my data sets are weak, sparse, or otherwise not predictable. Sometimes I don't know that particular variables are correlated. This is a powerful measure that could be added to the Nearest Neighbor tool, to allow for matches that might not otherwise be found, and to match only on the variables that really matter.
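For reference, the Mahalanobis distance is already available in base R as stats::mahalanobis(), which returns squared distances. A minimal sketch with made-up data shows the property discussed above: a point lying along the correlation between two variables is treated as ordinary, while a point the same Euclidean distance away but off the correlation line is flagged as unusual.

```r
set.seed(7)

# Made-up data: two strongly correlated variables
x1 <- rnorm(100)
x2 <- 0.9 * x1 + rnorm(100, sd = 0.2)
X  <- cbind(x1, x2)

center <- colMeans(X)
covmat <- cov(X)

# Squared Mahalanobis distance of each point from the centroid;
# the inverse covariance matrix discounts the shared (correlated) direction
d2 <- mahalanobis(X, center, covmat)

# Same Euclidean distance from the centroid, very different Mahalanobis distance
on_line  <- mahalanobis(rbind(c(1,  0.9)), center, covmat)
off_line <- mahalanobis(rbind(c(1, -0.9)), center, covmat)
```

The off-line point comes out far more distant, which is exactly the noise-filtering behaviour described above.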
Currently the R predictive tools are single-threaded, which means that to utilise multi-threading we need to separately download a third-party R distribution such as Microsoft R Client.
Given that this is the better option, shouldn't it be used as the default upon installation?
Idea:
Adding functionality to the Impute Values tool for multiple imputation and maximum-likelihood imputation of fields with values missing at random would be very useful.
Rationale:
Missing data are a problem, and advanced techniques are complicated. One great idea in statistics is multiple imputation: filling the gaps in the data not with mean, median, mode, or user-defined static values, but with plausible values inferred from the other fields.
SAS has the PROC MI procedure; here is a page detailing its usage with examples: http://www.ats.ucla.edu/stat/sas/seminars/missing_data/mi_new_1.htm
Also there is PROC CALIS for maximum likelihood here...
A similarly useful tool exists in SPSS as well: http://www.appliedmissingdata.com/spss-multiple-imputation.pdf
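The core idea, simplified here to a single regression-based imputation rather than full multiple imputation, can be sketched in base R (all data and field names invented for illustration): fit a model on the complete cases, then predict the missing values from the other fields.

```r
# Made-up data with values missing at random in "income"
df <- data.frame(age    = c(23, 35, 41, 29, 52, 47, 31, 38),
                 income = c(30, 48, NA, 39, NA, 60, 42, 51))

# Fit a model on the complete cases, then predict the gaps from "age"
complete <- !is.na(df$income)
fit <- lm(income ~ age, data = df[complete, ])
df$income[!complete] <- predict(fit, newdata = df[!complete, ])

# Full multiple imputation would repeat this with added noise to produce
# several plausible completed data sets, then pool the analyses
```

This is exactly the "plausible values considering other fields" idea: the imputed incomes follow the age trend instead of a flat mean.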
Best
The capability to input/output R datasets via the Input/Output tools, alongside all the other supported data formats (like CSV, Excel, SAS, SPSS, etc.).
The Random Forest tool doesn't handle missing values well and produces cryptic error results for Alteryx users.
Here are two quick options that would be better to add in the new version:
1) na.omit, which omits cases from your data when there are missing values (you lose some observations, though):
na.action=na.omit
2) na.roughfix, which replaces missing values with the mean for continuous variables and the mode for categorical variables:
na.action=na.roughfix
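For reference, what the second option does can be sketched in base R without the randomForest package (the toy data are invented here): replace NA with the mean for numeric fields and the most frequent level for categorical ones.

```r
# Toy data with missing values in a numeric and a categorical field
df <- data.frame(amount = c(10, NA, 30, 20, NA),
                 colour = factor(c("red", "blue", NA, "red", "red")))

roughfix <- function(df) {
  for (name in names(df)) {
    col <- df[[name]]
    if (is.numeric(col)) {
      # continuous: replace NA with the mean of the observed values
      col[is.na(col)] <- mean(col, na.rm = TRUE)
    } else {
      # categorical: replace NA with the mode (most frequent level)
      tab <- table(col)
      col[is.na(col)] <- names(tab)[which.max(tab)]
    }
    df[[name]] <- col
  }
  df
}

fixed <- roughfix(df)
```

It is a rough fix, as the name says, but it lets a forest train at all instead of erroring out.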
Best
This request is largely based on the implementation found on AzureML; (take their free trial and check out the Deep Convolutional and Pooling NN example from their gallery). This allows you to specify custom convolutional and pooling layers in a deep neural network. This is an extremely powerful machine learning technique that could be tricky to implement, but could perhaps be (for example) a great initial macro wrapped around something in Python, where currently these are more easily implemented than in R.
I searched Alteryx Help for LDA - Linear Discriminant Analysis and it returned "0" results.
Idea: LDA - Linear Discriminant Analysis tool
to be added on the predictive tool box.
Rationale: We have PCA and MDS as tools which help a lot with "unsupervised" dimensionality reduction in predictive modelling.
But if we need a method that takes target values into consideration, we need a "supervised" tool instead...
"LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data.[4] LDA explicitly attempts to model the difference between the classes of data. PCA on the other hand does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made."
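For reference, R already ships LDA in the MASS package (one of R's bundled "recommended" packages), so the tool would mostly be a wrapper. A minimal sketch on R's built-in iris data:

```r
library(MASS)  # MASS is a "recommended" package bundled with R

# Supervised dimensionality reduction: project onto the directions that
# best separate the three species, using the class labels
fit <- lda(Species ~ ., data = iris)

# With 3 classes there are at most 2 discriminant directions
proj <- predict(fit)
pred_classes <- proj$class

# Resubstitution accuracy (optimistic, but fine for a sketch)
accuracy <- mean(pred_classes == iris$Species)
```

Unlike PCA, the projection here is chosen to separate the classes, which is exactly the "supervised" behaviour the idea asks for.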
The existing Decision Tree tool is automatic,
but business users need to interact with the decision tree, prune it, and grow certain parts of it using their domain expertise...
SPSS has a nice facility, as you can see below... Desperately looking forward to an Alteryx version...
Hello,
the randomForest package implementation in Alteryx works fine for smaller datasets but becomes very slow for large datasets with many features.
There is the open-source ranger package (https://arxiv.org/pdf/1508.04409.pdf) that could help with this.
Along with XGBoost/LightGBM/CatBoost, it would be an extremely welcome addition to the predictive package!
It feels that lately Alteryx has been focusing on integration rather than adding more machine learning tools, which sadly are still not on par with many competing products...
Personally I miss having XGBoost and multi-core random forest libraries like ranger (along with a more robust implementation of C5.0).
What about you guys? Which R/Python libraries are you missing in Alteryx?
Improve the Help documentation or in-tool options for handling null values in statistical tools like Weighted Average or Linear Regression. For instance, a checkbox to remove null-value records, or at least a warning to users.
In the process of learning to perform linear regression in RStudio and Alteryx, I came across differing outputs depending on how null values were addressed. Take the Weighted Average tool, for example.
In R, the weighted.mean function can treat null values in the variable of interest as if they were not there (na.rm = TRUE). If the user does not specify that null values should be removed, the result is NA. If any null values exist in the weight field, the result is NA.
Since I am more familiar with Alteryx, I originally did the data preparation—including calculating the weighted means—in Alteryx. When comparing these weighted means with those generated in R, I found that Alteryx treats the null values as zeros (i.e. includes them in the calculation). The user would have to know this is incorrect and first filter out the null values. See screenshot examples.
This is also the case within the Linear Regression tool. If null values are not omitted prior to regression, the results are wildly different. Perhaps this is known by more experienced users/statisticians, but this incorrect usage would have gone on unbeknownst to me had I not cross-checked with RStudio.
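The difference described above can be reproduced directly in R (the numbers here are made up): the default propagates the NA, na.rm = TRUE drops the missing value and its weight, and treating the NA as zero, as described for Alteryx, gives yet another answer.

```r
values  <- c(10, 20, NA, 40)
weights <- c(1, 2, 1, 1)

# Default: any NA in the values propagates to the result
weighted.mean(values, weights)                              # NA

# na.rm = TRUE drops the missing value and its weight
r_result <- weighted.mean(values, weights, na.rm = TRUE)    # (10 + 40 + 40) / 4 = 22.5

# Treating NA as zero, as described for Alteryx, changes the answer
alteryx_like <- weighted.mean(ifelse(is.na(values), 0, values),
                              weights)                      # 90 / 5 = 18
```

Three different results from the same data, which is why an explicit option (or at least a warning) matters.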
Weighted Average in Alteryx
Weighted Mean in R