Hello,
After using the new "Image Recognition Tool" for a few days, I think you could improve it:
> by showing the dimensional constraints next to each of the pre-trained models,
> by adding a proper tool to split the training data correctly (so that each label has an equal number of images),
> at the very least, by allowing the tool to use black & white images (I wanted to test it on MNIST, but the tool tells me that it requires RGB images).
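To illustrate the second point, here is a minimal sketch of what "an equivalent number of images for each of the labels" could mean in practice. It downsamples every label to the size of the smallest one. The function name `balance_by_label` and the file names are hypothetical, and this is not how the Image Recognition Tool works internally.

```python
import random
from collections import Counter, defaultdict


def balance_by_label(samples, seed=0):
    """Downsample so that every label keeps the same number of samples.

    `samples` is a list of (image_path, label) pairs. The paths used in the
    example below are hypothetical placeholders.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    # The smallest class decides how many images each label may keep.
    target = min(len(paths) for paths in by_label.values())

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for label, paths in sorted(by_label.items()):
        for path in rng.sample(paths, target):
            balanced.append((path, label))
    return balanced


samples = (
    [(f"cat_{i}.png", "cat") for i in range(50)]
    + [(f"dog_{i}.png", "dog") for i in range(20)]
    + [(f"bird_{i}.png", "bird") for i in range(35)]
)
balanced = balance_by_label(samples)
counts = Counter(label for _, label in balanced)
print(dict(counts))
```

Downsampling throws images away; oversampling the smaller classes would be the obvious alternative when data is scarce.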
Question: will you allow the user to choose between CPU and GPU usage in the future?
In any case, thank you again for this new tool. It can certainly be improved, but it is very simple to use, and I sincerely think that it will help more people understand the many use cases that image recognition makes possible.
Thank you again
Kévin VANCAPPEL (France ;-))
The sum function is probably the one I use most in the summarize tool. It is a silly thing, but it would be nice for "Sum" to be in the single-click list, rather than in the "Numeric" category...
When visualizing in-Database workflows, there is a need to be able to view sorted data. This sorting could be done in one of two ways: in a browse tool, or as a stand-alone Sort tool. Either would address the need. Without such a tool, the only way to sort the data is to "Data Stream Out" and then visualize the data in Alteryx. However, this process defeats the purpose of the in-DB toolkit, which is to keep your data in-DB and process it using the DB engine. Streaming out big data in order to add a sort is not efficient.
Granted, the in-DB processing doesn't care whether data is sorted or not. However, when attempting to find extreme values after an aggregation, or when trying to identify something as simple as whether null values are present in a field, then a sort becomes extremely useful, and a necessary tool for human consumption of data (regardless of the database's processing needs).
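As a rough illustration of why an in-database sort is cheap compared with streaming everything out, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a real database. The table `sales_by_region` and its values are invented for the example. Only the few sorted rows leave the database engine.

```python
import sqlite3

# An in-memory SQLite database stands in for the real in-DB connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO sales_by_region VALUES (?, ?)",
    [("North", 1200.0), ("South", None), ("East", 87.5), ("West", 9800.0)],
)

# Ascending sort: SQLite places NULL before any other value, so missing
# totals surface at the top of the result.
lowest = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY total ASC LIMIT 2"
).fetchall()

# Descending sort: the largest aggregate comes first.
highest = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY total DESC LIMIT 1"
).fetchall()

print(lowest)
print(highest)
conn.close()
```

Where NULLs land in a sort differs between database engines, so a real Sort tool would probably need to expose that choice.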
Thanks very much for hearing my idea!
Similar to the Select tool's Unknown Field checkbox, I figured it would be useful for the Data Cleansing tool to have this functionality as well. It would avoid the scenario where, after a cross-tab, you have new numeric fields, one of which contains a Null value, so you can't total up multiple fields because the Null value prevents the addition from happening. If the Unknown Field box were checked in the Data Cleansing tool, this problem would be avoided.
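The problem can be sketched in a few lines of Python. This is only an analogy for the behaviour described above, not Alteryx's implementation: `None` plays the role of Null, and the field list is built dynamically so that fields unknown at design time are cleansed too. The function name and sample row are hypothetical.

```python
def total_fields(row, fields, null_as_zero=True):
    """Add up the given fields of one row.

    With null_as_zero=False a single missing value makes the whole total
    missing, which mimics a Null blocking the addition.
    """
    values = [row.get(name) for name in fields]
    if null_as_zero:
        values = [0 if value is None else value for value in values]
    elif any(value is None for value in values):
        return None
    return sum(values)


# A row as it might look after a cross-tab: one month has no data.
row = {"Store": "A", "Jan": 120, "Feb": None, "Mar": 80}

# Pick up every field except the key, including ones not known in advance.
month_fields = [name for name in row if name != "Store"]

print(total_fields(row, month_fields, null_as_zero=False))
print(total_fields(row, month_fields))
```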
This idea arose recently when working specifically with the Association Analysis tool, but I have a feeling that other predictive tools could benefit as well. I was trying to run an association analysis for a large number of variables, but when I was investigating the output using the new interactive tools, I was presented with something similar to this:
While the correlation plot draws your eye to the high associations, the user is unable to read the field names, and the tooltip only provides the correlation value rather than the fields along with the value. As such, I shifted my attention to the report output, which looked like this:
While I could now read everything, it made pulling out the insights much more difficult. Wanting the best of both worlds, I decided to extract the correlation table from the R output and drop it into Tableau for a filterable, interactive version of the correlation matrix. This turned out to be much easier said than done. Because the R output comes in report form, I tried to use the report extract macros mentioned in this thread to pull out the actual values. This was an issue due to the report formatting, so instead I cracked open the macro to extract the data directly from the R output. To make a long story shorter, this ended up being problematic due to report formats, batch macro pathing, and an unidentifiable bug.
In the end, it would be great if there were a “Data” output for reports from certain predictive tools that would benefit from further analysis. While the reports and interactive outputs are great for ingesting small model outputs, at times there is a need to extract the data itself for further analysis/visualization. This is one example, as are the model coefficients from regression analyses that I have used in the past. I know Dr. Dan created a model coefficients macro for the case of regression, but I have to imagine that there are other cases where the data is desired along with the report/interactive output.
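To make the request concrete, here is a small sketch of the kind of "Data" output that would have helped: one row per pair of fields, which a tool such as Tableau can filter directly. It computes Pearson correlations in plain Python. The field names and numbers are made up, and the actual Association Analysis tool may use other measures.

```python
import csv
import io
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_rows(columns):
    """Return one record per field pair instead of a formatted matrix."""
    return [
        {
            "field_a": a,
            "field_b": b,
            "correlation": round(pearson(columns[a], columns[b]), 4),
        }
        for a, b in combinations(columns, 2)
    ]


# Hypothetical input fields.
columns = {
    "price": [10.0, 12.0, 14.0, 16.0],
    "units": [40.0, 36.0, 30.0, 26.0],
    "visits": [100.0, 90.0, 120.0, 110.0],
}
rows = correlation_rows(columns)

# Write the long-format table as CSV text, ready for a visualization tool.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["field_a", "field_b", "correlation"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```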
Many software & hardware companies take a very quantitative approach to driving their product innovation so that they can show an improvement over time on a standard baseline of how the product is used today; and then compare this to the way it can solve the problem in the new version and measure the improvement.
For example:
- Database vendors have been doing this for years using TPC benchmarks (http://www.tpc.org/), where a FIXED set of tasks is agreed as a benchmark and the vendors then iterate year over year to improve performance against these benchmarks
- Graphics card companies or GPU companies have used benchmarks for years (e.g. TimeSpy; Cinebench etc).
How could this translate for Alteryx?
- Every year at Inspire - we hear the stats that say that 90-95% of the time taken is data preparation
- We also know that the reason for buying Alteryx is to reduce the time & skill level required to achieve these outcomes - again, as reinforced by the message that we're driving towards self-service analytics & citizen data analytics.
The dream:
Wouldn't it be great if Alteryx could say: "In the 2019.3 release - we have taken 10% off the benchmark of common tasks as measured by time taken to complete" - and show a 25% reduction year over year in the time to complete this battery of data preparation tasks?
One proposed method:
What would this give Alteryx?
This could be very simple to administer; and if done well it could give Alteryx:
- A clear and unambiguous marketing message that they are super-focussed on solving for the 90-95% of your time that is NOT being spent on analytics, but rather on data prep
- It would also provide focus to drive the platform in the direction of the biggest pain points - all the teams across the platform can then rally around a really deep focus on the user and accelerating their "time from raw data to analytics".
- A competitive differentiation - invite your competitors to take part too just like TPC.org or any of the other benchmarks
What this is / is NOT:
Loads of ways that this could be administered - starting point is to agree to drive this quantitatively on a fixed benchmark of tasks and data
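As one possible starting point, a fixed benchmark could be as simple as a named battery of tasks that is timed the same way for every release. The sketch below is an assumption about how that might look, not an existing Alteryx facility. The three tasks and the data are placeholders for real data preparation steps.

```python
import time


def run_benchmark(tasks, repeats=3):
    """Time each named task and keep the best of `repeats` runs, in seconds."""
    results = {}
    for name, task in tasks.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            task()
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results


def percent_change(baseline, current):
    """Percentage change per task; negative numbers mean the task got faster."""
    return {
        name: 100.0 * (current[name] - baseline[name]) / baseline[name]
        for name in baseline
    }


# Hypothetical fixed data set and battery of tasks.
rows = [{"id": i, "value": (i * 37) % 1000} for i in range(20_000)]


def sum_by_bucket():
    totals = {}
    for row in rows:
        bucket = row["value"] // 100
        totals[bucket] = totals.get(bucket, 0) + row["value"]
    return totals


tasks = {
    "sort_by_value": lambda: sorted(rows, key=lambda row: row["value"]),
    "filter_large": lambda: [row for row in rows if row["value"] > 500],
    "sum_by_bucket": sum_by_bucket,
}

results = run_benchmark(tasks)
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s")

# Comparing two releases with made-up timings: 2.0 s down to 1.5 s.
print(percent_change({"sort_by_value": 2.0}, {"sort_by_value": 1.5}))
```

The important part is that the task list and the data stay fixed between releases, so that only the product changes.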
@LDuane ; @SteveA ; @jpoz ; @AshleyK ; @AJacobson ; @DerekK ; @Cimmel ; @TuvyL ; @KatieH ; @TomSt ; @AdamR_AYX ; @apolly
I would like to see more file types supported for dragging from a folder onto a workflow, specifically .txt and .dat files. This would greatly help my team and me analyze the new and unknown data files that we receive on a daily basis.
Thank you.
Hi all!
Based on the title, here's some background information: SHAPLEY Values
Currently, one way of doing so is to use the Python tool to write the script and install the package. However, this requires running Alteryx as an administrator in order to successfully load, test, and run the script. The problem is that a substantial number of companies do not grant their Alteryx teams full administrator privileges, because admin credentials would then be required every time Alteryx is reopened after closing it.
I am aware that there is a macro covering SHAP but I've recently tested it and it did not work as intended, plus it covers non-categorical values as determinants only, thereby requiring a conversion of categorical variables into numeric categories or binary categories.
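For readers unfamiliar with the idea, here is a minimal sketch of an exact Shapley value calculation on a toy model. It is not the SHAP package's API (real libraries rely on approximations for speed), and the feature names and scoring function are invented for illustration. Each feature is credited with its weighted average marginal contribution over all coalitions of the other features.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a value function defined on feature sets."""
    n = len(features)
    result = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value(coalition | {feature}) - value(coalition)
                total += weight * gain
        result[feature] = total
    return result


def toy_model(coalition):
    """Hypothetical score: two main effects plus one interaction."""
    score = 0.0
    if "income" in coalition:
        score += 10.0
    if "region" in coalition:
        score += 20.0
    if {"income", "region"} <= coalition:
        score += 6.0
    return score


phi = shapley_values(["income", "region", "age"], toy_model)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.2f}")
```

In this toy case the interaction bonus is split evenly between the two features that create it, and the contributions add up to the score of the full feature set.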
It would be nice to have a built-in Alteryx ML tool that does this analysis and produces a graph akin to a heat map that showcases the values like below:
By doing so, it adds more value to the ML suite and actually helps convince companies to get it.
Otherwise, teams will just use Python and be done with it, leaving Alteryx only as the clean-up ETL tool. That leaves much to be desired, and can leave some teams hanging.
I hope for some consideration on this - thank you.
Python pandas dataframes and data types (numpy arrays, lists, dictionaries, etc.) are much more robust in general than their counterparts in R, and they play together much more easily as well. Moreover, only a handful of packages, such as SciKit Learn, Pandas, Numpy, and Seaborn, cover everything a data scientist would need, including graphing. After utilizing R, Python, and Alteryx, I'm still a big proponent of integrating with the Python language much like Alteryx has integrated with R. At the very least, I propose adding the ability to write custom code, for example through a Python tool.
Right now - if a tool generates an error - there is nothing productive that you can do with the error rows. They are just sent to the error log and, depending on your settings, the entire canvas will fail.
Could we change this in the Designer to work more like SSIS - where almost every tool has an error output, so that you can send the good rows one way, and the error rows the other way, and then continue processing? The error rows can be sent to an error table or workflow or data-quality service; and the good rows can be sent onwards. Because you have access to the error rows, you can also compute run statistics of "successful rows vs. unsuccessful".
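The pattern can be sketched in plain Python as follows. This is only an illustration of the "good rows one way, error rows the other" idea, not SSIS or Designer behaviour. The helper names and sample rows are hypothetical.

```python
def split_rows(rows, transform):
    """Apply `transform` to each row; route failures to a separate output."""
    good, errors = [], []
    for row in rows:
        try:
            good.append(transform(row))
        except (ValueError, KeyError, TypeError) as exc:
            # Keep the original row and record why it failed.
            errors.append({**row, "error": f"{type(exc).__name__}: {exc}"})
    return good, errors


def parse_amount(row):
    """Example transform: convert the text field `amount` to a number."""
    return {**row, "amount": float(row["amount"])}


rows = [
    {"id": 1, "amount": "19.99"},
    {"id": 2, "amount": "n/a"},   # not a number
    {"id": 3, "amount": "5"},
    {"id": 4},                    # field missing entirely
]

good, errors = split_rows(rows, parse_amount)
print(f"{len(good)} successful rows vs. {len(errors)} unsuccessful")
for row in errors:
    print(row)
```

Processing continues past the bad rows, and the error output carries enough context to be written to an error table or handed to a data-quality process.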
This would make a big difference in the velocity of developing a canvas or prepping data.
I can take some screenshots if that helps.
Dear GUI Gurus,
A minor but time-saving GUI enhancement would be appreciated. When adding a tool to the canvas, the current behavior is to make visible the tool anchor that was last used on prior tools. As a result, when I look at the results window after adding a "vanilla" configuration tool to the canvas, I may be staring at a BLANK results window. When users are adding tools to the canvas, I suggest that the best practice is to VIEW the incoming data before configuring the tool.
I ALWAYS set the results to view the INCOMING DATA ANCHOR.
This minor change would be welcome to me.
Cheers,
Mark
Hi all,
One of the most common data-investigation tasks we have to do is comparing 2 data-sets. This may mean making sure the columns are the same, checking that the field names match, or even looking at row data. I think that this would be a tremendous addition to the core toolset. I've made a fairly good start on it, and am more than happy if you want to take this and extend or add to it (I give this freely with no claim on the work).
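For readers who want a feel for what such a comparison involves, here is a minimal Python sketch. It is not the work offered above, just an assumption about the basic checks: which fields exist on only one side, which keys exist on only one side, and which shared values differ. The function name, key field, and sample rows are hypothetical.

```python
def compare_datasets(left, right, key):
    """Compare two lists of row dictionaries that share a key field."""
    left_fields = set().union(*(row.keys() for row in left))
    right_fields = set().union(*(row.keys() for row in right))
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # For rows present on both sides, compare only the fields both sides have.
    shared_fields = sorted(left_fields & right_fields)
    changed = {}
    for k in sorted(left_by_key.keys() & right_by_key.keys()):
        diffs = {
            field: (left_by_key[k].get(field), right_by_key[k].get(field))
            for field in shared_fields
            if left_by_key[k].get(field) != right_by_key[k].get(field)
        }
        if diffs:
            changed[k] = diffs

    return {
        "fields_only_left": sorted(left_fields - right_fields),
        "fields_only_right": sorted(right_fields - left_fields),
        "keys_only_left": sorted(left_by_key.keys() - right_by_key.keys()),
        "keys_only_right": sorted(right_by_key.keys() - left_by_key.keys()),
        "changed": changed,
    }


left = [
    {"id": 1, "name": "Ann", "city": "Leeds"},
    {"id": 2, "name": "Bob", "city": "York"},
    {"id": 3, "name": "Cy", "city": "Bath"},
]
right = [
    {"id": 1, "name": "Ann", "country": "UK"},
    {"id": 2, "name": "Robert", "country": "UK"},
    {"id": 4, "name": "Di", "country": "UK"},
]

report = compare_datasets(left, right, key="id")
for section, content in report.items():
    print(section, content)
```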
Very very happy to work with the team to build this out if it's useful
Cheers
Sean
One of the tools that I use the most is the SELECT tool because I normally get large data sets with fields that I won't be using for a specific analysis or with fields that need re-naming. In the same way, sometimes Alteryx will mark a field in a different type than the one I need (e.g. date field as string). That's when the SELECT comes in handy.
However, when dealing with multiple sources, having many SELECT tools on your canvas can make the workflow look a little "crowded", not to mention that the extra tools will need explaining later when presenting or sharing your canvas with others. That is why my suggestion is to give the CONNECTION tool "more power" by offering some of the functionality found in the SELECT tool.
For instance, if one of the most used features of the SELECT tool is choosing the fields that will move through the workflow, then maybe we can make that feature available in the CONNECTION tool. Similarly, if one of the most used features (by Alteryx users) is renaming fields or changing the field type, then maybe we can make that available in the CONNECTION tool as well.
In the end, developers would benefit from faster workflow development, and end users would benefit from being presented with cleaner workflows, which always helps to get the message across.
What do you guys think? Any of you feel the same? Leave your comments below.
Unsupervised learning method to detect topics in a text document.
Helpful for users interested in text mining.
Hi,
I wasted a good old chunk of time dealing with non-breaking spaces, and Alteryx could be improved by handling this automatically.
A space is a space, right? Nope, there are spaces (ASCII value decimal 32) and there are non-breaking spaces (character code decimal 160, U+00A0, which lies outside the 7-bit ASCII range). They look the same, but have slightly different behaviour in certain circumstances, like when text is auto-wrapped.
The DataCleansing tool cleans spaces, but leaves non-breaking spaces.
The Data Grid puts a warning on cells with leading or trailing spaces, but remains silent for non-breaking spaces.
I was trying to match two strings that looked identical. I had DataCleansed my cells, and the grid was showing me nothing wrong with the data. In desperation, I copied the two data cells that I expected to match to a text editor (Textpad), and then examined the underlying character codes of the data. One cell had a trailing non-breaking space, and that caused the failure to match.
This was hard to find. For someone less hopelessly nerdy, it would be practically impossible.
As a small change, it might be really useful for Alteryx to include non-breaking spaces in its definition of "space", such that the DataCleansing tool removes them, and the Data Grid flags up the cell as having a leading or trailing space.
You could pick up non-breaking spaces from HTML, or from Excel. I think mine came from a SQL script, but I am not sure how it got there. They are out there, and they will bite.
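The effect is easy to reproduce outside Alteryx. The Python sketch below shows two strings that look identical, a cleanup that only removes ordinary spaces (similar to the behaviour described above), and one that treats the non-breaking space as whitespace too. The sample value and the helper name `flag_edge_whitespace` are made up for the example.

```python
NBSP = "\u00a0"  # non-breaking space, code point 160

left = "ACME Ltd"
right = "ACME Ltd" + NBSP  # looks the same when displayed

# The strings differ, and the character codes reveal why.
print(left == right)
print([ord(ch) for ch in right[-2:]])

# Stripping only the ordinary space (code 32) leaves the NBSP in place.
print(right.strip(" ") == left)

# Python's default strip() treats NBSP as whitespace, so the match succeeds.
print(right.strip() == left)


def flag_edge_whitespace(value):
    """True when a value has leading or trailing whitespace, NBSP included."""
    return value != value.strip()


print(flag_edge_whitespace(right))
```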
The Browse tool is really powerful. We can see all the information about a dataset very quickly.
Unfortunately, we can only export that information (graphs, tables) manually, as PNG files...
One major use of Alteryx in large companies is performing data quality reviews.
If we could export the Browse tool's information (graphs, tables) automatically to a PDF file or some other format, we could save a lot of time on data quality tasks.
Currently, the only solution is to use the DataViz tool or to set up a specific render in Alteryx (very time-consuming).
The main benefit would be the ability to share data quality insights with other business units.
Best Regards
We don't have a separate ANOVA tool in Alteryx. Is there a reason for that?
It's not the raw data or the blended rows but the insights gathered that matter:
The Linear Regression tool includes a Type II ANOVA report based on the model table we provide.
But neither Type II nor the other types are available as standalone statistics tools...
Here is a list of the different types of ANOVA that may be useful:
ANOVA model | Definition |
---|---|
t-tests | Comparison of means between two groups; if independent groups, then independent samples t-test. If not independent, then paired samples t-test. If comparing one group against a fixed value, then a one-sample t-test. |
One-way ANOVA | Comparison of means of three or more independent groups. |
One-way repeated measures ANOVA | Comparison of means of three or more within-subject variables. |
Factorial ANOVA | Comparison of cell means for two or more between-subject IVs. |
Mixed ANOVA (SPANOVA) | Comparison of cell means for one or more between-subjects IVs and one or more within-subjects IVs. |
ANCOVA | Any ANOVA model with a covariate. |
MANOVA | Any ANOVA model with multiple DVs. Provides omnibus F and separate Fs. |
Looking forward to the addition of ANOVA tools to the data investigation toolbox...
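To show how little is needed for the simplest case in the table, here is a sketch of a one-way ANOVA F statistic in plain Python. It covers only independent groups and stops at the F value; turning it into a p-value needs the F distribution, which a statistics library (for example SciPy's `f_oneway`) would supply. The sample groups are made up.

```python
def one_way_anova(groups):
    """Return the F statistic and degrees of freedom for independent groups."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, means)
        for x in group
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {"F": f_stat, "df_between": df_between, "df_within": df_within}


# Three hypothetical independent groups.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]
anova = one_way_anova(groups)
print(anova)
```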
A question has been coming up from several users at my workplace about allowing a column description to display in the Visual Query Builder instead of or along with the column name.
The column names in our database are based on an older naming convention, and sometimes the names aren't that easy to understand. We do see that (if a column does have a column description in metadata) it shows when hovering over the particular column; however, the consensus is that we'd like to reverse this and have the column description displayed with the column name shown on hover.
It would greatly improve the efficiency of workflow development if this could be implemented.