
Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!

Featured Ideas

Hello,

Quite often, when you have a dataset, you want to know whether there is a group of fields that works together. That can help you normalize (i.e. de-join) your data model for dataviz, fix performance issues, or simplify your analysis.

Example

 

order_id | item_id | label | model_id | length | color | amount
1  | 1 | A | 10 | 15 | Blue | 101.2
2  | 1 | A | 10 | 15 | Blue | 103
3  | 2 | B | 10 | 15 | Blue | 104.8
4  | 2 | B | 10 | 15 | Blue | 106.6
5  | 2 | B | 10 | 15 | Blue | 108.4
6  | 3 | C | 20 | 25 | Red  | 110.2
7  | 3 | C | 20 | 25 | Red  | 112
8  | 3 | C | 20 | 25 | Red  | 113.8
9  | 4 | D | 20 | 25 | Red  | 115.6
10 | 4 | D | 20 | 25 | Red  | 117.4
11 | 4 | D | 20 | 25 | Red  | 119.2

Here, we could split the table into three:
-order

 

order_id | item_id | model_id | amount
1  | 1 | 10 | 101.2
2  | 1 | 10 | 103
3  | 2 | 10 | 104.8
4  | 2 | 10 | 106.6
5  | 2 | 10 | 108.4
6  | 3 | 20 | 110.2
7  | 3 | 20 | 112
8  | 3 | 20 | 113.8
9  | 4 | 20 | 115.6
10 | 4 | 20 | 117.4
11 | 4 | 20 | 119.2

-model

 

model_id | length | color
10 | 15 | Blue
20 | 25 | Red

-item

 

item_id | label
1 | A
2 | B
3 | C
4 | D

The tool would take:
-input: a data frame
-configuration: the ability to select fields
-output: a table with the recap of the groups


field group | field           | remaining field
1           | item_id         | False
1           | label           | False
2           | model_id        | False
2           | color           | False
3           | order_id        | True
3           | link to group 1 | True
3           | link to group 2 | True
3           | amount          | True

Very important: the non-selected fields (here, amount) do appear in the result, but all of them go to the "remaining" group.

Algo steps:
1/ Pre-groups: compute the count distinct of each field. Goal: optimize the algorithm by avoiding the computation of every pair.
Fields whose count distinct equals the number of rows are automatically excluded and sent to the remaining group.
Fields that share the same count distinct are put in the same pre-group.
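A minimal sketch of this pre-grouping step, in pandas terms (the pre_group function and its arguments are illustrative names, not an existing Alteryx API):

```python
import pandas as pd

def pre_group(df: pd.DataFrame, selected: list) -> tuple:
    """Bucket fields by count distinct; unique-per-row fields go to 'remaining'."""
    n_rows = len(df)
    pre_groups, remaining = {}, []
    for field in selected:
        n_distinct = df[field].nunique()
        if n_distinct == n_rows:
            remaining.append(field)   # e.g. order_id: as unique as the rows
        else:
            pre_groups.setdefault(n_distinct, []).append(field)
    return pre_groups, remaining
```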

2/ For each pre-group, for each pair of fields, take the distinct values of the pair, like here:

 

item_id | label
1 | A
2 | B
3 | C
4 | D

If, in this table, the count distinct of each field equals the number of rows, the pair is a "pair-group".

Here, for the model, you will have:
-model_id, length
-model_id, color
-length, color

3/ Since a field can only belong to one group, the pairs merge: model_id, length, and color form the first (or second) group, then item_id and label form another.

If a field does not belong to any group, it goes to the "remaining" group at the end.

In the remaining group, you can add a link to each other group, since you don't know which field is the key.


field group | field           | remaining field
1           | item_id         | False
1           | label           | False
2           | model_id        | False
2           | length          | False
2           | color           | False
3           | order_id        | True
3           | link to group 1 | True
3           | link to group 2 | True
3           | amount          | True
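For steps 2 and 3 together, a rough pandas-based sketch (illustrative only): a pair is a "pair-group" when deduplicating the pair leaves exactly one row per distinct value of each field, and pairs sharing a field are merged into one group.

```python
from itertools import combinations
import pandas as pd

def find_groups(df: pd.DataFrame, pre_groups: dict) -> list:
    groups = []
    for fields in pre_groups.values():
        for a, b in combinations(fields, 2):
            pairs = df[[a, b]].drop_duplicates()
            # 1:1 correspondence: both fields are unique within the pair table
            if pairs[a].nunique() == len(pairs) == pairs[b].nunique():
                # simplified merge: a field belongs to one group only
                hit = next((g for g in groups if a in g or b in g), None)
                if hit is not None:
                    hit.update({a, b})
                else:
                    groups.append({a, b})
    return groups  # e.g. [{'item_id', 'label'}, {'model_id', 'length', 'color'}]
```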

Best regards,

Simon

PS: I have in mind an evolution with links between non-remaining tables (here, for example, the model could optionally be linked to the item).

Hello,

This is a feature I haven't seen in any data preparation/ETL tool. The core feature is to detect the unique key of a data frame. More often than not, you have to deal with a dataset without knowing what makes a row unique. This can lead to misinterpreting the data, Cartesian products at join time, and other funny stuff.

How do I imagine that?

A specific tool in the Data Investigation category.

Input: one data frame, the ability to select fields or check all, and the ability to specify a maximum number of fields per combination (empty or 0 = no max).
Algo: it tests the count distinct of every combination of fields against the count of rows.

Result: one row per field combination that works. If there is no result: "no field combination is unique; check for duplicates or the need for aggregation upstream".

Example:

 

order_id | line_id | amount | customer | site
1 | 1 | 100 | A | U_250
1 | 2 | 12  | A | U_250
1 | 3 | 45  | A | U_250
2 | 1 | 75  | A | U_250
2 | 2 | 12  | A | U_250
3 | 1 | 15  | B | U_250
4 | 1 | 45  | B | U_251

The user will select every field except amount (knowing that amount would make no sense in a key).

The algo will test the following keys:
-each individual field
-each combination of two fields
-each combination of three fields
-each combination of four fields

against the number of rows (7), and give something like this:

 

choice    | number of fields | field combination
very good | 2 | order_id, line_id
average   | 3 | order_id, line_id, customer
average   | 3 | order_id, line_id, site
bad       | 4 | order_id, line_id, site, customer
…
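A minimal sketch of this search, assuming a pandas data frame (candidate_keys is an illustrative name, not an existing tool):

```python
from itertools import combinations
import pandas as pd

def candidate_keys(df: pd.DataFrame, fields: list, max_fields: int = 0) -> list:
    n_rows = len(df)
    limit = max_fields or len(fields)        # empty or 0 = no max
    results = []
    for k in range(1, limit + 1):
        for combo in combinations(fields, k):
            # a combination "works" when its distinct count equals the row count
            if len(df[list(combo)].drop_duplicates()) == n_rows:
                results.append((k, combo))
    if not results:
        print("no field combination is unique; check for duplicates "
              "or the need for aggregation upstream")
    return results
```

On the table above, this returns (order_id, line_id) as the first two-field hit, matching the "very good" row.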

Best regards,

Simon

Hello,

Unless you're lucky, your input dataset can have fields with the wrong types. That can lead to several issues, such as:
-performance (a string is far slower to process than, say, a boolean)
-compliance with master data management
-functional understanding (e.g. if I have a field called "modified" typed as string, I don't know whether it contains the modification date, information about the modification, etc., while if it's typed as date, I already know it's a date)
-the ability to do type-specific operations (you can't multiply a string or extract a week from a string)


Right now, the existing tools focus on strings, but I think we can do better.


Here is a proposition:

Input: a data frame.
Configuration:
-selection of fields, or
-selection of field types
-ability to run on a sample (optional)


Algo:

Alteryx type   | Proposed type | Condition | Example
Byte           | bool  | only 2 values: 0 and 1 | to be done
Int16          | bool  | only 2 values: 0 and 1 | to be done
Int16          | Byte  | min >= 0, max <= 255 | to be done
Int32          | bool  | only 2 values: 0 and 1 | to be done
Int32          | Byte  | min >= 0, max <= 255 | to be done
Int32          | Int16 | min >= -32,768, max <= 32,767 | to be done
Int64          | bool  | only 2 values: 0 and 1 | to be done
Int64          | Byte  | min >= 0, max <= 255 | to be done
Int64          | Int16 | min >= -32,768, max <= 32,767 | to be done
Int64          | Int32 | min >= -2,147,483,648, max <= 2,147,483,647 | to be done
Fixed Decimal  | bool  | only 2 values: 0 and 1 | to be done
Fixed Decimal  | Byte  | no decimal part, min >= 0, max <= 255 | to be done
Fixed Decimal  | Int16 | no decimal part, min >= -32,768, max <= 32,767 | to be done
Fixed Decimal  | Int32 | no decimal part, min >= -2,147,483,648, max <= 2,147,483,647 | to be done
Fixed Decimal  | Int64 | no decimal part, min >= -9,223,372,036,854,775,808, max <= 9,223,372,036,854,775,807 | to be done
Float          | bool  | only 2 values: 0/1 or 0/-1 | to be done
Float          | Byte  | no decimal part, min >= 0, max <= 255 | to be done
Float          | Int16 | no decimal part, min >= -32,768, max <= 32,767 | to be done
Float          | Int32 | no decimal part, min >= -2,147,483,648, max <= 2,147,483,647 | to be done
Float          | Int64 | no decimal part, min >= -9,223,372,036,854,775,808, max <= 9,223,372,036,854,775,807 | to be done
Float          | Fixed Decimal | to be done | to be done
Double         | bool  | only 2 values: 0/1 or 0/-1 | to be done
Double         | Byte  | no decimal part, min >= 0, max <= 255 | to be done
Double         | Int16 | no decimal part, min >= -32,768, max <= 32,767 | to be done
Double         | Int32 | no decimal part, min >= -2,147,483,648, max <= 2,147,483,647 | to be done
Double         | Int64 | no decimal part, min >= -9,223,372,036,854,775,808, max <= 9,223,372,036,854,775,807 | to be done
Double         | Fixed Decimal | to be done | to be done
Double         | Float | when no need for double precision | to be done
DateTime       | Date  | no hours, minutes, or seconds | to be done
String         | bool  | only 2 values: 0/1, 0/-1, True/False, TRUE/FALSE, or the equivalent in other languages such as VRAI/FAUX, Vrai/Faux | to be done
String         | Byte  | no decimal part, min >= 0, max <= 255 | to be done
String         | Int16 | no decimal part, min >= -32,768, max <= 32,767 | to be done
String         | Int32 | no decimal part, min >= -2,147,483,648, max <= 2,147,483,647 | to be done
String         | Int64 | no decimal part, min >= -9,223,372,036,854,775,808, max <= 9,223,372,036,854,775,807 | to be done
String         | Fixed Decimal | to be done | to be done
String         | Float | when no need for double precision | to be done
String         | Double | when need for double precision | to be done
String         | Date  | test against several date formats | to be done
String         | Time  | test against several time formats | to be done
String         | DateTime | test against several datetime formats | to be done
WString        | bool  | only 2 values: 0/1, 0/-1, True/False, TRUE/FALSE, or the equivalent in other languages such as VRAI/FAUX, Vrai/Faux | to be done
WString        | Byte  | no decimal part, min >= 0, max <= 255 | to be done
WString        | Int16 | no decimal part, min >= -32,768, max <= 32,767 | to be done
WString        | Int32 | no decimal part, min >= -2,147,483,648, max <= 2,147,483,647 | to be done
WString        | Int64 | no decimal part, min >= -9,223,372,036,854,775,808, max <= 9,223,372,036,854,775,807 | to be done
WString        | Fixed Decimal | to be done | to be done
WString        | Float | when no need for double precision | to be done
WString        | Double | when need for double precision | to be done
WString        | String | Latin-1 characters only | to be done
WString        | Date  | test against several date formats | to be done
WString        | Time  | test against several time formats | to be done
WString        | DateTime | test against several datetime formats | to be done
V_String       | bool  | only 2 values: 0/1, 0/-1, True/False, TRUE/FALSE, or the equivalent in other languages such as VRAI/FAUX, Vrai/Faux | to be done
V_String       | Byte  | no decimal part, min >= 0, max <= 255 | to be done
V_String       | Int16 | no decimal part, min >= -32,768, max <= 32,767 | to be done
V_String       | Int32 | no decimal part, min >= -2,147,483,648, max <= 2,147,483,647 | to be done
V_String       | Int64 | no decimal part, min >= -9,223,372,036,854,775,808, max <= 9,223,372,036,854,775,807 | to be done
V_String       | Fixed Decimal | to be done | to be done
V_String       | Float | when no need for double precision | to be done
V_String       | Double | when need for double precision | to be done
V_String       | String | same length | to be done
V_String       | Date  | test against several date formats | to be done
V_String       | Time  | test against several time formats | to be done
V_String       | DateTime | test against several datetime formats | to be done
V_WString      | bool  | only 2 values: 0/1, 0/-1, True/False, TRUE/FALSE, or the equivalent in other languages such as VRAI/FAUX, Vrai/Faux | to be done
V_WString      | Byte  | no decimal part, min >= 0, max <= 255 | to be done
V_WString      | Int16 | no decimal part, min >= -32,768, max <= 32,767 | to be done
V_WString      | Int32 | no decimal part, min >= -2,147,483,648, max <= 2,147,483,647 | to be done
V_WString      | Int64 | no decimal part, min >= -9,223,372,036,854,775,808, max <= 9,223,372,036,854,775,807 | to be done
V_WString      | Fixed Decimal | to be done | to be done
V_WString      | Float | when no need for double precision | to be done
V_WString      | Double | when need for double precision | to be done
V_WString      | String | same length, Latin-1 characters only | to be done
V_WString      | WString | same length | to be done
V_WString      | V_String | Latin-1 characters only | to be done
V_WString      | Date  | test against several date formats | to be done
V_WString      | Time  | test against several time formats | to be done
V_WString      | DateTime | test against several datetime formats | to be done

 

The output would be something like this:

Field | Input type | Proposition | Conversion
toto  | float      | int         | formula (with example) / native tool / datetime conversion tool…
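A minimal sketch of the numeric part of this scan, pandas-based, with thresholds mirroring the table above (propose_type is an illustrative name, not an existing tool):

```python
import pandas as pd

def propose_type(s: pd.Series):
    """Suggest the narrowest type a numeric column could be stored as."""
    v = s.dropna()
    if v.isin([0, 1]).all() or v.isin([0, -1]).all():
        return "bool"
    if pd.api.types.is_float_dtype(s) and (v % 1 != 0).any():
        return None                    # genuine decimal part: leave as-is
    lo, hi = v.min(), v.max()
    if 0 <= lo and hi <= 255:
        return "Byte"
    if -32768 <= lo and hi <= 32767:
        return "Int16"
    if -2147483648 <= lo and hi <= 2147483647:
        return "Int32"
    return "Int64"
```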



Best regards,

Simon



Hi all!

 

Based on the title, here's some background information: SHAPLEY Values

 

Currently, one way of doing so is to use the Python tool to write the script and install the package. However, this requires running Alteryx as an administrator in order to successfully load, test, and run the script. The problem is that a substantial number of companies do not grant their Alteryx teams full administrator privileges, as it would then always require admin credentials just to reopen Alteryx after closing it.
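For illustration, here is roughly what that Python-tool script looks like today; it assumes the shap and scikit-learn packages installed, which is exactly where the admin-rights hurdle appears:

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Fit any tree-based model (the dataset here is just a stand-in).
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Shapley values and the heat-map-like summary plot.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```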

 

I am aware that there is a macro covering SHAP, but I recently tested it and it did not work as intended; it also only supports non-categorical determinants, thereby requiring a conversion of categorical variables into numeric or binary categories.

 

It would be nice to have a built-in Alteryx ML tool that does this analysis and produces a graph akin to a heat map that showcases the values, like below:

[image: caltang_0-1680442322684.png]

 

By doing so, it adds more value to the ML suite and actually helps convince companies to get it.

 

Otherwise, teams will just use Python and be done with it, leaving Alteryx as merely the clean-up ETL tool. That leaves much to be desired, and can leave some teams hanging.

 

I hope for some consideration on this - thank you.

 

Please add a data validator workflow.

 

Suggested features will be the following:

1. Add a validation name and set the field(s) of your data that you want to validate (one workflow can have more than one validation name).

2. For the selected validation (name), add features that will check/validate the information below (see the sketch after this list):

   A. Verify the data type
   B. Contains Null
   C. Max and min string length
   D. Allowed values only, else it will give you an error
   E. Regex expected to match and not allowed to match.

3. It can have two (2) outputs: True (rows that match) and False (rows that fail over/error).
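To make the intent concrete, a minimal sketch of one rule set in pandas terms (the rule keys and the validate name are illustrative, not an existing API):

```python
import pandas as pd

def validate(df: pd.DataFrame, field: str, rule: dict):
    s = df[field]
    ok = pd.Series(True, index=df.index)
    if "dtype" in rule:                          # A. verify data type
        ok &= s.map(lambda v: isinstance(v, rule["dtype"]))
    if not rule.get("allow_null", True):         # B. contains Null
        ok &= s.notna()
    if "min_len" in rule or "max_len" in rule:   # C. string length bounds
        n = s.astype(str).str.len()
        ok &= n.between(rule.get("min_len", 0), rule.get("max_len", 10**9))
    if "allowed" in rule:                        # D. allowed values only
        ok &= s.isin(rule["allowed"])
    if "regex" in rule:                          # E. regex expected to match
        ok &= s.astype(str).str.match(rule["regex"])
    return df[ok], df[~ok]                       # True output, False output
```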

Many software & hardware companies take a very quantitative approach to driving their product innovation: they agree on a standard baseline of how the product is used today, compare it with the way the new version solves the same problem, and measure the improvement over time.

 

For example:

- Database vendors have been doing this for years using TPC benchmarks (http://www.tpc.org/), where a FIXED set of tasks is agreed as a benchmark and the database vendors then iterate year over year to improve performance against these benchmarks

- Graphics card companies or GPU companies have used benchmarks for years (e.g. TimeSpy; Cinebench etc).

 

How could this translate for Alteryx?

- Every year at Inspire - we hear the stats that say that 90-95% of the time taken is data preparation

- We also know that the reason for buying Alteryx is to reduce the time & skill level required to achieve these outcomes - again, as reinforced by the message that we're driving towards self-service analytics & citizen data analytics.

 

The dream:

Wouldn't it be great if Alteryx could say: "In the 2019.3 release - we have taken 10% off the benchmark of common tasks as measured by time taken to complete" - and show a 25% reduction year over year in the time to complete this battery of data preparation tasks?

 

One proposed method:

  • Take an agreed benchmark set of tasks / data / problems / outcomes, based on a standard data set - these should include all of the common data preparation problems that people face like date normalization; joining; filtering; table sync (incremental sync as well as dump-and-load); etc.
  • Measure the time it takes users to complete these data-prep/ data movement/ data cleanup tasks on the benchmark data & problem set using the latest innovations and tools
  • This time then becomes the measure - if it takes an average user 20 mins to complete these data prep tasks today; and in the 2019.3 release it takes 18 mins, then we've taken 10% off the cost of the largest piece of the data analytics pipeline.

 

What would this give Alteryx?

This could be very simple to administer; and if done well it could give Alteryx:

- A clear and unambiguous marketing message that they are super-focussed on solving for the 90-95% of your time that is NOT being spent on analytics, but rather on data prep

- It would also provide focus to drive the platform in the direction of the biggest pain points - all the teams across the platform can then rally around a really deep focus on the user and accelerating their "time from raw data to analytics".   

- A competitive differentiation - invite your competitors to take part too just like TPC.org or any of the other benchmarks

 

What this is / is NOT:

  • This is not a run-time measure - i.e. this is not measuring transactions or rows per second
  • This should be focussed on "Given this problem; and raw data - what is the time it takes you, and the number of clicks and mouse moves etc - to get to the point where you can take raw data, and get it prepped and clean enough to do the analysis".
  • This should NOT be a test of "Once you've got clean data - how quickly can you do machine learning; or decision trees; or predictive analytics" - as we have said above, that is not the big problem - the big problem is the 90-95% of the time which is spent on data prep / transport / and cleanup.

 

There are loads of ways this could be administered - the starting point is to agree to drive this quantitatively on a fixed benchmark of tasks and data.

 

@LDuane ; @SteveA ; @jpoz ; @AshleyK ; @AJacobson ; @DerekK ; @Cimmel ; @TuvyL ; @KatieH ;  @TomSt ; @AdamR_AYX ; @apolly 


Similar to the Select tool's Unknown Field checkbox, I figured it would be useful for the Data Cleansing tool to have this functionality as well. It would avoid the scenario where, after a Cross Tab, you have a new numeric field with a Null value, so you can't total up multiple fields because the Null prevents the addition from happening. If the Unknown Field box could be checked in the Data Cleansing tool, this problem would be avoided.

I would like to see more file types supported for dragging from a folder onto a workflow, more precisely .txt and .dat files. This would greatly help my team and me analyze the new and unknown data files that we receive on a daily basis.

 

 

Thank you. 

 

The sum function is probably the one I use most in the summarize tool. It is a silly thing, but it would be nice for "Sum" to be in the single-click list, rather than in the "Numeric" category...

 

[image: Move sum function]

This idea arose recently when working specifically with the Association Analysis tool, but I have a feeling that other predictive tools could benefit as well.  I was trying to run an association analysis for a large number of variables, but when I was investigating the output using the new interactive tools, I was presented with something similar to this:

 

[image: CorrelationPlot.PNG]

 

While the correlation plot highlights your high-to-high associations, the user is unable to read the field names, and the tooltip only provides the correlation value rather than the fields it relates to. As such, I shifted my attention to the report output, which looked like this:

 

[image: CorrelationTable.PNG]

 

While I could now read everything, it made pulling out the insights much more difficult. Wanting the best of both worlds, I decided to extract the correlation table from the R output and drop it into Tableau for a filterable, interactive version of the correlation matrix. This turned out to be much easier said than done. Because the R output comes in report form, I tried to use the report extract macros mentioned in this thread to pull out the actual values. This was an issue due to the report formatting, so instead I cracked open the macro to extract the data directly from the R output. To make a long story short, this ended up being problematic due to report formats, batch macro pathing, and an unidentifiable bug.

 

In the end, it would be great if there were a "Data" output for reports from certain predictive tools that would benefit from further analysis. While the reports and interactive outputs are great for ingesting small model outputs, at times there is a need to extract the data itself for further analysis/visualization. This is one example, as are the model coefficients from the regression analyses that I have used in the past. I know Dr. Dan created a model coefficients macro for the case of regression, but I have to imagine that there are other cases where the data is desired along with the report/interactive output.

 

There is a need, when visualizing in-Database workflows, to be able to visualize sorted data. This sorting could be done one of two ways: in a Browse tool, or as a stand-alone Sort tool. Either would address the need. Without such a tool, the only way to sort the data is to Data Stream Out and then visualize the data in Alteryx. However, this process violates the premise of the in-DB toolkit's usefulness, which is to keep your data in-DB and process it using the DB engine. Streaming out big data just to add a sort is not efficient.

 

Granted, the in-DB processing doesn't care whether data is sorted or not. However, when attempting to find extreme values after an aggregation, or when trying to identify something as simple as whether null values are present in a field, then a sort becomes extremely useful, and a necessary tool for human consumption of data (regardless of the database's processing needs).

 

Thanks very much for hearing my idea!

Dear GUI Gurus,

 

A minor but time-saving GUI enhancement would be appreciated. When adding a tool to the canvas, the current behavior is to show the tool anchor that was last used on prior tools. That being said, I might be adding a "vanilla" configuration tool to the canvas and find myself staring at a BLANK results window. When users are adding tools to the canvas, I suggest that the best practice is to VIEW the incoming data before configuring the tool.

 

I ALWAYS set the results to view the INCOMING DATA ANCHOR.

 

This minor change would be welcome to me.

 

Cheers,

 

Mark

Right now, if a tool generates an error, there is nothing productive you can do with the error rows: they are just sent to the error log, and depending on your settings the entire canvas will fail.

 

Could we change this in the Designer to work more like SSIS, where almost every tool has an error output, so that you can send the good rows one way and the error rows the other, and then continue processing? The error rows can be sent to an error table, workflow, or data-quality service, and the good rows can be sent onwards. Because you have access to the error rows, you can also compute run stats of successful vs. unsuccessful rows.
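As a sketch of the routing pattern (expressed in pandas terms, since Designer has no such anchors today): convert what converts, and divert the rest instead of failing the run.

```python
import pandas as pd

df = pd.DataFrame({"amount": ["10.5", "oops", "3"]})
converted = pd.to_numeric(df["amount"], errors="coerce")  # bad rows become NaN

good = df[converted.notna()].assign(amount=converted.dropna())
bad = df[converted.isna()]   # route to an error table / data-quality service
print(f"successful rows: {len(good)}, unsuccessful rows: {len(bad)}")
```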

 

This would make a big difference in the velocity of developing a canvas or prepping data.

 

I can take some screenshots if that helps.

The Browse tool is really powerful: we can see all the information regarding a dataset very rapidly.

Unfortunately, we can only export that information (graphs, tables) manually, as PNG files...

 

One major use of Alteryx in big companies is to perform data quality reviews.

 

If we could export Browse tool information (graphs, tables) automatically to a PDF file or another format, we could save a lot of time on data quality tasks.

 

The only current solution is to use a dataviz tool or set up a specific render in Alteryx (very time-consuming).

 

The main benefit would be the ability to share data quality insights with other business units.

 

Best Regards

Unsupervised learning method to detect topics in a text document.

 

Helpful for users interested in text mining.
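One established approach such a tool could wrap, sketched with scikit-learn's LDA implementation (the documents below are made-up illustrations):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["data prep and cleansing workflows",
        "cleansing data in alteryx workflows",
        "topic models for text mining",
        "mining topics from text documents"]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)   # one topic distribution per document
```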

We don't have a separate ANOVA tool in Alteryx; can you think of any reason why?

 

It's not the raw or blended row data that matters, but the insights gathered from it:

 

The Linear Regression tool has a report for Type II ANOVA based on the model we provide.

But neither Type II nor the other types are available as standalone statistics tools...

 

[image: Untitled.png]

 

Here is a list of the different types of ANOVA that may be useful:

 

ANOVA model | Definition
t-tests | Comparison of means between two groups; if independent groups, then independent-samples t-test. If not independent, then paired-samples t-test. If comparing one group against a fixed value, then a one-sample t-test.
One-way ANOVA | Comparison of means of three or more independent groups.
One-way repeated measures ANOVA | Comparison of means of three or more within-subject variables.
Factorial ANOVA | Comparison of cell means for two or more between-subject IVs.
Mixed ANOVA (SPANOVA) | Comparison of cell means for one or more between-subjects IVs and one or more within-subjects IVs.
ANCOVA | Any ANOVA model with a covariate.
MANOVA | Any ANOVA model with multiple DVs. Provides omnibus F and separate Fs.
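Until a dedicated tool exists, a one-way ANOVA is at least reachable through the Python tool via SciPy (the groups below are made-up illustrations):

```python
from scipy.stats import f_oneway

group_a = [23, 25, 21, 22]
group_b = [30, 31, 29, 32]
group_c = [26, 24, 27, 25]

# Comparison of means of three independent groups.
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```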

 

Looking forward to the addition of ANOVA tools to the Data Investigation tool box...

We have the Browse tool, which connects to one output at a time. To be more efficient, I would like a Browse tool that can connect to two or three outputs of one tool: for example, to the True and False outputs of a Filter tool, or to the L, J, and R outputs of a Join tool.

 

Python pandas data frames and data types (numpy arrays, lists, dictionaries, etc.) are much more robust in general than their counterparts in R, and they play together much more easily as well. Moreover, there are only a handful of packages that do everything a data scientist would need, including graphing, such as scikit-learn, Pandas, NumPy, and Seaborn. After utilizing R, Python, and Alteryx, I'm still a big proponent of integrating with the Python language much as Alteryx has integrated with R. At the very least, I propose adding the ability to run custom code, such as through a Python tool.

Hi all,

 

One of the most common data-investigation tasks we have to do is comparing two datasets. This may be making sure the columns are the same, that field names match, or even looking at row data. I think this would be a tremendous addition to the core toolset. I've made a fairly good start on it (see the sketch below), and am more than happy if you want to take this and extend or add to it (I give this freely, with no claim on the work).
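The core checks are simple to express; here is a rough pandas sketch of what the tool would report (compare is an illustrative name):

```python
import pandas as pd

def compare(a: pd.DataFrame, b: pd.DataFrame) -> dict:
    report = {
        "columns_only_in_a": sorted(set(a.columns) - set(b.columns)),
        "columns_only_in_b": sorted(set(b.columns) - set(a.columns)),
    }
    shared = [c for c in a.columns if c in b.columns]
    report["dtype_mismatches"] = {c: (str(a[c].dtype), str(b[c].dtype))
                                  for c in shared if a[c].dtype != b[c].dtype}
    if shared:
        # Row-level diff on the shared columns.
        merged = a[shared].merge(b[shared], how="outer", indicator=True)
        report["rows_only_in_a"] = int((merged["_merge"] == "left_only").sum())
        report["rows_only_in_b"] = int((merged["_merge"] == "right_only").sum())
    return report
```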

 

Very very happy to work with the team to build this out if it's useful

 

Cheers

Sean

One of the tools that I use the most is the SELECT tool, because I normally get large datasets with fields that I won't be using for a specific analysis or with fields that need renaming. In the same way, sometimes Alteryx will mark a field as a different type than the one I need (e.g. a date field as a string). That's when the SELECT tool comes in handy.

 

However, when dealing with multiple sources, having many SELECT tools on your canvas can make the workflow look a little "crowded", not to mention adding extra tools that will need explanation later when presenting/sharing your canvas with others. That is why my suggestion is to give the CONNECTION tool "more power" by offering some of the functionality found in the SELECT tool.

 
[image: Select Tool 2.png]


For instance, if one of the most used features of the SELECT tool is to choose the fields that will move through the workflow, then maybe we can make that feature available in the CONNECTION tool. Similarly, if one of the most used features (by Alteryx users) is to rename fields or change the field type, then maybe we can make that available in the CONNECTION tool as well.

 

[image: Select Tool.png]

 

In the end, developers benefit from a faster workflow development process, and end users benefit from being presented with cleaner workflows, which always helps to get the message across.

 

 

What do you guys think? Any of you feel the same? Leave your comments below.
