
Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!

Featured Ideas

On 2019.2.5.62427, the interactive results grid is only available in the embedded Results window; it is not available when you open the results in a separate window via 'Open Results in New Window' -> 'New Window'.

 

This is verified by @PaulN - https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Interactive-Results-Grid-not-available...

 

It also appears that the interactive grid is not available when you double-click a .yxdb file to open it and view its contents.

 

It would be useful to have the interactive grid in both of these areas, not just the embedded Results window.

Having access only to MD5 hashes, via the MD5_ASCII(String) and MD5_UNICODE(String) string functions, is limiting.  Is there a way to access other hashing algorithms, ideally the cryptographic algorithms from OpenSSL or the .NET Framework?

 

  - https://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm(v=vs.110).aspx
  - https://wiki.openssl.org/index.php/Command_Line_Utilities#Signing_.2F_Digest 

 

Hashing functions are a very useful tool to have. There are many different types of hashes, and each has tradeoffs for different uses, ranging from error checking, privacy shielding, password protection, forensic analysis, and message authentication (HMAC) to much more. See: http://stackoverflow.com/questions/800685/which-cryptographic-hash-function-should-i-choose

 

- For workflows with data containing existing hashes, being able to consistently create hashes from non-hashed data for comparison is useful.
- Hashes are also useful because they are the same outside the Alteryx environment. They can be used to confirm correct operation of a production system or a third party's external process.

 

Access to only MD5 hashes via MD5_ASCII(String) and MD5_UNICODE(String) in the Formula tool's string functions is a start, but quite limiting.

 

Further, the ability to use non-cryptographic hashes and checksums, such as MurmurHash or CRC, would be useful: https://en.wikipedia.org/wiki/List_of_hash_functions

Having the implementation benefit from hardware acceleration (AES-NI / CUDA) would be a great plus for high volume applications. 
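As a stopgap, richer digests can already be computed inside a Python tool using Python's standard library. The sketch below is only an illustration of that workaround, not a native formula function; the ayx read/write helpers and the incoming field name "Value" are assumptions.

```python
# Hedged sketch of a Python-tool workaround for extra hash algorithms.
# Assumes the ayx helper is available and an incoming string field "Value".
import hashlib
import zlib

from ayx import Alteryx

df = Alteryx.read("#1")  # read incoming connection #1 into a pandas DataFrame

text = df["Value"].astype(str).str.encode("utf-8")
df["SHA256"] = text.map(lambda b: hashlib.sha256(b).hexdigest())
df["CRC32"] = text.map(lambda b: format(zlib.crc32(b), "08x"))

Alteryx.write(df, 1)  # push the result to output anchor 1
```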

 

For reference, these are some hash algorithms that could be useful in workflows:

- SHA-1
- SHA-256
- Whirlpool
- xxHash
- MurmurHash
- SpookyHash
- CityHash
- Checksums: CRC-16, CRC-32, CRC-32 MPEG-2, CRC-64
- BLAKE-256
- BLAKE-512
- BLAKE2s
- BLAKE2b
- ECOH
- FSB
- GOST
- Grøstl
- HAS-160
- HAVAL
- JH
- MD2
- MD4
- MD6
- RadioGatún
- RIPEMD
- RIPEMD-128
- RIPEMD-160
- RIPEMD-320
- SHA-224
- SHA-384
- SHA-512
- SHA-3 (originally known as Keccak)
- Skein
- Snefru
- Spectral Hash
- Streebog
- SWIFFT
- Tiger

It would be nice to have the option of disabling the append of the "action" to the variable name in the Summarize tool.  Sometimes it's useful to leave the variable name as is when making tweaks to your module.

My users need the same functionality in-database as the in-memory "Field Summary" tool provides.

 

The purpose is to explore the data: distribution, minimum, maximum, count, valid, unique, and so on.

 

 

Often in larger workflows, I will copy data partway down the stream into a new workflow in order to troubleshoot a small section, so I can avoid re-running the whole workflow over and over, which can take a while. I'm aware (and thankful) of caching, but sometimes, if there are many parallel streams, I'd rather just copy the data from the data preview built into the tool so I don't have to take the time to run the workflow again. I'm also aware I could output a .yxdb file and use that, but again that takes longer than I would like.

 

The issue I run into is that if I copy the data and paste it into a Text Input tool, all the field types change to their defaults. This is fine for new data, but for data whose field types matter throughout the workflow, it can be a hassle. If copying data could also copy the field type and size, that would be great.

It would be super cool to run a regular workflow in "test mode", or some other way of running it just one tool at a time, so you can check tool outputs along the way and fix issues as they occur, especially for big workflows. Another advantage would be that if, for whatever reason, a working module stops working (maybe someone changed an underlying file - that NEVER happens to me lol), rather than running the whole thing, fixing something, and running the whole thing again, you could just fix what's broken and run the workflow up to that point before continuing.

 

Actually, that gives me an even better idea... a stop/start tool. Drop it in the workflow and the module will run up to that point and stop or start from that point. Hmm... time to submit a second idea!

In addition to the existing functionality, it would be good if the functionality below could also be provided.

 

1) Pattern Analysis

 

This would help profile the data better, confirm that data conforms to a standard or particular pattern, identify outliers, and take the necessary corrective action.

 

A sample would be: for emails, translating 'abc@gmail.com' to 'nnn@nnnn.nnn', so the outliers might be values where '@' or '.' are not present.
Another example might be phone numbers: 12345-678910 getting translated to 99999-999999, 123-456-78910 getting translated to 999-999-99999, (123)-(456):78910 getting translated to (999)-(999):99999, etc.

 

It would also help to have the Pattern Frequency Distribution alongside.

So from the above example we can see that phone numbers exist in 3 different patterns, which might call for relevant standardization rules.
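A sketch of such a pattern profile, using plain Python/pandas as a stand-in (the phone values are the examples above; the field name is a placeholder):

```python
# Hedged sketch: mask letters as "n" and digits as "9", then count the
# resulting patterns to get a pattern frequency distribution.
import re

import pandas as pd

phones = pd.Series(["12345-678910", "123-456-78910", "(123)-(456):78910"], name="Phone")

def to_pattern(value: str) -> str:
    masked = re.sub(r"[A-Za-z]", "n", str(value))   # letters -> n
    return re.sub(r"\d", "9", masked)               # digits  -> 9

pattern_counts = phones.map(to_pattern).value_counts()
print(pattern_counts)   # e.g. 99999-999999, 999-999-99999, (999)-(999):99999
```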


2) More granular control of profiling

 

It would be good if, in the tool, the profiling options (like Unique, Histogram, Percentile25, etc.) could be selected differently for different fields.

 

A sub-idea here might be to check data against external third-party data providers (e.g., USPS ZIP validation), but that would be meaningful only for selected address fields, so granular control over the type of profiling applied to individual fields would make sense.

 

Note: when implementing granular control, we would also need to figure out how to present the final report in a more user-friendly format, as it might not conform to a standard table-like definition.

 

3) Uniqueness

 

Given the ongoing importance of identifying duplicates so that analytic results remain valid, more uniqueness profiling could be added.

 

For example: Soundex, which is based on how similar or different two values sound, and edit distance, which is based on how many changes are needed to turn one value into another.

 

So alongside plain Unique counts, we could also have counts where uniqueness factors in Soundex, distance, and other related algorithms.

 

For example, if the First Name field has the following data:

 

Jerry
Jery
Nick
Greg
Gregg

 

The number of unique records would be 5, whereas the number of Soundex-unique values might be only 3, opening up more data-exploration opportunities to see whether Jerry/Jery and Greg/Gregg are really the same person/customer.
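A sketch of the suggested metric, using the third-party jellyfish package as an assumed Soundex implementation (any phonetic/distance library would do):

```python
# Hedged sketch: "fuzzy" unique counts next to exact unique counts.
import jellyfish
import pandas as pd

names = pd.Series(["Jerry", "Jery", "Nick", "Greg", "Gregg"], name="First Name")

exact_unique = names.nunique()                           # 5 distinct spellings
soundex_unique = names.map(jellyfish.soundex).nunique()  # 3: Jerry/Jery and Greg/Gregg collapse

print(f"exact unique: {exact_unique}, soundex unique: {soundex_unique}")
```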

 

4) Custom Rule Conformance

 

I think it would also be good if some functionality similar to the Multi-Row Formula tool could be provided, so we can check conformance to custom business rules.

 

For example, it might be more helpful to check how many Age Units values (Days/Months/Years) are blank/null while the related Age Number (1, 10, 50, etc.) is populated, rather than having a vanilla count of null and not-null for individual (but related) columns.
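A sketch of such a cross-column conformance check; the Age fields below are the hypothetical example from above, and pandas is only a stand-in:

```python
# Hedged sketch: count rows violating a cross-column business rule,
# instead of per-column null counts. Field names are placeholders.
import pandas as pd

df = pd.DataFrame({
    "Age Number": [1, 10, 50, None, 7],
    "Age Units": ["Years", None, "Months", "Days", None],
})

# Rule: if Age Number is populated, Age Units must also be populated.
violations = df["Age Number"].notna() & df["Age Units"].isna()
print(f"{violations.sum()} of {len(df)} rows violate the rule")
```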

 

Thanks,

Rohit

In the Histogram tool, I would like the ability to specify the bins: not just the number of bins, but the bin values themselves. That would be especially helpful when comparing different data sets, when I want an apples-to-apples comparison across two different histograms.
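For reference, this is the behaviour being asked for, sketched with NumPy as a stand-in for the Histogram tool: one explicit set of bin edges shared by two data sets.

```python
# Hedged sketch: share one user-specified set of bin edges across two
# data sets so the resulting histograms are directly comparable.
import numpy as np

bins = np.arange(0, 110, 10)   # explicit bin edges: 0, 10, ..., 100

sample_a = np.random.default_rng(0).normal(50, 15, 1000)
sample_b = np.random.default_rng(1).normal(60, 20, 1000)

counts_a, _ = np.histogram(sample_a, bins=bins)
counts_b, _ = np.histogram(sample_b, bins=bins)
print(counts_a)
print(counts_b)
```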

Think of a pivot table on steroids. In my industry, "strats" are commonly used to summarize pools of investment assets. You may have several commonly used columns that are a mix of sums and weighted averages, capable of having filtering applied to each column. So you may see an output like this:

 

| Loan Status | Total Balance | % of Balance | % of Balance (in Southwest Region) | Loan to Value Ratio (WA) | Curr Rate (WA) | FICO (WA) | Mths Delinquent (WA) |
| Current     | $9,000,000    | 90           | 80                                 | 85                       | 4.5           | 720       | 0                    |
| Delinquent  | $1,000,000    | 10           | 100                                | 95                       | 5.5           | 620       | 4                    |
| Total       | $10,000,000   | 100          | 90                                 | 86                       | 4.6           | 710       | 0.4                  |

 

Right now, creating the several sums and weighted averages feels too inefficient: I have to create all the different modules, link them all together, and run them through a Transpose and/or Cross Tab. And to create a summary report where I may have 15 different categories besides Loan Status, I'd have to replicate that process with those modules 15 times.

 

Currently, I have a different piece of software where I can simply write out sum and WA calcs for each column, save that column list (with accompanying calcs), and then simply plug in a new leftmost category for each piece of data I'm looking at. And the Total row is auto-calculated as well.
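For illustration, the shape of such a strat (sums plus balance-weighted averages per category, with a Total row) can be sketched in a few lines. The pandas stand-in and column names below are assumptions, not an existing Alteryx feature:

```python
# Hedged sketch of the "strat" described above: sums and balance-weighted
# averages per category, plus an overall Total row.
import pandas as pd

loans = pd.DataFrame({
    "Loan Status": ["Current", "Current", "Delinquent"],
    "Balance": [4_000_000, 5_000_000, 1_000_000],
    "Curr Rate": [4.4, 4.6, 5.5],
    "FICO": [730, 712, 620],
})

def strat(group: pd.DataFrame) -> pd.Series:
    w = group["Balance"]
    return pd.Series({
        "Total Balance": w.sum(),
        "Curr Rate (WA)": (group["Curr Rate"] * w).sum() / w.sum(),
        "FICO (WA)": (group["FICO"] * w).sum() / w.sum(),
    })

result = loans.groupby("Loan Status").apply(strat)
result.loc["Total"] = strat(loans)   # append the overall Total row
result["% of Balance"] = 100 * result["Total Balance"] / loans["Balance"].sum()
print(result)
```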

Hello,

It would be great if there were an option to compute the median of a numeric data column in the Cross Tab tool. We trust the median a lot more than the average in many different computations.
I would stretch my suggestion far enough to propose adding quantile computations as well...
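As a point of reference for what the computation would look like, here is a sketch in pandas (a stand-in, not the Cross Tab tool itself), with made-up column names:

```python
# Hedged sketch: a cross tab aggregated by median and by a quantile,
# instead of the usual sum/average.
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South"],
    "Product": ["A", "B", "A", "A", "B"],
    "Sales": [10, 40, 15, 25, 30],
})

medians = pd.pivot_table(df, index="Region", columns="Product",
                         values="Sales", aggfunc="median")
q75 = pd.pivot_table(df, index="Region", columns="Product",
                     values="Sales", aggfunc=lambda s: s.quantile(0.75))
print(medians)
print(q75)
```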

Thanks!
Hi All,

Not sure if there is already a tool like this; if not, it would help to have a test data generator.

It would combine data type with the nature of the data. For example, Person Name, whose meaning is self-explanatory and which is a string; similarly, a phone number, which is numeric but different in nature from a sales amount.
This can help save time during the development and QA phases, when real live data might not be available and the team would need to mock up such data to test what has been developed.

It could be driven either by public databases, such as government-provided portals/APIs, or by internal Alteryx-maintained dictionary data.

An advanced step might be data generated by purpose: even within Person Name, the data would be different when testing a plain use case versus a master data management use case.
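For what it's worth, a minimal sketch of such a generator, using the third-party Faker package as an assumed stand-in for an Alteryx-maintained dictionary:

```python
# Hedged sketch of a mock-data generator: named columns whose contents
# match the nature of the field, not just its data type.
import random

import pandas as pd
from faker import Faker

fake = Faker()
rows = [{
    "Person Name": fake.name(),
    "Phone Number": fake.phone_number(),
    "Sales Amount": round(random.uniform(10, 10_000), 2),
} for _ in range(100)]

test_data = pd.DataFrame(rows)
print(test_data.head())
```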

Hi there,

We have a relatively large table that we are trying to analyse using the data investigation tools; however, the Field Summary tool's interactive output seems to fail on this data set, producing no output at all.    There is no error message, just a blank interactive output (the other two outputs are populated normally).

 

The table is 104 columns wide, 1.16M rows long, and 865 MB in size excluding indices.

 

We put a random row select on this, and if we pass any more than 13,100 rows into the Field Summary tool (with all 107 columns), the interactive output is blank.    If we scale this back to 13,000 rows or fewer, the Field Summary interactive view works as expected (providing a frequency histogram for each field).

 

Is this a known issue? There was no warning to indicate an overrun or anything similar.

Thank you

Sean

 

 

Ref: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Is-there-a-search-in-quot-Choose-Table...

 

With large tables it is tedious to search for a field. It would be a great efficiency gain to allow a user to search for a column in a table by entering a name or partial column name. 

It was discovered that the 'Select' tool does not throw warning messages when data truncation is happening, whereas the relevant warning is reported by the 'Formula' tool. I think it would be good to have consistent logging of warnings/errors across all tools (at least across those used in the same scenarios; for example, when using Alteryx as an ETL tool, 'Select' and 'Formula' usage is commonplace).

 

Without this in place, it becomes difficult to rely on Alteryx to highlight truncation-related errors/warnings consistently in a workflow that moves data from source to target. It can also lead to the additional overhead of building logic to capture such data issues, logic which again differs tool by tool: when data passes through the 'Formula' tool there is no need for custom truncation logging, but when the same data passes through a 'Select' tool it has to be captured in a custom way.
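To make the expectation concrete, this is roughly the check a consistent truncation warning implies; the target-length metadata and field names below are placeholders, and pandas is only a stand-in:

```python
# Hedged sketch: flag values that will not fit the declared target length
# before casting, i.e. the condition a truncation warning should report.
import pandas as pd

target_lengths = {"City": 10, "State": 2}   # placeholder target metadata
df = pd.DataFrame({"City": ["San Francisco", "Austin"], "State": ["CA", "TX"]})

for field, width in target_lengths.items():
    too_long = df[field].astype(str).str.len() > width
    if too_long.any():
        print(f"Warning: {too_long.sum()} value(s) in '{field}' exceed "
              f"{width} chars and would be truncated")
```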

When working with large amounts of data, the Browse tool's profiling causes the program to stop responding.

 

A feature to disable profiling per Browse tool would help,

or, even better,

after a set threshold (e.g., a number of rows), auto-profiling is disabled and requires an explicit action to run.

It would be great to have a way to find a column by its name.

This feature would be particularly useful when exploring data sets with a large number of fields.

This feature would be useful within several tools, in particular the "Browse" tool and the "Input" tool.

Ideally the user should be able to  "Ctrl-F" and jump to the column matching the name being typed.

Alternatively, the user could be able to sort the columns alphabetically.
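The lookup behind such a Ctrl-F is simple; a sketch with hypothetical column names (pandas as a stand-in for the tool's metadata):

```python
# Hedged sketch: case-insensitive substring match over column names,
# i.e. the behaviour a "find column" box would implement.
import pandas as pd

df = pd.DataFrame(columns=["CustomerID", "CustomerName", "OrderDate", "ShipName"])

query = "name"
matches = [c for c in df.columns if query.lower() in c.lower()]
print(matches)   # ['CustomerName', 'ShipName']
```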

 

Thank you for considering this feature suggestion.

 

Davide Gerbaudo

 

P.S. I understand from this discussion that such a feature is not currently available.

It would be extremely helpful to add a "Count Distinct" function to the Summarize tool.  This would help tremendously when working with hierarchical data.
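For reference, the requested semantics, sketched in pandas with hypothetical fields (distinct products per customer):

```python
# Hedged sketch of "Count Distinct" on hierarchical data:
# the number of distinct child values per parent group.
import pandas as pd

orders = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B"],
    "Product": ["x", "x", "y", "y", "y"],
})

distinct_products = orders.groupby("Customer")["Product"].nunique()
print(distinct_products)   # A -> 2, B -> 1
```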

Hi All,

I think this suggestion would be ideal for the Join tool and its related cousins (Join Multiple, etc.) and would improve the data-blending experience for all users.

I am going to rely on Qlik Sense for this explanation as this functionality is native to that product.

 

When we bring in two data sources and use the Join tool to blend them, we are required to select the field or fields on which we want to base our join.

In Qlik Sense we can see our two data sources:

 

[Image: 2 Data Source bubbles.PNG]

 

We can then drag them together and it will form suggestions based on data association density:

 

[Image: Join.PNG]

[Image: Join suggestions.PNG]

 

This helps with identifying how tables should be joined, and at the very least shows commonalities between data streams, based on the data within the tables and not any naming conventions.

It would be nice to have the functionality to generate suggestions based on association density between two data streams, and then to apply the join from a selection.
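One possible scoring approach, sketched outside Alteryx with made-up fields: rank candidate field pairs by the overlap (Jaccard similarity) of their distinct values across the two streams.

```python
# Hedged sketch: score every field pair across two streams by the Jaccard
# overlap of their distinct values, then surface the top suggestions.
import pandas as pd

left = pd.DataFrame({"cust_id": [1, 2, 3], "region": ["N", "S", "S"]})
right = pd.DataFrame({"customer": [2, 3, 4], "area": ["S", "S", "W"]})

suggestions = []
for lcol in left.columns:
    for rcol in right.columns:
        lvals, rvals = set(left[lcol].dropna()), set(right[rcol].dropna())
        union = lvals | rvals
        score = len(lvals & rvals) / len(union) if union else 0.0
        suggestions.append((lcol, rcol, score))

for lcol, rcol, score in sorted(suggestions, key=lambda t: -t[2])[:3]:
    print(f"{lcol} <-> {rcol}: {score:.2f}")
```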

 

 Thoughts?

Seeing as we use Browse tools to help build out modules, but they slow the modules down because they write out temp files, it would be awesome to have a "record count" option similar to the Input tool's. This would allow us to see the data as it flows through the module without slowing it down.  Adding a Sample tool before every Browse would be fairly cumbersome.

Issue: how can I return 100,000 rows (results) from Google Analytics?

As per the GA Tool video overview, in the advanced options of the API call I can set maxResults to an integer to throttle the API.  However, per the Google Analytics Core Reporting API reference, the maximum number of results returned by one request is 10,000, no matter what it is set to.  Beyond that, paging is required by altering the start-index parameter.  How can this be achieved?  Is it possible already?

Google Analytics Core Reporting API references:

max Results

start Index

 

Is there another tool in Alteryx (perhaps a custom tool where I can implement my own Google Analytics API code) to pull as many records as desired?
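For reference, paging the v3 Core Reporting API by start-index looks roughly like the sketch below; the view id, access token, and query are placeholders, and this is not the GA tool's internal behaviour:

```python
# Hedged sketch: page through the Core Reporting API (v3) by incrementing
# start-index, since each request returns at most 10,000 rows.
import requests

URL = "https://www.googleapis.com/analytics/v3/data/ga"
PAGE_SIZE = 10_000

params = {
    "ids": "ga:12345678",            # placeholder view id
    "start-date": "2017-01-01",
    "end-date": "2017-01-31",
    "metrics": "ga:sessions",
    "dimensions": "ga:pagePath",
    "max-results": PAGE_SIZE,
    "start-index": 1,
}
headers = {"Authorization": "Bearer <access-token>"}   # placeholder token

rows = []
while True:
    data = requests.get(URL, params=params, headers=headers).json()
    rows.extend(data.get("rows", []))
    if params["start-index"] + PAGE_SIZE > data.get("totalResults", 0):
        break
    params["start-index"] += PAGE_SIZE

print(f"fetched {len(rows)} rows")
```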
