This idea arose recently when working specifically with the Association Analysis tool, but I have a feeling that other predictive tools could benefit as well. I was trying to run an association analysis for a large number of variables, but when I was investigating the output using the new interactive tools, I was presented with something similar to this:
While the correlation plot highlights the strongest associations, the user is unable to read the field names, and the tooltip only provides the correlation value rather than the names of the fields involved. As such, I shifted my attention to the report output, which looked like this:
While I could now read everything, it made pulling out the insights much more difficult. Wanting the best of both worlds, I decided to extract the correlation table from the R output and drop it into Tableau for a filterable, interactive version of the correlation matrix. This turned out to be much easier said than done. Because the R output comes in report form, I tried to use the report extract macros mentioned in this thread to pull out the actual values. This was an issue due to the report formatting, so instead I cracked open the macro to extract the data directly from the R output. To make a long story shorter, this ended up being problematic due to report formats, batch macro pathing, and an unidentifiable bug.
In the end, it would be great if there was a "Data" output for reports from certain predictive tools that would benefit from further analysis. While the reports and interactive outputs are great for digesting small model outputs, at times there is a need to extract the data itself for further analysis/visualization. This is one example, as are the model coefficients from regression analyses that I have used in the past. I know Dr. Dan created a model coefficients macro for the case of regression, but I have to imagine there are other cases where the data is desired along with the report/interactive output.
Python pandas DataFrames and data types (NumPy arrays, lists, dictionaries, etc.) are in general much more robust than their counterparts in R, and they play together much more easily as well. Moreover, only a handful of packages cover everything a data scientist would need, including graphing: scikit-learn, pandas, NumPy, and Seaborn. After utilizing R, Python, and Alteryx, I'm still a big proponent of integrating with the Python language much as Alteryx has integrated with R. At the very least, I propose creating the ability to write custom code, such as in a Python tool.
I wasted a good old chunk of time dealing with non-breaking spaces, and Alteryx could be improved by handling this automatically.
A space is a space, right? Nope, there are spaces (ASCII value decimal 32) and there are non-breaking spaces (ASCII value decimal 160). They look the same, but have slightly different behaviour in certain circumstances, like when text is auto-wrapped.
The DataCleansing tool cleans spaces, but leaves non-breaking spaces.
The Data Grid puts a warning on cells with leading or trailing spaces, but remains silent for non-breaking spaces.
I was trying to match two strings that looked identical. I had DataCleansed my cells, and the grid was showing me nothing wrong with the data. In desperation, I copied the two data cells that I expected to match into a text editor (TextPad), and then examined the binary ASCII values of the data. One cell had a trailing non-breaking space, and that caused the failure to match.
This was hard to find. For someone less hopelessly nerdy, it would be practically impossible.
As a small change, it would be really useful for Alteryx to include non-breaking spaces in its definition of "space", such that the DataCleansing tool removes them and the Data Grid flags the cell as having a leading or trailing space.
You could pick up non-breaking spaces from HTML, or from Excel. I think mine came from a SQL script, but I am not sure how it got there. They are out there, and they will bite.
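In the meantime, a workaround outside the DataCleansing tool (e.g. in a custom script step) is to treat U+00A0 as a space yourself. A minimal sketch; the function name is my own:

```python
def clean_spaces(s: str) -> str:
    # Replace non-breaking spaces (U+00A0, decimal 160) with regular
    # spaces (decimal 32), then trim leading/trailing whitespace.
    return s.replace("\u00a0", " ").strip()

a = "ACME Corp\u00a0"   # trailing non-breaking space: looks identical on screen
b = "ACME Corp"
print(a == b)                                # the raw values do not match
print(clean_spaces(a) == clean_spaces(b))    # after cleansing, they do
```

The same substitution could be applied with a Formula tool `ReplaceChar` expression before cleansing, but handling it inside DataCleansing itself would make the problem invisible, as it should be.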
One of the tools that I use the most is the SELECT tool because I normally get large data sets with fields that I won't be using for a specific analysis or with fields that need re-naming. In the same way, sometimes Alteryx will mark a field in a different type than the one I need (e.g. date field as string). That's when the SELECT comes in handy.
However, when dealing with multiple sources, having many SELECT tools on your canvas can make the workflow look a little "crowded". Not to mention adding extra tools that will need explanation later when presenting/sharing your canvas with others. That is why my suggestion is to give the CONNECTION tool "more power" by offering some of the functionality found in the SELECT tool.
For instance, if one of the most used features of the SELECT tool is choosing the fields that move through the workflow, then maybe we can make that feature available in the CONNECTION tool. Similarly, if one of the most used features (by Alteryx users) is renaming fields or changing the field type, then maybe we can make that available in the CONNECTION tool as well.
In the end, developers benefit from speeding up workflow development, and end users benefit from having cleaner workflows presented to them, which always helps get the message across.
What do you guys think? Any of you feel the same? Leave your comments below.
A question has been coming up from several users at my workplace about allowing a column description to display in the Visual Query Builder instead of or along with the column name.
The column names in our database are based on an older naming convention, and sometimes the names aren't that easy to understand. We do see that (if a column does have a column description in metadata) it shows when hovering over the particular column; however, the consensus is that we'd like to reverse this and have the column description displayed with the column name shown on hover.
It would be a huge increase in efficiency and workflow development speed if this could be implemented.
Not sure if there is already a tool like this; if not, it would help to have a test data generator.
It would be a combination of data type and nature of data. For example, a person name is self-explanatory and is a string. Similarly, a phone number is numeric but would be quite different from a sales amount. This can help save time during the development and QA phases, when real live data might not be available and the team would need to mock up such data to test the code being developed.
It could be driven either by public data sources, such as government-provided portals/APIs, or by an internal Alteryx-maintained data dictionary.
An advanced step beyond this might be data generated by purpose: even within person-name data, what is generated would differ when testing a plain use case versus a master data management use case.
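To make the idea concrete, here is a rough sketch of such a generator in Python. All field names and value lists are made up purely for illustration; a real tool would draw on much richer dictionaries:

```python
import random

# Tiny illustrative dictionaries; a real generator would use large ones.
FIRST = ["Alice", "Bob", "Carol", "Dave"]
LAST = ["Smith", "Jones", "Nguyen", "Patel"]

def generate_row(rng: random.Random) -> dict:
    # Each field pairs a data type with the "nature" of the data:
    # a name (string), a phone number (formatted digits), a sales amount (float).
    return {
        "person_name": f"{rng.choice(FIRST)} {rng.choice(LAST)}",
        "phone": f"{rng.randint(100, 999)}-{rng.randint(100, 999)}-{rng.randint(1000, 9999)}",
        "sales_amount": round(rng.uniform(10, 5000), 2),
    }

rng = random.Random(42)   # seeded for repeatable test data
for row in (generate_row(rng) for _ in range(3)):
    print(row)
```

A configuration per field (type plus nature, plus an optional "purpose" such as MDM testing) would then drive which dictionary and format each column uses.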
Access to only MD5 hashes via MD5_ASCII(String) and MD5_UNICODE(String) found under string functions is limiting. Is there a way to access other hashing algorithms, ideally via the crypto algorithms from OpenSSL or the .NET framework?
- For workflows with data containing existing hashes, being able to consistently create hashes from non-hashed data for comparison is useful.
- Hashes are also useful because they are the same outside the Alteryx environment. They can be used to confirm correct operation of a production system or a third party's external process.
Access to only MD5 hashes via MD5_ASCII(String) and MD5_UNICODE(String) found under string functions in the formula tool is a start, but quite limiting.
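For comparison, Python's standard `hashlib` module already exposes the wider family of digests this idea asks for (MD5, SHA-1, the SHA-2 family, and on most builds SHA-3). The wrapper name below is just illustrative:

```python
import hashlib

def hash_value(s: str, algorithm: str = "sha256") -> str:
    # hashlib.new() accepts any algorithm name the build supports;
    # the input must be bytes, so encode the string first.
    h = hashlib.new(algorithm, s.encode("utf-8"))
    return h.hexdigest()

print(hash_value("Alteryx", "md5"))
print(hash_value("Alteryx", "sha256"))
print(sorted(hashlib.algorithms_guaranteed))  # digests every build provides
```

Exposing a similar algorithm parameter on the existing formula functions (e.g. a generic `Hash(String, Algorithm)`) would cover the common cases without a custom tool.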
Designer should support distribution-free statistical testing tools and Statistical Learning methods.
Alteryx already supports resampling for predictive modeling with Cross-Validation.
Resampling tools for bootstrap and permutation tests (supporting sampling with or without replacement) would let analysts and data scientists alike assess random variability in a statistic without worrying about restrictions on the data's distribution, as is the case with many parametric tests, most commonly supported by the t-Test tool in Alteryx. With modern computing power, the need for hundred-year-old statistical sampling tests is fading: sampling a data set thousands of times to compare results against random chance is much easier today.
The tool's results could include, as in R, not only a histogram of the resampled statistic but also the associated Q-Q plot, which visualizes the distribution of the data for the analyst. This would somewhat duplicate the Distribution Analysis tool, but the Q-Q plot is, to me, a major missing element in the simplest visualization of data. This tool could also be very valuable for feeding the A/B Test tools.
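To make the request concrete, here is a minimal permutation test on the difference in means, written in plain Python. The function name and defaults are my own illustration, not an existing Alteryx tool:

```python
import random

def permutation_test(group_a, group_b, n_resamples=10000, seed=0):
    # Two-sided permutation test: how often does a random relabeling of the
    # pooled data produce a mean difference at least as extreme as observed?
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)   # resample labels without replacement
        diff = (sum(pooled[:n_a]) / n_a
                - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_resamples   # approximate p-value
```

With clearly separated groups the returned p-value is near 0; with identical groups it is 1. No distributional assumption is needed, which is exactly the appeal over the t-test for small or skewed samples.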
Seems there is inconsistency across tools when it comes to case sensitivity.
One simple example: the Join tool matches case-sensitively, but the Multi-Row Formula tool matches case-insensitively.
The idea is to make the functionality case-sensitive across all tools.
1) Code written in Alteryx would behave the same way with respect to case everywhere, and not differ tool-by-tool.
2) It is common practice to use functions (Uppercase/Lowercase) to remove differences arising from case sensitivity, for example when joining against standardized values (like a list of values), so this should not add much overhead. At times, case-related differences are something that genuinely needs to be treated differently.
3) If a tool ignores case, there may be no easy way (if any) to make it behave case-sensitively when the business requires it, whereas the reverse is attainable using the Uppercase/Lowercase conversion functions.
This feature isn't a must, but it would definitely be nice to have.
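As a concrete illustration of the Uppercase/Lowercase workaround described in point 2, here is a sketch of a case-insensitive join in Python (a hypothetical helper, not an Alteryx API):

```python
def case_insensitive_join(left, right, key):
    # Normalize join keys to lowercase so "ABC" and "abc" match,
    # mimicking wrapping both sides in a Lowercase() formula before a Join.
    index = {}
    for row in right:
        index.setdefault(str(row[key]).lower(), []).append(row)
    joined = []
    for row in left:
        for match in index.get(str(row[key]).lower(), []):
            joined.append({**match, **row})   # left-side values win on clashes
    return joined

left = [{"k": "ABC", "x": 1}]
right = [{"k": "abc", "y": 2}]
print(case_insensitive_join(left, right, "k"))
```

The point of the idea is exactly that this normalization is possible when a tool is case-sensitive, but the reverse (recovering case sensitivity from a case-insensitive tool) is not.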
Similar to Excel, which has a status area with key figures like average, count, and sum,
it would be a really good idea to do something similar within Alteryx, just to get a quick glance at key figures/functions (example attached; apologies for the bad Paint job, but it would definitely look good in the Alteryx colour scheme).
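A rough sketch of the kind of key figures this could surface for a selected column (illustrative Python, not an Alteryx API):

```python
def quick_stats(values):
    # Ignore nulls/non-numerics, then report the Excel-style trio:
    # count, sum, and average of the selected values.
    nums = [v for v in values if isinstance(v, (int, float))]
    return {
        "count": len(nums),
        "sum": sum(nums),
        "average": sum(nums) / len(nums) if nums else None,
    }

print(quick_stats([10, 20, 30, None]))
```

Showing these for whatever cells/columns are highlighted in the results grid would cover the common "quick glance" case without needing a Summarize tool.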
In addition to the existing functionality, it would be good if the below functionality can also be provided.
1) Pattern Analysis
This would help profile the data better, confirm that data conforms to a standard/particular pattern, and help identify outliers so that necessary corrective action can be taken.
A sample would be: for emails, translating 'email@example.com' to 'firstname.lastname@example.org', so the outliers might be values where '@' or '.' is not present. Another example might be phone numbers: 12345-678910 translating to 99999-999999, 123-456-78910 to 999-999-99999, (123)-(456):78910 to (999)-(999):99999, etc.
It would also help to have the Pattern Frequency Distribution alongside.
So from the above example we can see that there are 3 different patterns in which phone numbers exist, which might call for relevant standardization rules.
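A sketch of the pattern translation plus frequency distribution described above. The digit-to-9 / letter-to-a masking is my reading of the intended rule; punctuation is kept as-is so the three phone formats stay distinguishable:

```python
from collections import Counter

def to_pattern(value: str) -> str:
    # Mask each character by class: letters -> 'a', digits -> '9',
    # everything else (separators, '@', '.') kept verbatim.
    return "".join(
        "a" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )

phones = ["12345-678910", "123-456-78910", "(123)-(456):78910", "98765-432109"]
pattern_freq = Counter(to_pattern(p) for p in phones)
for pattern, count in pattern_freq.most_common():
    print(pattern, count)
```

Running this over a column and reporting `pattern_freq` is exactly the "Pattern Frequency Distribution" asked for: three distinct phone patterns here, so three candidate standardization rules.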
2) More granular control of profiling
It would be good if, in the tool, the profiling options (like Unique, Histogram, Percentile25, etc.) could be selected differently for each field.
A sub-idea here might be to check data against external third-party data providers (e.g. USPS ZIP code validation), but that is only meaningful for selected address fields; granular control over which type of profiling applies to each individual field would therefore make sense.
Note: when implementing the granular control, we would also need to figure out how to present the final report in a more user-friendly format, as it might not conform to a standard table-like layout.
3) Uniqueness profiling
With the ongoing importance of identifying duplicates so that analytic results remain valid, some more uniqueness profiling could be added.
For example: Soundex, which is based on how similar or different two values sound, or edit distance, which is based on how much traversal is needed to change one value into another, etc.
So alongside plain Unique counts, we could also have counts where uniqueness factors in Soundex, distance, and other related algorithms.
For example, if the First Name field contains the following data:
Jerry, Jery, Nick, Greg, Gregg
The number of unique records would be 5, whereas the number of Soundex-unique values might be only 3, opening up more data exploration opportunities to see whether Jerry/Jery and Greg/Gregg are indeed the same person/customer.
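As an illustration, here is a minimal Soundex sketch in Python (a simplified rendering of the classic algorithm) that reproduces the 5-unique vs 3-Soundex-unique counts from the example above:

```python
def soundex(name: str) -> str:
    # Classic Soundex: keep the first letter, encode the rest as digits,
    # drop vowels/h/w/y, collapse adjacent duplicate codes, pad to 4 chars.
    mapping = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
               "l": "4", "mn": "5", "r": "6"}
    codes = {ch: d for letters, d in mapping.items() for ch in letters}
    name = name.lower()
    result = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":          # h and w do not separate duplicate codes
            prev = code
    return (result + "000")[:4]

names = ["Jerry", "Jery", "Nick", "Greg", "Gregg"]
print({n: soundex(n) for n in names})
```

Jerry and Jery both encode to J600, Greg and Gregg to G620, so a Soundex-unique count surfaces the likely duplicates that a plain Unique count hides.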
4) Custom Rule Conformance
I think it would also be good to have functionality similar to the Multi-Row Formula tool, where we can check conformance to custom business rules.
For example, it might be more helpful to check how many Age Units values (Days/Months/Years) are blank/null where the related Age Number (1, 10, 50, etc.) is populated, rather than having a vanilla count of null and not-null for each individual (but related) column.
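A sketch of how such a cross-column rule check could work; the column names and the rule itself are purely illustrative:

```python
def rule_violations(rows, rule):
    # Count rows that fail a custom, cross-column business rule.
    return sum(1 for row in rows if not rule(row))

rows = [
    {"age_number": 10, "age_units": "Years"},
    {"age_number": 50, "age_units": None},    # number populated, units missing
    {"age_number": None, "age_units": None},  # both empty: not a violation
]

# Rule: if age_number is populated, age_units must be populated too.
rule = lambda r: r["age_number"] is None or r["age_units"] is not None
print(rule_violations(rows, rule))
```

A profiling tool that accepted expressions like this per rule would report one violation here, rather than two independent null counts that hide the relationship between the columns.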
It would be super cool to run a regular workflow in "test mode" or some other such way of running it just one tool at a time, so you can check tool outputs along the way and fix issues as they occur, especially for big workflows. Another advantage would be that if, for whatever reason, a working module stops working (maybe someone changed an underlying file - that NEVER happens to me lol), rather than running the whole thing, fixing something, running the whole thing again, you could just fix what's broken and run it that far before continuing.
Actually, that gives me an even better idea... a stop/start tool. Drop it in the workflow and the module will run up to that point and stop or start from that point. Hmm... time to submit a second idea!
One of the most common data-investigation tasks we have to do is comparing two data sets. This may be making sure the columns are the same, that field names match, or even looking at row data. I think this would be a tremendous addition to the core toolset. I've made a fairly good start on it, and am more than happy if you want to take this and extend or add to it (I give this freely with no claim on the work).
Very very happy to work with the team to build this out if it's useful
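To make the proposal concrete, here is a minimal sketch of such a comparison in plain Python (rows as dicts; the function name and report shape are my own, not from the macro mentioned above):

```python
def compare_datasets(a_rows, b_rows):
    # Compare two datasets given as lists of dicts (one dict per row):
    # column differences, row-count match, and row-level value mismatches.
    a_cols = set(a_rows[0]) if a_rows else set()
    b_cols = set(b_rows[0]) if b_rows else set()
    report = {
        "cols_only_in_a": sorted(a_cols - b_cols),
        "cols_only_in_b": sorted(b_cols - a_cols),
        "row_count_match": len(a_rows) == len(b_rows),
        "mismatched_rows": [],
    }
    shared = a_cols & b_cols
    for i, (ra, rb) in enumerate(zip(a_rows, b_rows)):
        if any(ra[c] != rb[c] for c in shared):
            report["mismatched_rows"].append(i)
    return report

a = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
b = [{"id": 1, "name": "A"}, {"id": 2, "name": "X"}]
print(compare_datasets(a, b))
```

A core tool would of course need key-based row matching rather than positional comparison, but the three checks above (schema, counts, values) are the ones that come up constantly.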
It was discovered that the Select tool does not throw warning messages in cases where data truncation is happening, whereas the relevant warning is raised by the Formula tool. I think it would be good to have consistent logging of warnings/errors across all tools (at least across those serving the same use cases; for example, when using Alteryx as an ETL tool, Select and Formula usage is commonplace).
Without this in place, it becomes difficult to rely completely on Alteryx to highlight truncation-related errors/warnings consistently in a workflow moving data from source to target. This leads to the overhead of building custom logic to capture such data issues, which again differs tool by tool: when data passes through the Formula tool there is no need for custom truncation logging, but when the same data passes through the Select tool it has to be captured manually.
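As a stopgap, truncation candidates can be flagged explicitly before a Select-style type change, e.g. in a scripted step. A minimal sketch; field names and the length limit are illustrative:

```python
def truncation_warnings(rows, field, max_len):
    # Return the indexes of rows whose value would be silently cut
    # when the field is narrowed to max_len characters.
    return [i for i, row in enumerate(rows)
            if len(str(row[field])) > max_len]

rows = [{"name": "short"}, {"name": "a very long value"}]
print(truncation_warnings(rows, "name", 10))
```

This is exactly the custom per-tool workaround the idea argues should be unnecessary: the engine already knows the target width, so it could emit the warning itself wherever the narrowing happens.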
Hi Alteryx, can you please create tools for polls, forms to fill out, or approval processes? I know we have some analytic app tools, but could we have something like Google Forms, where we can easily create forms and get data outputs, with email notifications for those forms and approvals, etc.?