It would be useful to be able to select a single container (containing a data input) or multiple containers using Shift, and run those and only those.
When adding a new element to a larger workflow, I often put a new Input in a new container. Being able to run just that container, without having to turn off all my other containers, would really speed up the early stage of joining things together.
At the moment, I have a lovely formatted XLS with corporate branding, logos, filled cells, borders, etc. The data from the Alteryx output needs to start in cell B6. I have tried pointing the Output tool at this named range, but Alteryx destroys all the Excel cell formatting in the data block.
As a workaround suggested on the forums, many Alteryx users write to a hidden "Output" tab and then reference it with formulas such as =Output!A1 in the formatted sheet. This looks messy to users, who then go hunting for the hidden tab. Personally, I end up writing the workflow output to a temporary CSV file, opening that in Excel, selecting all, and pasting values into the pretty Excel file.
This is fine for one file, but I need to split the output report block by a country field and do this hundreds of times every month end.
Please can we have an Output tool that does the same as my workaround: output directly from a workflow to a range in Excel without destroying the workbook's formatting.
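The requested tool boils down to two pieces of bookkeeping before any cell is touched: splitting the report by country, and working out which block each group lands in when it starts at B6. Below is a minimal, stdlib-only Python sketch of just that bookkeeping. The function names are hypothetical, and the step of writing values into the .xlsx without touching its formatting is deliberately left out.

```python
from collections import defaultdict
from typing import Any, Dict, List


def column_letter(index: int) -> str:
    """Convert a 1-based column number to Excel letters: 1 -> 'A', 27 -> 'AA'."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def column_index(letters: str) -> int:
    """Convert Excel column letters back to a 1-based number: 'B' -> 2."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def target_range(anchor: str, n_rows: int, n_cols: int) -> str:
    """Return the A1-style block that an n_rows x n_cols result occupies from `anchor`."""
    col_part = anchor.rstrip("0123456789")
    first_row = int(anchor[len(col_part):])
    last_col = column_letter(column_index(col_part) + n_cols - 1)
    return f"{anchor}:{last_col}{first_row + n_rows - 1}"


def split_by_country(rows: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group output rows by their 'country' field, one group per report file."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row["country"]].append(row)
    return dict(groups)
```

For example, 10 rows by 4 columns anchored at B6 gives target_range("B6", 10, 4) == "B6:E15". A formatting-preserving writer would then set only the values inside that block in a copy of the branded template, once per country group.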
Alteryx does not currently have an Email tool that can be configured to use SMTP Authentication for Microsoft Office 365, or for any server requiring authentication. Our office printer can authenticate over SMTP with TLS enabled, so why not my Alteryx Email tool? 'Mic drop!'
To explain further: Alteryx is a tool that needs to abide by the policies and security standards of the organization, not vice versa. It therefore shouldn't be a big surprise, or a big ask for that matter, that a mail client should be able to authenticate before sending email over SMTP. I'm very surprised this tool is so arcane. Please implement quickly. Thank you.
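For comparison, authenticated SMTP takes only a few lines in most mail libraries. The Python sketch below uses the standard library's smtplib: it upgrades the connection with STARTTLS and logs in before sending. The host, port, account and password in the usage comment are placeholders (smtp.office365.com on port 587 is the commonly documented Office 365 submission endpoint, but your tenant's settings may differ), and nothing here reflects how the Alteryx Email tool is implemented.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_authenticated(msg: EmailMessage, host: str, port: int,
                       username: str, password: str) -> None:
    """Upgrade the connection to TLS, authenticate, then send."""
    context = ssl.create_default_context()    # verifies the server certificate
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.ehlo()
        server.starttls(context=context)      # encrypt before sending credentials
        server.ehlo()                         # re-identify over the encrypted channel
        server.login(username, password)      # SMTP AUTH
        server.send_message(msg)


# Usage (placeholder values):
# msg = build_message("reports@example.com", "team@example.com", "Month end", "Done.")
# send_authenticated(msg, "smtp.office365.com", 587, "reports@example.com", "app-password")
```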
Many software & hardware companies take a very quantitative approach to driving product innovation: they measure how the product solves a standard baseline set of problems today, compare that with how the new version solves the same problems, and so show a measurable improvement over time.
- Database vendors have been doing this for years using TPC benchmarks (http://www.tpc.org/), where a FIXED set of tasks is agreed as a benchmark and the database vendors then iterate year over year to improve performance against it
- Graphics card (GPU) companies have used benchmarks for years (e.g. TimeSpy, Cinebench).
How could this translate for Alteryx?
- Every year at Inspire, we hear the statistic that 90-95% of the time taken is spent on data preparation
- We also know that the reason for buying Alteryx is to reduce the time & skill level required to achieve these outcomes, as reinforced by the message that we're driving towards self-service analytics & citizen data analytics.
Wouldn't it be great if Alteryx could say: "In the 2019.3 release - we have taken 10% off the benchmark of common tasks as measured by time taken to complete" - and show a 25% reduction year over year in the time to complete this battery of data preparation tasks?
One proposed method:
Take an agreed benchmark set of tasks / data / problems / outcomes, based on a standard data set - these should include all of the common data preparation problems that people face like date normalization; joining; filtering; table sync (incremental sync as well as dump-and-load); etc.
Measure the time it takes users to complete these data prep, data movement and data cleanup tasks on the benchmark data and problem set, using the latest innovations and tools.
This time then becomes the measure - if it takes an average user 20 mins to complete these data prep tasks today; and in the 2019.3 release it takes 18 mins, then we've taken 10% off the cost of the largest piece of the data analytics pipeline.
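The scoring rule in this step is simple enough to pin down in code. A minimal Python sketch, assuming each benchmark run yields one completion time in minutes per user and that the score is the percentage drop in the average time between releases; the per-user times in the example call are illustrative, not measured data.

```python
from statistics import mean
from typing import Sequence


def percent_reduction(baseline_minutes: Sequence[float], new_minutes: Sequence[float]) -> float:
    """Percentage of average completion time saved by the new release."""
    old = mean(baseline_minutes)
    new = mean(new_minutes)
    return (old - new) * 100 / old


# Illustrative times averaging 20 minutes before and 18 minutes after.
print(percent_reduction([22, 19, 19], [18, 17, 19]))
```

With averages of 20 and 18 minutes the function returns 10.0, matching the 10% in the example above. Averaging over users, rather than timing a single expert, keeps the measure focused on the "average user".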
What would this give Alteryx?
This could be very simple to administer; and if done well it could give Alteryx:
- A clear and unambiguous marketing message that they are super-focussed on solving for the 90-95% of your time that is NOT being spent on analytics, but rather on data prep
- It would also provide focus to drive the platform in the direction of the biggest pain points - all the teams across the platform can then rally around a really deep focus on the user and accelerating their "time from raw data to analytics".
- A competitive differentiator: invite your competitors to take part too, just like TPC.org or any of the other benchmarks
What this is / is NOT:
This is not a run-time measure - i.e. this is not measuring transactions or rows per second
This should be focussed on: "Given this problem and raw data, how long does it take you, and how many clicks, mouse moves, etc., to get the raw data prepped and clean enough to do the analysis?"
This should NOT be a test of "Once you've got clean data - how quickly can you do machine learning; or decision trees; or predictive analytics" - as we have said above, that is not the big problem - the big problem is the 90-95% of the time which is spent on data prep / transport / and cleanup.
There are loads of ways this could be administered; the starting point is to agree to drive this quantitatively on a fixed benchmark of tasks and data.
When the Python Tool runs, it seems to always ingest all the data before processing any of it (i.e. no batch processing). Python can handle this kind of functionality with generators. Could we update the tool so that it can do some preprocessing (like imports and data prep) once, then call a defined batch function repeatedly for data arriving on a separate input handle, and provide batch data frames on output for more parallel-like processing of data?
The Python Tool could be updated as such:
Multi-Input - Same functionality as now, and also allow this data to be used for preprocessing and setting up the Python functions and a single batch function.
Data Input - Ingests data in batches (as most other tools operate) where each batch passes in a dataframe (in this case, a subset of processed entries) into an existing Python function (with a name that is in globals()), and returns another dataframe with that desired output. This can give the option of adding/removing rows as necessary to a subset of the data.
Data Output - Partial set of data after data processing to allow tools further in the chain to process in parallel.
"On Complete" Multi-Outputs - Same functionality as now, to pass process-complete data to the next tool once all data ingested has been processed. Perhaps give the option to pass the complete set from Data Output.
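To make the proposed handle contract concrete, here is a hypothetical host loop in Python: setup() runs once on the Multi-Input data, process_batch() is called for each chunk arriving on the Data Input, and on_complete() runs after the last chunk. None of these names exist in the current Python Tool; they are assumptions for illustration, and lists of dicts stand in for DataFrames to keep the sketch dependency-free.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]
Batch = List[Row]  # stands in for a pandas DataFrame in this sketch


def batches(rows: Iterable[Row], size: int) -> Iterator[Batch]:
    """Yield consecutive chunks of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_python_tool(
    setup_rows: Batch,
    data_rows: Iterable[Row],
    setup: Callable[[Batch], Any],
    process_batch: Callable[[Any, Batch], Batch],
    on_complete: Callable[[Any], Batch],
    batch_size: int = 10_000,
) -> Iterator[Tuple[str, Batch]]:
    """Hypothetical host loop for the proposed handles."""
    state = setup(setup_rows)                      # Multi-Input: runs once
    for chunk in batches(data_rows, batch_size):   # Data Input: one call per batch
        yield "data", process_batch(state, chunk)  # Data Output: partial results
    yield "complete", on_complete(state)           # "On Complete" Multi-Output
```

Because run_python_tool is itself a generator, each partial result could be passed downstream as soon as its batch finishes, which is what would let later tools start work (and let Alteryx report progress) before the whole input has been read.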
A simple use-case, if a user wanted to use only the Python Tool:
Let's say a user wants to get all URLs from every post in a thread (containing millions of posts) that are in blacklisted domains.
Data prep sends the list of blacklisted domains into the Python Tool's Multi-Input handle, where it is transformed and stored in a set once.
A series of posts (strings) are sent in batches (let's say ~10000) to the Data Input of the Python Tool. The tool calls a defined Python function that extracts all the URLs, and filters those in the blacklist.
That data is then transformed into a DataFrame which is then sent to the Data Output of the Python Tool, and only contains results corresponding to the small batch of posts that were ingested. Alteryx can also use this to track progress during execution.
Once all posts have been processed, one of the Python Tool's Multi-Outputs can return a total count of URLs found that were NOT in the blacklist (sure this can be a part of the Data Output, but just for the sake of this example). Could also be used to trigger "on-complete events."
I know I used the term "generators" above, and the design could probably be simplified to instead call an Alteryx Python function that yields from a function to await input from the next batch to use actual Python generators. However, I feel my initial approach could be thought of as a simpler process since generators are more of an intermediate functionality.
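The generator alternative would rest on Python's generator send() protocol: the script pauses at a yield, the host sends in the next batch, and a final sentinel ends the loop, with the generator's return value surfacing on StopIteration. A minimal sketch under those assumptions; batch_worker and drive are hypothetical names, and None is assumed as the end-of-input sentinel.

```python
from typing import Callable, Generator, Iterable, List, Optional, Tuple


def batch_worker(process: Callable[[list], list]) -> Generator[Optional[list], Optional[list], int]:
    """Coroutine-style generator: receives batches via send(), yields each batch's result."""
    rows_seen = 0
    result: Optional[list] = None
    while True:
        batch = yield result       # pause here until the host sends the next batch
        if batch is None:          # sentinel: upstream has no more data
            return rows_seen       # surfaces as StopIteration.value ("on complete")
        rows_seen += len(batch)
        result = process(batch)


def drive(worker: Generator[Optional[list], Optional[list], int],
          batches: Iterable[list]) -> Tuple[List[list], int]:
    """Hypothetical host side: prime the generator, feed batches, collect the final total."""
    next(worker)                   # advance to the first yield so it can accept send()
    outputs = [worker.send(batch) for batch in batches]
    try:
        worker.send(None)
    except StopIteration as stop:
        return outputs, stop.value
    raise RuntimeError("worker kept running after the end-of-input sentinel")
```

The priming next() call and the sentinel are exactly the "intermediate" details the post mentions, which is why a plain per-batch function may be the friendlier default.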
I hope this makes sense and is elaborate enough to pursue. Thanks for the consideration!
I would love to be able to have an interface tool that allows a user to search through drop down values (when there are more than 100 or so) similar to autocomplete. It would be helpful as a multiselect or single select drop down. I have inserted a very poorly mocked up picture below. It would essentially be a modified version of the drop down as all the values would be in the tool, but the user could type to find what they are looking for.
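The matching behind such a searchable drop-down can be quite simple. A minimal Python sketch, assuming case-insensitive matching where options that start with the typed text are listed before options that merely contain it, capped at a hypothetical display limit.

```python
from typing import List


def filter_options(options: List[str], query: str, limit: int = 20) -> List[str]:
    """Return drop-down options matching `query`, prefix matches first."""
    q = query.casefold().strip()
    if not q:
        return options[:limit]                  # nothing typed yet: show the first page
    starts = [o for o in options if o.casefold().startswith(q)]
    contains = [o for o in options if q in o.casefold() and not o.casefold().startswith(q)]
    return (starts + contains)[:limit]
```

For multi-select, the same filter would run on each keystroke while already-selected values stay ticked. Typing "ger" against Germany, France, New Germany and Ghana would list Germany, then New Germany.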
Please can you fix the copying and pasting of renames across fields? It's a behavior I see in many tools' grids, and it drives me mad. It's not just the Select tool.
Take the attached screenshot. In the Select tool, I've renamed "test 2" to "rename2". Fine, it works. No issue.
I then copy rename2 and paste it into the test3 field, and it copies the entire row's data (and metadata) into that little box: tabs, spaces, the lot. I end up with something like the screenshot. I'm really not sure it was meant to be designed this way, as I can't see the point.
We need syntax color coding in the SQL Editor window for Input tools. We are always having to pull our code out of there and copy it into a Teradata window so it is easier to read and troubleshoot. This would save us time and hassle and would improve the Alteryx user experience. (I think you've used a couple of my ideas already. This one is a good one too.)
Who needs a 1073741823-sized string anyway? No one, or close enough to no one. But if you are creating some fancy new properties in the Formula tool, just cranking along, and then you see that your **bleep** data stream is 9 GB for nine rows of data, you find yourself wondering what the hell is going on. Then you walk your way down the workflow for a while, finding places where the default 1073741823 value got set and changing them to non-insane string sizes, and your data flow ends up more like 64 KB and your workflow runs in 3 seconds instead of 30.
Please set the default string size in the Formula tool to a sane value, one that 99.99999% of use cases will never need to change. Thank you.
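A quick back-of-envelope check shows why the default hurts. 1073741823 is 2^30 - 1, so if the stream size is estimated as rows times declared field size (an assumption about Alteryx internals, at one byte per character; a wide-string field would double it), a single such field across nine rows comes to about 9 GiB, which matches the figure above. The 254-character comparison size is purely illustrative.

```python
DEFAULT_MAX_STRING = 1073741823        # the default size quoted above
assert DEFAULT_MAX_STRING == 2**30 - 1


def worst_case_bytes(rows: int, declared_size: int, bytes_per_char: int = 1) -> int:
    """Assumed upper bound: every row reserves the full declared field size."""
    return rows * declared_size * bytes_per_char


nine_rows = worst_case_bytes(9, DEFAULT_MAX_STRING)
print(nine_rows, round(nine_rows / 2**30, 2))   # 9663676407 bytes, about 9.0 GiB
print(worst_case_bytes(9, 254))                 # same rows with an illustrative 254-char field
```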
Alteryx should raise a Conversion Error if resizing a string field in a Select tool results in data truncation. It does this for integers, but if a string is truncated there is no indication of this in the workflow output.
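The requested check is cheap: compare each value's length with the new size before cutting it. A minimal Python sketch, assuming truncation should produce a warning per affected record (as conversion errors do) rather than stop the run; the function name and message wording are hypothetical.

```python
from typing import List, Optional, Tuple


def resize_strings(values: List[Optional[str]], new_size: int,
                   field: str) -> Tuple[List[Optional[str]], List[str]]:
    """Truncate values to new_size and report every record that lost characters."""
    resized: List[Optional[str]] = []
    warnings: List[str] = []
    for record, value in enumerate(values, start=1):
        if value is not None and len(value) > new_size:
            warnings.append(
                f"Conversion Error: {field} (record {record}) truncated "
                f"from {len(value)} to {new_size} characters"
            )
            value = value[:new_size]
        resized.append(value)
    return resized, warnings
```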
Often as I am scraping web sites, some clever developer has put an invisible character (ASCII or Unicode) in the data which causes terrible trouble.
I've identified 89 instances of zero-width or non-zero-width glyphs that are not visible and/or Alteryx does not classify as whitespace. There are probably more, but Unicode is big y'all.
Unfortunately, the Trim() string function only removes 4 of these characters (Tab, Newline, Carriage Return, and Space). REGEX_REPLACE with the \s class (which is what the Cleanse macro uses) is a little better but still only removes 20. And it removes all instances, not just leading and trailing ones.
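A trim that also catches invisible characters could classify each leading and trailing character by its Unicode general category instead of relying on a fixed whitespace list. A minimal Python sketch, assuming that control (Cc), format (Cf, which includes ZERO WIDTH SPACE U+200B, ZERO WIDTH JOINER U+200D and the byte order mark U+FEFF) and separator (Zs, Zl, Zp) characters should all count as trimmable. It only touches the ends of the string, and some blank-looking characters sit in other categories, so it will not catch all 89.

```python
import unicodedata

# Categories treated as trimmable in this sketch (an assumption, not an exhaustive list).
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def is_invisible(ch: str) -> bool:
    """True for whitespace and for control, format and separator characters."""
    return ch.isspace() or unicodedata.category(ch) in INVISIBLE_CATEGORIES


def trim_invisible(text: str) -> str:
    """Strip invisible characters from both ends, leaving interior ones alone."""
    start, end = 0, len(text)
    while start < end and is_invisible(text[start]):
        start += 1
    while end > start and is_invisible(text[end - 1]):
        end -= 1
    return text[start:end]
```

Note that Python's own str.strip() leaves U+200B in place, since that character is a format character rather than whitespace; the category check is what picks it up.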
Love the new updates to the Browse tool in 2019.2! However, if you choose the option Open results in new window, which I do often so I can see my whole dataset, the search/filter/sort functionality goes away. Would be great if that new functionality also worked in the new window. Thanks!
Sometimes I want to test portions of a workflow independently of other portions. I find myself adding containers just so I can disable some of the time-consuming portions that are not part of my test. It would be nice to be able to enable/disable any portion of a workflow on the fly, or maybe just disable/enable any connection with a right-click.
Using other data viz tools like Tableau, we often plot yearly time series onto the same line chart so we can quickly compare year-on-year differences. All data viz tools have their complexities, but the logical approach is the same: map each year's data onto a relative year (i.e. this year), and then give each year its own title. See the example below, snipped from a Tableau dashboard:
In this example, 7 years of data have been plotted on the same chart. Note the x-axis: in Tableau we are able to format the x-axis labels to show only month and day (Mon-D). This removes the common relative year, i.e. 2019.
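The relative-year mapping described above is easy to express in code. A minimal Python sketch, assuming a leap reference year (2020 here, a hypothetical choice) so that 29 February always has a slot; with 2019 as the relative year, date.replace would raise ValueError for 29 February. The label uses the "%B-%d" format requested for the Interactive Chart tool, and the row layout is illustrative.

```python
from datetime import date

REFERENCE_YEAR = 2020  # hypothetical choice: a leap year, so 29 February always maps cleanly


def to_relative_date(d: date, reference_year: int = REFERENCE_YEAR) -> date:
    """Move a date onto the reference year, keeping month and day."""
    return d.replace(year=reference_year)


def to_chart_row(d: date, value: float) -> dict:
    """One point on the shared chart: series per year, x on the reference year."""
    relative = to_relative_date(d)
    return {
        "series": str(d.year),                   # one line per year
        "x": relative,                           # shared x-axis
        "x_label": relative.strftime("%B-%d"),   # month and day only, e.g. "March-05"
        "value": value,
    }
```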
As expected, Alteryx is awesome at preparing data for this kind of thing, and with the Interactive Chart tool you can build really nice charts. However, there is currently no way to format the x-axis labels; you must show the relative year too, as shown in the picture below (snipped from the Browse tool, output from the Interactive Chart tool):
It was really easy to prepare the 5 year min, max and average lines, which is almost impossible to do in Tableau!
My idea in a nutshell: please change the Interactive Chart tool so that axis labels can be formatted as the user chooses, e.g. in this case from datetime to "%B-%d".
Please note that the workflow I'm building in this case creates 3 line charts of related data, each by year. The end product is a daily email sent to users.
When I upgraded to 2019.1, the content of my Python tool was deleted. This may be a bug in the 2019.1 version itself or just in the upgrade process; either way, it is a problem that any part of a canvas gets deleted at all.
My guess is that if the content of the Python Tool were reliably stored in the canvas XML, this issue could be resolved.
Bring back the Cache checkbox for Input tools. It's cool that we can cache individual tools in 2018.4.
The catch is that for every cache point I have to run the entire workflow. With large workflows that can take a considerable amount of time and slows down development, because I have to run the workflow over and over just to cache all my data.
Add the cache checkbox back for input tools to make the software more user friendly.