Hello,
After using the new "Image Recognition Tool" for a few days, I think you could improve it:
> by showing the input dimension constraints next to each of the pre-trained models,
> by adding a proper tool to split the training data correctly (so that each label has an equivalent number of images),
> lastly, by allowing the tool to use black & white images (I wanted to test it on MNIST, but the tool tells me it requires RGB images; a possible workaround is sketched below).
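As a possible interim workaround for the RGB requirement, grayscale images can be converted to three-channel RGB before they reach the tool. A minimal sketch, assuming the Pillow library and a hypothetical file name:

```python
# Minimal sketch: convert a single-channel (grayscale) image to RGB by
# replicating the channel, so a tool that requires RGB will accept it.
# The file name is hypothetical.
from PIL import Image

img = Image.open("mnist_digit.png")  # grayscale MNIST-style image
rgb = img.convert("RGB")             # copies the one channel into R, G and B
rgb.save("mnist_digit_rgb.png")
```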
Question: will you allow the user to choose between CPU and GPU usage in the future?
In any case, thank you again for this new tool. It is certainly perfectible, but it is very simple to use, and I sincerely think it will help a greater number of people understand the many use cases made possible by image recognition.
Thank you again
Kévin VANCAPPEL (France ;-))
It would be great if the Formula tool could extend IntelliSense to the Select Column box. For example, I could start typing in the Select Column box and it would whittle down the list of fields. Let's suppose I wanted to update field 79A: I could type in 7 and it might show something like
7
17
27
37
70
71
79A
79B
Then, if I typed in 79, it would further reduce the list to
79A
79B
And I could select 79A.
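To make the narrowing behaviour concrete, here is a minimal Python sketch of the kind of substring filtering being requested (field names taken from the example above):

```python
# Minimal sketch of IntelliSense-style narrowing: typing filters the
# field list to names that contain the typed text.
fields = ["7", "17", "27", "37", "70", "71", "79A", "79B"]

def narrow(typed: str) -> list[str]:
    """Return only the fields whose name contains the typed text."""
    return [f for f in fields if typed in f]

print(narrow("7"))   # ['7', '17', '27', '37', '70', '71', '79A', '79B']
print(narrow("79"))  # ['79A', '79B']
```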
As each version of Alteryx is rolled out, it would be much easier for our users and admin team to validate the new version if Alteryx allowed parallel installs of many different versions of the software.
So - our team is currently on 11.3 - if we could roll out 11.5 in parallel then we could very easily allow users to revert to 11.3 if there are issues, or else remove 11.3 after 2-3 weeks if no issues.
The same goes for versions which are in BETA.
This would be a huge help!
cc: @avinashbonu ; @Deeksha ; @revathi
Many software & hardware companies take a very quantitative approach to driving their product innovation: they agree a standard baseline of how the product is used today, then measure how the new version solves the same problem, so they can show an improvement over time.
For example:
- Database vendors have been doing this for years using TPC benchmarks (http://www.tpc.org/), where a FIXED set of tasks is agreed as a benchmark and the database vendors then iterate year over year to improve performance against these benchmarks
- Graphics card and GPU companies have used benchmarks for years (e.g. TimeSpy, Cinebench, etc.).
How could this translate for Alteryx?
- Every year at Inspire we hear the stats that say that 90-95% of the time taken is data preparation
- We also know that the reason for buying Alteryx is to reduce the time & skill level required to achieve these outcomes - again, as reinforced by the message that we're driving towards self-service analytics & citizen data analytics.
The dream:
Wouldn't it be great if Alteryx could say: "In the 2019.3 release - we have taken 10% off the benchmark of common tasks as measured by time taken to complete" - and show a 25% reduction year over year in the time to complete this battery of data preparation tasks?
One proposed method:
What would this give Alteryx?
This could be very simple to administer, and if done well it could give Alteryx:
- A clear and unambiguous marketing message that they are super-focussed on solving for the 90-95% of your time that is NOT being spent on analytics, but rather on data prep
- It would also provide focus to drive the platform in the direction of the biggest pain points - all the teams across the platform can then rally around a really deep focus on the user and accelerating their "time from raw data to analytics".
- A competitive differentiation - invite your competitors to take part too just like TPC.org or any of the other benchmarks
What this is / is NOT:
There are loads of ways this could be administered - the starting point is to agree to drive this quantitatively on a fixed benchmark of tasks and data (a rough sketch of the idea follows).
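As a purely illustrative sketch of how simple the harness itself could be (the tasks and workloads here are made up, not a real Alteryx benchmark):

```python
# Illustrative benchmark harness: time an agreed, FIXED battery of
# data-prep tasks and track the total release over release.
import time

def task_parse_dates():
    for _ in range(10_000):
        time.strptime("2019-08-01", "%Y-%m-%d")

def task_dedupe():
    list(dict.fromkeys(i % 1_000 for i in range(100_000)))

BENCHMARK = [task_parse_dates, task_dedupe]  # the agreed task set

total = 0.0
for task in BENCHMARK:
    start = time.perf_counter()
    task()
    total += time.perf_counter() - start
print(f"Benchmark total: {total:.3f}s")  # compare this number year over year
```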
@LDuane ; @stevea ; @jpoz ; @AshleyK ; @AJacobson ; @DerekK ; @Cimmel ; @TuvyL ; @KatieH ; @TomSt ; @AdamR_AYX ; @apolly
A Pre-Filter as a new option in the Input Tool would reduce imported data by allowing input of only selected data (i.e. for a specific period, or meeting certain conditions).
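As an illustrative sketch of the difference this would make (using sqlite3 purely as a stand-in data source; table and column names are made up):

```python
# Compare filtering after a full import with pushing the condition
# into the read itself, which is what a Pre-Filter would do.
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for any Input Tool source
con.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "2023-11-02"), (2, "2024-03-15"), (3, "2024-06-01")])

# Today: every row is imported, then filtered in the workflow.
all_rows = con.execute("SELECT * FROM orders").fetchall()
recent = [row for row in all_rows if row[1] >= "2024-01-01"]

# With a Pre-Filter: only matching rows are ever read.
recent_prefiltered = con.execute(
    "SELECT * FROM orders WHERE order_date >= '2024-01-01'"
).fetchall()
print(recent == recent_prefiltered)  # True, but far less data moved
```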
Cheers,
Pawel
Our company has a need to link a new data source in Athena. We have been able to establish a connection using the input functionality; however, the connection is so slow it is unusable. We need Alteryx to build an In-Database option for Athena to allow us to link our data lake to Alteryx.
Similar to https://community.alteryx.com/t5/Alteryx-Designer-Ideas/Custom-Functions-in-AMP/idc-p/845446#M16381, it would be great to have AMP allow for custom C++ functions. Custom XML functions were added in 21.1 for AMP, so custom C++ functions would be the natural next step!
cc: @jdunkerley79 @TonyaS
I would love to see horizontal scrolling support for mice. Currently, you allow scrolling vertically, but not horizontally. It would be so much quicker to be able to navigate with horizontal scrolling support.
Please support GZIP files in the input tool for both Designer and Server.
I get several large .gz files every day containing our streaming server logs. I need to parse and import these using Alteryx (we currently use Sawmill). Extracting each of these files would take a huge amount of space and time.
This was previously requested and marked as "now available", but what is available only addressed a small part of the request. First, that request was for both ZIP and GZIP. What is now available is only ZIP. Second, it requested both input and output, what is now available is input only. Third, while not explicitly stated in the request, it needs to function in Alteryx Server in order to be scheduled on a daily basis.
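In the meantime, a minimal sketch of the kind of GZIP handling being requested, using Python's standard library (the log path and tab-separated format are hypothetical); reading the compressed stream directly avoids extracting each file to disk first:

```python
# Read a .gz log file line by line without extracting it to disk.
import gzip

with gzip.open("streaming_server.log.gz", "rt", encoding="utf-8") as f:
    for line in f:
        fields = line.rstrip("\n").split("\t")  # parse one log record
        # ...hand fields to the rest of the workflow...
```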
I'm sure there's a reason behind it, but can we please be allowed to run calculations on null values in a Formula tool? Right now, if we sum three values (1 + 3 + [Null]), it produces [Null]. Can the Formula tool just ignore the null values? The only way around this is to fill the [Null] cells with a value, which adds an additional step to what should be a fairly straightforward process. That fill value would have to be different for a multiplication formula than for an addition formula in order to not change the answer materially, whereas ignoring the value is a more consistent solution.
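A small worked example of the behaviour, using Python's None as a stand-in for [Null]:

```python
# Today's behaviour: any null makes the whole result null.
values = [1, 3, None]
total = None if None in values else sum(values)
print(total)  # None

# Requested behaviour: ignore the nulls.
print(sum(v for v in values if v is not None))  # 4

# The fill-value workaround needs a different fill per operation
# (0 for addition, 1 for multiplication) or the answer changes.
import math
print(sum(v if v is not None else 0 for v in values))        # 4
print(math.prod(v if v is not None else 1 for v in values))  # 3
```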
Hello!
A quite minor, pedantic issue from me today.
Currently, the Oversample Field Tool's naming and configuration suggest that the tool can oversample data:
However, I would argue the tool undersamples data instead.
Here are a few sources that explain this much better than I can:
And an image taken from Medium:
Effectively, the goal of either step is to end up with a similar (or the same) number of records in each class. Undersampling takes samples from the majority class, ending up with a smaller dataset than you started with. Oversampling duplicates records within the minority class, creating a larger dataset.
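A minimal sketch of the difference, using Python's random module as a stand-in for the tool's sampling (class labels are made up):

```python
import random

majority = ["A"] * 900
minority = ["B"] * 100

# Undersampling: sample the majority DOWN to the minority -> 200 records.
under = random.sample(majority, len(minority)) + minority

# Oversampling: duplicate the minority UP to the majority -> 1800 records.
over = majority + random.choices(minority, k=len(majority))

print(len(under), len(over))  # 200 1800
```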
When using the Oversample tool within Alteryx, using the example workflow for reference:
When summarizing the input:
And the output:
It's clear that the data has actually been undersampled: random samples have been taken from the majority class to match the minority, rather than duplicate minority records being created.
I would suggest a quick renaming of the tool to "Undersample Field Tool", and updating the documentation, to avoid confusing new users of the platform.
Kind Regards,
TheOC
On the UNION tool, allow for deselecting columns that aren't relevant. Leave the union exactly as it is, with the ability to go into the manual configuration and align the columns just as you do today. The addition would be behaviour like you see in a Join tool, where you could deselect C1, C2, C3 ... Cx.
Too many times I have a union where there are fields I simply don't want to bring in, but then have to add a Select tool right after in order to remove them.
If the workflow configuration had a run for 'x' number of iterations option, it would make debugging macros a lot easier. My current method consists of copying results, changing inputs, and repeating until I find my problem, which feels very manual.
It can be daunting to find the tool that is currently being processed by the engine in workflows that contain hundreds of tools with many ins, outs, and branches. During runtime, I want to be shown the tool that is running on the canvas. This functionality should be in the form of a button or something to direct focus to that area. It should not be the default.
As of Alteryx Version 2020.3, the Browse tool no longer shows a profile of the complete dataset (profiling is capped once the record data size reaches 300 MB).
My proposed solution is an optional override of the record size limit on the Browse tool (which will make the profiling take longer, but will actually profile the entire dataset). I would also like a general user setting to set the default behavior of the Browse tool to either limited or unlimited.
Below is the newly included documentation of the Data Profiling Limit, which I'm proposing can be overridden.
Data Profiling Limit
Data Profiling in the Browse tool is capped at 300 MB. This allows you to process very large datasets faster. For each record in the incoming dataset, we process the record and add the record size to a counter. Once the counter reaches 300 MB, we stop processing records.
It is important to note that there is no specific number of records that we can process. This depends on the dataset since a record size can range from 1 byte to a few thousand bytes. This record size is different from the file size, displayed in the Results grid and Data Profiling Holistic View. The file size is generally different since it has been compressed to optimize spacing.
In other words, 300 MB of record size is not the same as 300 MB of file size.
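To make the documented behaviour concrete, here is a rough sketch of that capping logic together with the override I'm proposing (this is my reading of the documentation, not Alteryx's actual code):

```python
LIMIT_BYTES = 300 * 1024 * 1024  # the current hard cap

def profile(records, limit=LIMIT_BYTES):
    """Profile records until the running record-size counter hits the limit.

    Passing limit=None is the proposed override: profile everything.
    """
    counter, profiled = 0, []
    for record in records:
        if limit is not None and counter >= limit:
            break  # remaining records are silently excluded from the profile
        counter += len(record)  # stand-in for the record's size in bytes
        profiled.append(record)
    return profiled
```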
This new behavior can cause confusion when looking at the data profile (e.g. if you expect the sum to be $3 million, but the Browse tool is only profiling 2% of your total records, the profile sum may only show $60 thousand).
The sampled version with a 300 MB cutoff is rarely useful if you are using Browse tools to get a quick sense of variable profiles on medium-sized datasets (around 1 million records), since these will rarely fit within the 300 MB record size limit.
An example is shown in the image below, where the dataset contains 855,085 records, but the Browse tool profiles only the first 20,338 (about 2.4%).
Again, being able to override this 300MB record size limit would fix the problem created in the 2020.3 change to the browse tool.
When using the Text Mining tools, I have found that a template is only applied to document pages with the same page number as the template.
So in my use case I've got a PDF file with 100+ claim statements, which are all laid out the same (one page per statement). When setting up the template I used one page to set the annotations, and then input this into the T anchor of the Image to Text tool. Into the D anchor of this tool goes my PDF document with 100+ pages. However, when examining the output I only get results for page 1.
On examining the JSON for the template, I can see that there is a reference to the template page number:
Playing around with a Generate Rows tool and a Formula tool to replace the page number with pages 1-100 in the JSON doesn't work. I then discovered that if I change the page number on the image input side, I get the desired results.
However, as I suspect this is a common use case for the Image to Text tool, an improvement would be to add an option in its configuration to apply the same template to all pages.
I've obviously been doing lots of work with APIs, as this is my second idea posted today relating to an improvement based on that work, but I also believe this one is wider reaching.
I've been using Alteryx for over 4 years now and always assumed the Select tool's behaviour was implicit, so as best practice I would add a Select tool into a workflow after input tools to catch any data type issues. However, I discovered that only fields where you change the data type, length, or field name actually have that behaviour configured and subsequently enforced. I discovered this as part of API development, where I had an input field that was a string, e.g. 01777777. Placing a Select tool after this shows a string data type; however, if the input was changed to 11777777, the Select tool changes to a numeric data type. Therefore downstream formulas, such as concatenating two strings, would fail.
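To make the problem concrete, here is a small demonstration using pandas type inference as an analogy (the column name is made up); dtype=str plays the role of string:forced:

```python
import io
import pandas as pd

csv = "account\n01777777"

inferred = pd.read_csv(io.StringIO(csv))           # type inferred from the data
print(inferred["account"].iloc[0])                 # 1777777 -- leading zero lost

forced = pd.read_csv(io.StringIO(csv), dtype=str)  # type explicitly forced
print(forced["account"].iloc[0])                   # '01777777' -- preserved
```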
The workaround is to change the Select tool to string:forced, which is fine when you know about it, but I suspect that a large majority of users don't. Plus, if you have something like 2022-01-26, which is initially recognised as a string, the forced option will be string:forced; if you wanted it to be date:forced, you need to add a first Select tool to change the type to date, and a second Select tool to set it to date:forced.
Therefore my suggestion is to add a checkbox option in the Select tool to force all field types, which would update the XML of the tool and thereby ensure that what I had assumed was implicit behaviour is actually implemented.
It would be really helpful to have a bulk-load 'output' tool to Snowflake, with functionality similar to what is available with the Redshift bulk loader.
Currently it takes a really long time to insert via ODBC, or it would require you to write a custom solution to get this to work.
The article below explains the general steps, but some of the manual steps outlined would have to be automated to arrive at a solution that is entirely encapsulated within a workflow.
http://insightsthroughdata.com/how-to-load-data-in-bulk-to-snowflake-with-alteryx/
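As a rough sketch of what the automated steps might look like, assuming the snowflake-connector-python package (all connection details, file paths, and table names are hypothetical):

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="public",
)
cur = conn.cursor()

# 1. Stage the local file (Snowflake compresses and uploads it).
cur.execute("PUT file:///tmp/extract.csv @%my_table")

# 2. Bulk load the staged file -- far faster than row-by-row ODBC inserts.
cur.execute("COPY INTO my_table FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
conn.close()
```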
Hi there Alteryx team,
When we load data from raw files into a SQL table, we use this pattern in almost every single loader, because the "Update, insert if new" functionality is so slow: it cannot take advantage of SSVB; it does not do deletes; and it doesn't check for changes in the data, so your history tables get polluted with updates that are not real updates.
This pattern below addresses these concerns as follows (a rough sketch of the pattern follows the list):
- You explicitly separate out the inserts by comparing to the current table, and use SSVB on the connection, thereby maximizing the speed
- The rows that no longer exist in the source - you delete from the target, and allow the history table to keep the history.
- Finally, the rows that exist in both source and target are checked for data changes and only updated if one or more fields have changed.
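A rough sketch of the pattern in SQL terms, using sqlite3 purely for illustration (table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE source (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO target VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO source VALUES (1, 'a'), (2, 'B'), (4, 'd');
""")

# 1. Insert rows that exist only in the source (the fast bulk-insert path).
cur.execute("""INSERT INTO target SELECT * FROM source
               WHERE id NOT IN (SELECT id FROM target)""")

# 2. Delete rows that no longer exist in the source (history kept elsewhere).
cur.execute("DELETE FROM target WHERE id NOT IN (SELECT id FROM source)")

# 3. Update only the rows whose values actually changed.
cur.execute("""UPDATE target
               SET value = (SELECT value FROM source WHERE source.id = target.id)
               WHERE EXISTS (SELECT 1 FROM source
                             WHERE source.id = target.id
                               AND source.value <> target.value)""")
con.commit()
print(cur.execute("SELECT * FROM target ORDER BY id").fetchall())
# [(1, 'a'), (2, 'B'), (4, 'd')]
```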
Given how commonly we have to do this (on almost EVERY data pipe from files into our database), could we look at making an Incremental Update tool in Alteryx to make this easier? This is common functionality in other ETL platforms, and it would be a great addition to Alteryx.