This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Environment variables act as a shortcut so that different computers can be configured in different ways, but a particular path will still point to the right place.
For example if you open up explorer and go to %TEMP%\ - you will open up whichever folder is set up as Temp on this machine. This is super useful so that you can use a particular logical folder without knowing the actual placement on every machine (for example the Windows Directory)
This works partially in the Directory / input - when you put in the environment variable, it is able to search possible subdirectories (screenshot 1) but it does not work once you run the workflow (screenshot 2).
It seems as if the designer hits the Windows API directly, but it does not work within the engine.
Please could you alter the engine to be able to make full use of the environment variables on the machine in question in the directory path or input tool path?
With an increasing number of different projects, involving different machine learning models, it's becoming difficult to manage different package versions across workflows. Currently, the Python tool has a single virtual environment, so we need to develop models in different projects always using the same Python and package versions as the Python tool venv. While this doesn't bother the code itself too much, it becomes a problem as soon as we store and load pickled models, which are sensitive to even minor changes in packages.
This is even more so a problem when we are working on the Alteryx server, where different teams might use different packages. Currently, there is only the server admin who can install packages on the server and there can only be one version per package.
So, a more robust venv management in the Python tool would be much appreciated!
I know cache-related ideas have already been posted (cache macros; cache tools), but I would like it if cache were simply built into every tool, similar to the way it is on the Input Tool.
During workflow development, I'll run the workflow repeatedly, and especially if there is sizeable data or an R tool involved, it can get really time consuming.
I can see where managing cache could be tricky: in a large workflow processing a lot of data, nobody would want to maintain dozens of copies of that data. But there may be ways of just monitoring changes to the workflow in order to know if something needs to be rebuilt or not: e.g. suppose I cache a Predictive Tool, and then make no changes to any tool preceeding it in the workflow... the next time I run, the engine should be able to look at "cache flags" and/or "modified tool flags" to determine where it should start: basically start at the "furthest along cache" that has no "modified tools" preceeding it.
I would like to see Global Variable being made available in Alteryx. I have seen the Global Constant being made available under Workflow "User" configuration. But this is constant and needs to be defined at Design time.
How about a Process Id that needs to be auto genearted and the same needs to be available across the formula tools used with in the workflow.
Transfer of records from Python SDK RecordRef seems to be slow sending large amounts of data to the Alteryx Engine (e.g. discussion here). Although unclear of the exact specifics, it seems that there's a copy and convert process in play.
Apache Arrow appears to be addressing this issue, and the roadmap and specs are impressive! It seems like (again I have no understanding of the Alteryx Engine specifics) that something like this would be excellent for expanding SDK use cases as well as for other connectors such as the Apache Spark connector.
And it looks like it'd be fun to build into Alteryx! 🙂
It appears that the Workflow Dependencies window does not report dependencies from all tools. In the example image, you can see that the file input from the Amazon S3 Download tool is not listed. Some tools may have dependencies that do not easily fit the current field structure of the window, but maybe the input/download tools could be listed with an asterisk or partial reference.
In order to perform audit-trail logging - it would be valuable to have 2 new capabilities
a) environment variables which show the workflow name; filepath; version; run start date and time; etc. For any worklows we build, we need to have a solid audit trail to be SOX compliant, so having this detail available as a data field to write and manipulate is essential
b) A logging component. What would be great is a component that you can drop on a workflow, not connected to anything, which is able to trap the start; end; runtime; version; etc of a workflow; and commit this to any output data format (CSV or ODBC etc). This logging tool would need to be able to capture the full runtime, so it would need to be the last thing that runs (which means it may need to exist in parallel to the main workflow in some way). This is not currently possible with a complex workflow with outputs, because it's not possible to identify when the entire workflow ended; or the runtime (since output tools don't have an onward connector to pass flow-of-control to catch the final end-time)
Again, both of these are necessary to meet audit requirements for workflows and prodcution-quality ETLs for BI data warehouses.
A lot of popular machine learning systems use a computer's GPU to speed up some of the math to a huge degree. The header on this article on Medium shows a 15x difference from a high-end CPU vs a high-end GPU. It could also create an improvement in the spatial tools. Perhaps Alteryx should add this functionality in order to speed up these tools, which I can imagine are currently some of the slowest.
If progress bar or overall process completion percentage can be displayed somewhere, adds great value for the users running complex processes (with multiple databases/ files as input and complex queries especially spatial queries).
When working through a question with our team on how Excel & MS SQL represent dates, we did a quick test and confirmed that SQL and Excel are both storing dates & date-times as a number (technically the offset from a fixed date) which really helps for things like BI applications where a fact table may store a very large number of dates on each record (entered date/time; updated date/time; transaction date/time; etc)
However, when we look at the same in Alteryx, it seems to be storing these dates as plain text (see screenshot below) - meaning that instead of an 8 byte field for every date and datetime; which can be compressed using offset logic like in Parquet, these appear to be represented as a 19 byte field for date-time.
Would it make sense to change the internal representation to a number to make date-offsetting and processing easier (all date-logic then becomes simple addition / subtraction instead of string manipulation)?
Note: You can see this in the screenshot below. the date field has 10 bytes; and the date-time has 19 bytes (where both of these are stored and represented in MSSQL in 8 bytes in total)
Ability to run. Workflow from failed tool onwards .
If a workflow has 10 tools , if some tools failed with error(at tool5) , in an etl world we don’t want to run it again from beginning, instead we fix the tool (tool5)that had error and run from that tool and finish workflow .tool 5 to tool10.
Please evaluate the option to add 2 new containers:
1. parallel - execute tasks inside in parallel 2. serial - execute tasks in strict order, imposed at design time. In the future the oder of operations could be enforced by parameters or other input conditions at runtime.
Please Give us the capacity to mix and match these 2 containers.
Like many of you, I have a lot of modules and macros ... and growing. I keep them fairly organized in different folders and subfolders but sometimes I can't find that particular module I was working on weeks ago.... and I need to get it now. Now I end up doing an advanced search in windows explorer by date and maybe looking for certain keywords. It would be nice to keep track of them in alteryx - add tags, customer names. depts to modules (meta info tab)? Maybe a special container/gui with time line would read the meta info tab so you can more easily find that one module/macro
Also, another gui containing tool name tags so you can easily find all module that use that one tool you're looking for
I just noticed in a workflow I'm looking at, that I derived a column but after a bit of developing, forgot about it, so there it sat, unused. It doesn't hurt anything, but it would be useful if that sort of thing would automatically generate a soft warning on the tool in question: e.g. any item not referenced downstream automatically generates an "Unused variable" warning.