When the Python Tool runs, it seems to always ingest all the data before processing any of it (i.e. no batch processing). Python can handle this kind of functionality with generators. Can we update the tool so that it does some preprocessing (like imports and data prep) and then calls a defined generator function repeatedly from a separate input handle, providing batch data frames on output for more parallel-like processing of data?
The Python Tool could be updated as such:
Multi-Input - Same functionality as now, and also allow this data to be used for preprocessing and setting up the Python functions and a single batch function.
Data Input - Ingests data in batches (as most other tools operate) where each batch passes in a dataframe (in this case, a subset of processed entries) into an existing Python function (with a name that is in globals()), and returns another dataframe with that desired output. This can give the option of adding/removing rows as necessary to a subset of the data.
Data Output - Partial set of data after data processing to allow tools further in the chain to process in parallel.
"On Complete" Multi-Outputs - Same functionality as now, to pass process-complete data to the next tool once all data ingested has been processed. Perhaps give the option to pass the complete set from Data Output.
A simple use-case, if a user wanted to use only the Python Tool:
Let's say a user wants to get all URLs from every post in a thread (containing millions of posts) that are in blacklisted domains.
Data prep sends the list of blacklisted domains into the Python Tool's Multi-Input handle, and that data is transformed and stored in a set within the Python Tool once.
A series of posts (strings) are sent in batches (let's say ~10000) to the Data Input of the Python Tool. The tool calls a defined Python function that extracts all the URLs, and filters those in the blacklist.
That data is then transformed into a DataFrame which is then sent to the Data Output of the Python Tool, and only contains results corresponding to the small batch of posts that were ingested. Alteryx can also use this to track progress during execution.
Once all posts have been processed, one of the Python Tool's Multi-Outputs can return a total count of URLs found that were NOT in the blacklist (sure this can be a part of the Data Output, but just for the sake of this example). Could also be used to trigger "on-complete events."
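The use-case above can be sketched in plain Python, as a rough illustration only. Everything here is an assumption made for the example: the names `BLACKLIST`, `process_batch`, and `batches`, the sample domains, and the batch size. Batches are shown as lists of `(post_id, text)` tuples and results as lists of dicts to keep the sketch dependency-free; in the proposed tool each batch would arrive and leave as a DataFrame.

```python
import re
from urllib.parse import urlparse

# Hypothetical setup step: would run once, fed by the proposed Multi-Input handle.
BLACKLIST = {"bad.example", "spam.example"}

# Deliberately simple URL matcher, good enough for a sketch.
URL_PATTERN = re.compile(r"https?://\S+")

# Running total that the "On Complete" output could report at the end.
non_blacklisted_count = 0


def process_batch(posts):
    """Return one row per blacklisted URL found in this batch of posts."""
    global non_blacklisted_count
    rows = []
    for post_id, text in posts:
        for url in URL_PATTERN.findall(text):
            domain = (urlparse(url).hostname or "").lower()
            if domain in BLACKLIST:
                rows.append({"post_id": post_id, "url": url, "domain": domain})
            else:
                non_blacklisted_count += 1
    return rows


def batches(items, size):
    """Yield successive slices of `items`, standing in for the Data Input handle."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


posts = [
    (1, "see http://bad.example/page and https://ok.example/info"),
    (2, "nothing to see here"),
    (3, "visit https://spam.example/offer now"),
]

# Each element of `results` corresponds to one partial Data Output.
results = [process_batch(batch) for batch in batches(posts, 2)]
```

Here `batches` is an ordinary generator, so only one slice of posts is held by `process_batch` at a time, and downstream consumers could start on the first result before the last batch is read.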
I know I used the term "generators" above, and the design could probably be simplified to instead call an Alteryx Python function that yields from a function to await input from the next batch to use actual Python generators. However, I feel my initial approach could be thought of as a simpler process since generators are more of an intermediate functionality.
I hope this makes sense and is elaborate enough to pursue. Thanks for the consideration!
We need color coding in the SQL Editor Window for input tools. We are always having to pull our code out of there and copy it into a Teradata window so it is easier to read/troubleshoot. This would save us some time and some hassle and would improve the Alteryx user experience. (I think you've used a couple of my ideas already. This one is a good one too.)
Who needs a 1073741823 sized string anyways? No one, or close enough to no one. But if you are creating some fancy new properties in the formula tool and just cranking along, and then you see that your **bleep** data stream is 9G for nine rows of data, you find yourself wondering what the hell is going on. And then you walk your way down the workflow for a while finding slots where the default 1073741823 value got set, changing them to non-insane sized strings, and then your data flow is more like 64kb and your workflow runs in 3 seconds instead of 30 seconds.
Please set the default value for formula tools to a non-insane value that won't be changed by default by 99.99999% of use cases. Thank you.
When upgrading to 2019.1, the content of my Python tool was deleted. This may be a bug in the 2019.1 version, or just a bug in the upgrade process. Either way, it is problematic that details of a canvas would be deleted at all.
My guess is if the content of the Python Tool could be reliably stored in the canvas XML this issue could potentially be resolved.
With an increasing number of different projects, involving different machine learning models, it's becoming difficult to manage different package versions across workflows. Currently, the Python tool has a single virtual environment, so we need to develop models in different projects always using the same Python and package versions as the Python tool venv. While this doesn't bother the code itself too much, it becomes a problem as soon as we store and load pickled models, which are sensitive to even minor changes in packages.
This is even more so a problem when we are working on the Alteryx server, where different teams might use different packages. Currently, there is only the server admin who can install packages on the server and there can only be one version per package.
So, a more robust venv management in the Python tool would be much appreciated!
At the moment if a part of your python code takes more than 30s to run, Jupyter times out and Alteryx cancels the workflow. This makes the Python Tool unusable for anything intensive and the timeout should be removed by default or be configurable per workflow.
I've made this idea as none of the solutions in these threads feel satisfactory:
Idea: Prompt the user to find a missing macro instead of the current UX of a question mark icon.
Issue: When a macro referenced in a workflow is missing, then there is no way to a) know what the name of the macro was (assuming you were lazy like me and didn't document with a comment) and b) find the macro so you can get back to business.
When this happens to me now, I have to go to the XML view and search for macros and then cycle through them until I find the one that's missing. Then I have to either copy the macro back into that location or manually edit the workflow XML. Not cool man.
Solution: When a macro is missing, the image below at the right should be shown. In the properties window, a file browse tool should allow the user to find the macro.
Currently if one wants to compare different alteryx files or different versions of the same file - one needs to compare the XML files. If you are not very familiar with navigating XML, this poses a risk as one may not be able to identify all changes.
It would be a great addition to Alteryx to integrate Alteryx with Git, Subversion, CVS, Mercurial, and GitHub as this tool is becoming the go-to tool for data processing for data analysts and even programmers.
This additional functionality to compare previous versions (diff) and also to merge alteryx workflows if two people are working on the same workflow, and also to easily see what changes have been committed/ made by other developers and when would make Alteryx a much more powerful tool and would open doors to other types of users, as essentially you can run anything through Alteryx.
How about a quick method of disabling a container?
Current state - Click on the container, pan the mouse all the way over to the tiny checkbox target in the configuration pane and click disable.
Future state - little icon by the rollup icon that can be clicked to disable/enable, differentiated perhaps by a color change of the minimized pane?
I know what you're thinking, "talk about lazy, he's whining about moving the mouse (which his hand was already on) 2 cm along his desktop and clicking"... but still what an easy usability win and one less click to do a task I find myself repeating frequently.
Idea: I need a function that given two dates, will return the number of business days between them. I need to know the # of business days between when a sales order is placed and when it ships to the customer. I'm in the US, so I would want to not count Saturdays, Sundays, and US Holidays, but I can foresee others wanting the option to change to other calendars or ignore holidays.
There are a couple of posts on this in the community, but everything I've found so far is too laborious to implement or not robust.
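For reference, the counting logic being asked for is small. Below is a minimal sketch in plain Python, assuming the count runs from the order date (inclusive) to the ship date (exclusive); the function name `business_days_between`, its parameters, and the one-entry holiday set are illustrative, not an existing Alteryx function or a complete US calendar.

```python
from datetime import date, timedelta


def business_days_between(start, end, holidays=frozenset(), weekend=(5, 6)):
    """Count business days from `start` (inclusive) to `end` (exclusive).

    `weekend` holds weekday numbers to skip (Monday is 0, so 5 and 6 are
    Saturday and Sunday); `holidays` is any collection of dates to skip.
    """
    if end < start:
        # Reversed arguments give a negative count.
        return -business_days_between(end, start, holidays, weekend)
    count = 0
    day = start
    while day < end:
        if day.weekday() not in weekend and day not in holidays:
            count += 1
        day += timedelta(days=1)
    return count


# Illustrative holiday list only; a real calendar would need every observed holiday.
US_HOLIDAYS_2024 = {date(2024, 7, 4)}

order_placed = date(2024, 7, 1)  # a Monday
shipped = date(2024, 7, 8)       # the following Monday

print(business_days_between(order_placed, shipped, US_HOLIDAYS_2024))  # 4
print(business_days_between(order_placed, shipped))                    # 5
```

Passing a different `weekend` tuple or holiday set covers the other-calendar and ignore-holidays cases mentioned above. NumPy's `busday_count` offers comparable counting through its `weekmask` and `holidays` arguments.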
Python pandas dataframes and data types (numpy arrays, lists, dictionaries, etc.) are much more robust in general than their counterparts in R, and they play together much easier as well. Moreover, there are only a handful of packages that do everything a data scientist would need, including graphing, such as SciKit Learn, Pandas, Numpy, and Seaborn. After utilizing R, Python, and Alteryx, I'm still a big proponent of integrating with the Python language much like Alteryx has integrated with R. At the very least, I propose to create the ability to create custom code such as a Python tool.
My company does installs through a machine with admin rights, but the end user does not actually have admin rights to the laptop. Therefore, when attempting to add modules in the Python tool (Developer category), pip install fails. The failure is due to the install being in Program Files, where a non-admin is unable to write. The normal workaround is also not possible, since the version installed is the admin and not the non-admin Designer.
Can the tool be more flexible from the get go? As it stands, the only way out of this is to go through articles regarding the SDK and creating custom requirements txt files. My goal was just to be able to use Python with Alteryx and add on modules as I need. Very cool updates in 3.5, which I'm using, but I thought this conundrum might happen to others in the same situation: admin install with non-admin rights. Thanks.
A lot of popular machine learning systems use a computer's GPU to speed up some of the math to a huge degree. The header on this article on Medium shows a 15x difference from a high-end CPU vs a high-end GPU. It could also create an improvement in the spatial tools. Perhaps Alteryx should add this functionality in order to speed up these tools, which I can imagine are currently some of the slowest.
I could invest the time into creating a macro to do what I need, or per @MichaelF's suggestion a custom formula. However, the functionality already exists in the Blob Convert tool, so I'm suggesting that Alteryx provide that existing functionality to customers in a Formula.
Can we get a more robust read.Alteryx function for mode="data.frame"? If it is reading the stream as a data frame, can we have the option stringsAsFactors = FALSE?
I am getting tripped up a lot because the code will execute in R Studio, but will show mysterious behaviours when it runs within the R Tool. I am manually converting variables to character strings in my R Tool code, which I don't have to do in R Studio. However, I'm not a highly detail-oriented R developer, so I will miss variable data type conversions and have spent a lot of time going down the wrong path. Also, it makes it difficult to maintain two different scripts for the same routine.
I have started using the glimpse() function in R Tool code, to help catch some data conversions since it writes the output in the message log.
It would be great if we could set the default size of the window presented to the user upon running an Analytic App. Better yet, the option to also have it be dynamically sized (auto-size to the number of input fields required).
It is important to be able to test for heteroscedasticity, so a tool for this test would be much appreciated.
In addition, I strongly believe the ability to calculate robust standard errors should be included as an option in existing regression tools, where applicable. This is a standard feature in most statistical analysis software packages.