I was very happy to see the Bulk Loader introduced for Snowflake in the last release. This bulk loader is only available for Snowflake environments hosted on AWS and does not work for environments hosted on Azure. As Snowflake continues to build momentum, I imagine this will be a common request. Is there something in the pipeline to add this functionality?
For an interim solution, we will be working toward developing some generic scripts/snowsql to mimic that bulk load, but ultimately we'd love to have this as part of the tool.
One of the common things that we need to do is take a delta copy of a file or a DB table into the staging area of the analytical database.
This always looks very similar, so it would be useful to make this a wizard-based process so that teams can build these quickly rather than having to hand-build each one:
- Check which primary keys exist - fill the gaps where they don't
- Are there any rows that update over time (or is this insert-only)? If they update over time, which column is the "updated date" column, so that we can spot updates? If there is no update date, then we need to do a column-by-column check of some kind (like a hash or a checksum)
- Do you want to sync deletes?
- Do you want to keep updates?
- Target table in staging area which is now updated compared to the source
- Logging done (similar to what Kimball recommends in the ETL Handbook) with the run date/time; summary stats; and any errors
- Errors table for any errors that arose with row numbers
- Tables in target created (with history table if requested)
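The change-detection part of this wizard could be sketched roughly as follows. This is only an illustration in plain Python, not anything Alteryx provides: the function names `row_hash` and `diff_by_key`, and the idea of passing rows as dictionaries, are assumptions made for the example. It covers the case where there is no "updated date" column, so rows are compared by a hash of their non-key columns.

```python
import hashlib


def row_hash(row, columns):
    """Hash the chosen columns of a row so changes can be spotted
    without relying on an "updated date" column."""
    payload = "\x1f".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_by_key(source_rows, target_rows, key, columns, sync_deletes=False):
    """Compare source and staging rows (hypothetical helper).

    Returns (inserts, updates, deletes) as lists of row dictionaries.
    """
    target_by_key = {r[key]: r for r in target_rows}
    source_keys = set()
    inserts, updates = [], []
    for row in source_rows:
        source_keys.add(row[key])
        existing = target_by_key.get(row[key])
        if existing is None:
            # Primary key missing in the target: fill the gap.
            inserts.append(row)
        elif row_hash(row, columns) != row_hash(existing, columns):
            # Key exists but the compared columns differ.
            updates.append(row)
    deletes = []
    if sync_deletes:
        # Rows that exist in the target but no longer in the source.
        deletes = [r for k, r in target_by_key.items() if k not in source_keys]
    return inserts, updates, deletes
```

A real implementation would also need the logging, errors table and history table from the list above; those are left out here.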
With the release of 2018.3, caching has become an ad hoc task. With complex workflows and multiple inputs, we need a method to cache and save the cache selection by tool. Once the workflow runs after opening, the cache would be saved at the latest tool downstream.
This way we don't have to create ad hoc cache steps and run the workflow twice before realizing the time-saving features of the cache.
This would work similar to the cache feature in 11.0 but with enhanced functionality...the best of the old cache with the new cache intent.
While the In-DB tools are very helpful and cut down the time needed to write complex SQL, some steps are faster to write directly in SQL, such as window functions: OVER (PARTITION BY ...). In Alteryx, we need to create multiple joins and summaries to perform a window function. It would be immensely helpful if there was a SQL editor tool for In-DB workflows where we can edit the SQL code at any point in the workflow or, even better, an "edit" function on every In-DB tool where we can customize the generated SQL code and then send it to the next tool.
This will cut down the time immensely and streamline the workflow to make Alteryx a true contender for the ETL solution space.
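To illustrate the kind of SQL meant here, the sketch below runs a single window function against an in-memory SQLite database from Python. The `sales` table, its columns and the `region_totals` function are made up for the example, and it assumes the SQLite library bundled with Python is version 3.25 or later (the first release with window functions). One `SUM(...) OVER (PARTITION BY ...)` replaces what would otherwise be a summarize followed by a join back to the detail rows.

```python
import sqlite3


def region_totals(rows):
    """Return each sale with the total for its region attached.

    rows: iterable of (id, region, amount) tuples (hypothetical data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (id, region, amount) VALUES (?, ?, ?)", rows
    )
    # The window function keeps every detail row and adds the
    # per-region total alongside it, with no separate join.
    query = """
        SELECT id, region, amount,
               SUM(amount) OVER (PARTITION BY region) AS region_total
        FROM sales
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```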
Currently pip is the package manager in place within the Designer. Unfortunately this is something that doesn't fit our requirements as Data Scientists. We prefer using conda due to the following reasons:
conda also manages non-Python library dependencies. This way conda can be used to manage R packages as well, which comes in quite handy (even though not all packages from the CRAN repository are available).
conda provides a very simple way of creating conda envs (similar to virtualenv, but with conda one can also install and manage pip packages --> virtualenv cannot install conda packages!) to isolate required packages (with specific versions) used in a workflow (e.g. for a Python Tool in Designer).
So I would like to have conda instead of, or in addition to, pip, and would like to create my own conda envs where I install the packages I need for a specific task within my workflow. Moreover, if you are thinking about featuring an R Jupyter notebook capability (like the Python Tool), it could be beneficial to change from pip to conda for managing packages in both worlds.
We need some way (unless one exists that I am unaware of - beyond disabling all but the Container I want to run) to fire off containers in particular order. Run Container "Step1" then Run Container "Step2" and so on.
I wanted to control the order of execution of objects in an Alteryx workflow, but right now we ONLY have Block Until Done, which is not the right choice in many cases.
Can we have a container (say, a Sequence Container), put a piece of logic in each container, and control execution by connecting the containers? Hopefully this way we can control the execution order. It may look something like the below.
I've seen this question before and have run into it myself. I'd like to see a new tool that would allow a developer (of a workflow) to choose a path of logic based upon criteria known only during the execution of a module.
If LEFT INPUT Count of records < 10,000 THEN Path1 (e.g. use a calgary join)
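As a rough sketch of the routing rule being asked for, the Python below picks a path from a record count that is only known at run time. Everything in it is an assumption for illustration: the `route` function, the two callables standing in for branches of a workflow, and the default threshold taken from the example above.

```python
def route(records, small_path, large_path, threshold=10_000):
    """Send records down one of two paths based on a run-time count.

    small_path and large_path are hypothetical callables standing in
    for the two branches of the workflow.
    """
    if len(records) < threshold:
        return small_path(records)
    return large_path(records)
```

For example, `route(rows, calgary_join, standard_join)` would use the first branch only when fewer than 10,000 rows arrive, where both function names are placeholders.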
Per my initial community posting, it seems that in environments where the firewall blocks pip the YXI installation process takes longer than it needs. My experience was 9:15 minutes for a 'simple' custom tool (one dependency wheel included in the YXI).
My 'Idea' is to provide a configuration option to install the YXI files 'offline'. That is, to skip the pip install --upgrade steps, and perhaps specify the --find-links and --no-index options with the pip install -r requirements.txt command. The --no-index option would assume that the developer has included the dependency wheel files in the YXI package. If possible, a second config option to add the path to the dependencies for the --find-links option would help companies that have a central location for storing their dependencies.
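The command such an 'offline' option would end up running could look roughly like the one built below. This is a sketch only: how the YXI installer actually invokes pip is not known here, and the `offline_pip_command` function and its parameters are hypothetical. The `--no-index`, `--find-links` and `-r` options themselves are standard pip options.

```python
import sys


def offline_pip_command(requirements="requirements.txt", find_links=None):
    """Build a pip install command that never contacts a package index.

    find_links: optional directory holding the dependency wheels,
    e.g. a folder inside the YXI package or a central company share.
    """
    cmd = [sys.executable, "-m", "pip", "install", "--no-index"]
    if find_links:
        cmd += ["--find-links", find_links]
    cmd += ["-r", requirements]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Because `--no-index` stops pip from reaching out to the network, a blocked firewall no longer causes long timeouts.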
Essentially, I want to update a DB table with either an update or the deletion of rows. I can't delete all of the data. My workaround will be to create a table, insert the keys that I want to delete into it, and try to use an Input/Output tool with SQL that performs the delete. Any other suggestions are welcome, but a tool would be best.
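The workaround described above, staging the keys and then deleting with one SQL statement, can be sketched as follows. The example uses Python's built-in sqlite3 module purely for illustration; the table names `target` and `keys_to_delete`, the `id` column and the `delete_by_keys` function are assumptions, and the exact SQL may differ on other databases.

```python
import sqlite3


def delete_by_keys(conn, keys):
    """Delete rows from the (hypothetical) target table by primary key.

    The keys are first loaded into a temporary table, then a single
    DELETE removes only the matching rows.
    """
    conn.execute("CREATE TEMP TABLE keys_to_delete (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO keys_to_delete (id) VALUES (?)", [(k,) for k in keys]
    )
    cur = conn.execute(
        "DELETE FROM target WHERE id IN (SELECT id FROM keys_to_delete)"
    )
    deleted = cur.rowcount  # number of rows actually removed
    conn.execute("DROP TABLE keys_to_delete")
    conn.commit()
    return deleted
```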
As we do more work analyzing the canvases that our folks are producing, it's becoming more and more necessary to have a well-documented definition and schema for the XML that is used for Alteryx canvases.
Please could you publish the full XML definition and schema for Alteryx canvases? This would allow groups to perform deeper analytics on how people are using Alteryx, automate quality checks, look for learning gaps, scan for dependencies, etc.
This got me to think a little more about localized logging options in Alteryx.
At a high level, there are ways to accomplish this in Designer at a User or System level by enabling a Logging directory and then parsing those logs with a separate Alteryx job. However, this would involve logging ALL Designer executions, which seems like it may be overkill for this need. A user can also manually save a log after each execution, although this requires manual intervention.
I think adding an option in the Runtime settings for Workflow Configuration to Enable Logging and (optionally) specify a Logging directory would be a great feature add for Designer. In my opinion this should not apply once a workflow runs on Server (Server logging should be handled in a fully standardized way), but should apply to designer "UI" execution. Having the ability to add a logging naming convention (perhaps including a workflow name and run date in the log name) would be icing on the cake.
This would allow for a piecemeal logging solution to log specific flows or processes that might be high visibility or high importance, while avoiding saving hundreds or thousands of logs daily for less important processes and dev tests. It would also reduce or eliminate the manual process of saving these logs individually.
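As an idea of what the naming convention could produce, here is a small sketch that builds a log file path from a logging directory, the workflow name and the run date. The `log_path` function and the `name_YYYYMMDD_HHMMSS.log` pattern are only suggestions, not an existing Designer setting.

```python
from datetime import datetime
from pathlib import Path


def log_path(log_dir, workflow_name, run_time):
    """Build a per-run log file path (hypothetical convention).

    The workflow's file extension is dropped and the run date/time is
    appended, so each run of each workflow gets its own log file.
    """
    stem = Path(workflow_name).stem
    return Path(log_dir) / f"{stem}_{run_time:%Y%m%d_%H%M%S}.log"
```

For example, `log_path("logs", "Daily Load.yxmd", datetime.now())` yields a file under `logs` whose name starts with `Daily Load_` followed by the timestamp.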
A cache tool would allow a user to temporarily store a snapshot of inline data from a previous run of the module.
Imagine a browse tool that was inline as opposed to a terminus tool (input and output). Now allow that browse tool to persist its data after a run of the module. When an option on that tool was activated, it would block all of the dependent tools upstream from it and instead send its cached data downstream.
The reason I think this would be a useful tool is that I often come to the end of creating a module when I'm working on the Reporting tools. I run multiple times to see the changes I've made. When the module has a lot of incoming data and complex data transformations, it can take a long time just to get to the point where the data gets to the reporting tools. This cache tool would eliminate that wait.
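The behaviour described above can be sketched outside Alteryx in a few lines of Python. This is only an analogy: the `cached` function, its `use_cache` switch (standing in for the option on the tool) and the use of a pickle file for the snapshot are all assumptions made for the example.

```python
import pickle
from pathlib import Path


def cached(cache_file, compute, use_cache=True):
    """Return the snapshot from a previous run if allowed and present;
    otherwise run the upstream work and save a fresh snapshot.

    compute: a callable standing in for everything upstream of the tool.
    """
    path = Path(cache_file)
    if use_cache and path.exists():
        # Upstream work is skipped; the stored data flows downstream.
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

With the option switched on, only the first run pays for the slow input and transformation steps; later runs go straight to the reporting part.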