Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

1. Review our submission guidelines & status definitions before getting started.
2. Search the community for a solution or existing idea before posting.
3. Vote by clicking the star in the top left corner of an idea you support.
4. Submit a new idea to suggest a product enhancement or new feature.

Suggest an idea

From Wikipedia

Druid is a column-oriented, open-source, distributed data store written in Java. Druid is designed to quickly ingest massive quantities of event data, and provide low-latency queries on top of the data.[1] The name Druid comes from the shapeshifting Druid class in many role-playing games, to reflect the fact that the architecture of the system can shift to solve different types of data problems.

Druid is commonly used in business intelligence/OLAP applications to analyze high volumes of real-time and historical data.[2] Druid is used in production by technology companies such as Alibaba,[2] Airbnb,[2] Cisco,[3] eBay,[4] Netflix,[5] PayPal,[2] Yahoo,[6] and the Wikimedia Foundation.[7]

 

More and more companies are moving from Hive to Druid for their data visualization needs, so maybe it's time to look at Druid integration with Alteryx?

  • In Database

I reported this to the support team but was told it was by design and to post here.

 

In-DB Inefficient SQL

I would like to report that the In-DB tools are generating horribly inefficient SQL code for simple operations. It seems that no matter which tools you use, every statement starts with a nested 'SELECT * FROM'.

 

Example of a simple workflow:

(Screenshots: Support1.jpg, Support2.jpg)

 

This is a simple Select and Group By, but the SQL generated is:

 

SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit"

FROM (SELECT * FROM "_SYS_BIC"."shell.app.gsap.FL000_LSC.FL002_CTS.INT.RPT/CA_CTS_RPT_MAIN_001") AS "a"

GROUP BY "ShipTo", "ShipTo_Name"

 

This is taking a very long time to execute:

 

Statement 'SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit" FROM ...'

successfully executed in 15.752 seconds  (server processing time: 15.699 seconds)

 

Whereas if I take the same query and remove the nested Select *:

 

SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit"

FROM "_SYS_BIC"."shell.app.gsap.FL000_LSC.FL002_CTS.INT.RPT/CA_CTS_RPT_MAIN_001" AS "a"

GROUP BY "ShipTo", "ShipTo_Name"

 

It is very quick:

 

Statement 'SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit" FROM ...'

successfully executed in 1.211 seconds  (server processing time: 1.157 seconds)

 

So Alteryx is generating queries roughly 13x slower than they need to be, thereby defeating the point of using In-DB. As you can imagine, in a workflow with multiple Connect In-DB tools this adds up to a substantial amount of time. The example above is from an SAP HANA database with 1.9m rows and ~90 columns, but we have much bigger tables/views than this.

 

If you look, you will see the same behaviour for all In-DB tools: each tool wraps the previous query in another nested Select with its particular operator.

 

MY SUGGESTION:

My suggestion is that Alteryx should combine the SQL of the first few tools and avoid using SELECT * entirely unless no Select tool has been used. It should combine:

- Connect In-DB + Select

- Connect In-DB + Filter

- Connect In-DB + Summarise

 

Preferably it should combine/flatten everything up until the first Join or Union, but Select + Filter are a must! A sketch of what a flattened query could look like is below.
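For illustration only, a flattened Connect In-DB + Filter + Summarize could be sent as a single statement like this (the filter condition is hypothetical; the exact SQL would depend on each tool's configuration):

SELECT "ShipTo", "ShipTo_Name", SUM("ECM_3PL_OVERHEADS_Unit") AS "Sum_ECM_3PL_OVERHEADS_Unit"
FROM "_SYS_BIC"."shell.app.gsap.FL000_LSC.FL002_CTS.INT.RPT/CA_CTS_RPT_MAIN_001"
WHERE "ShipTo" IS NOT NULL               -- Filter In-DB folded into a WHERE clause
GROUP BY "ShipTo", "ShipTo_Name"         -- Summarize In-DB folded into a GROUP BY

That is one statement instead of one nested SELECT * layer per tool.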

 

Note: it seems some databases can cope with un-nesting these big nested queries in their query plans for some tables, but normally not for views. Some cannot cope at all, so the In-DB tools cannot even be used to browse 100 records (due to the SELECT *).

  • In Database

As simple as the title:

 

Just a Multi-Field Formula In-DB. It's a nightmare to write the same SQL formula 50 or 100 times and then maintain it.
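For context, this is the kind of repetition the Formula In-DB tool forces today; the field names are hypothetical, purely to illustrate the pattern a Multi-Field Formula In-DB could generate automatically for every selected field:

SELECT
  TRIM(UPPER("Customer_Name")) AS "Customer_Name",
  TRIM(UPPER("Customer_City")) AS "Customer_City",
  TRIM(UPPER("Customer_Country")) AS "Customer_Country"
  -- ...and so on, written and maintained by hand for 50 or 100 fields
FROM "Customer_Table"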

 

Please.

 

Here is a screenshot: téléchargement.jpg

Spark ODBC is much faster than Hive, at least from the GUI.

However, two things are missing:

1/ "Append Existing" for the Write Data In-DB tool (it exists this way on Hive)

2/ The ability to set "Overwrite" even if the table does not exist (it works this way on Hive)

 

(Screenshot: test_spark_sql.png)

 

These two drawbacks severely limit the use of Spark ODBC.

  • In Database

The Transpose In-DB tool has been in the "Laboratory" for years now. I understand Alteryx invested some time and money to develop it, but sadly we still can't use that tool for sensitive workflows. Did you find some bugs in it? Can you please fix them and make this an "official" tool?

 

Thanks

  • In Database

I would like to suggest a fix to allow the In-DB Connect tool's custom SQL to read Common Table Expressions. As of 2018.2, the SQL fails because the In-DB tools wrap everything in a SELECT * statement; since CTEs need to start with WITH, this causes the SQL to error out. This would be a huge help instead of having to write nested sub-selects in long, complex SQL code! A sketch of the failure is below.
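To illustrate (a sketch with hypothetical table and column names; behaviour varies by database, but many engines reject a WITH clause inside a derived table):

-- Custom SQL entered in the Connect In-DB tool (works on its own):
WITH recent_orders AS (
  SELECT "OrderID", "CustomerID" FROM "Orders" WHERE "OrderDate" >= '2018-01-01'
)
SELECT "CustomerID", COUNT(*) AS "Order_Count" FROM recent_orders GROUP BY "CustomerID"

-- What the In-DB tools effectively execute, which errors out because the
-- WITH clause now sits inside a derived table:
SELECT * FROM (
  WITH recent_orders AS (
    SELECT "OrderID", "CustomerID" FROM "Orders" WHERE "OrderDate" >= '2018-01-01'
  )
  SELECT "CustomerID", COUNT(*) AS "Order_Count" FROM recent_orders GROUP BY "CustomerID"
) AS "a"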

  • In Database

DELETE FROM Source_Data WHERE ID IN
(SELECT ID FROM My_Temp_Table WHERE FLAG = 'Y')

 

.... 

 

Essentially, I want to update a DB table with either an update or with the deletion of rows. I can't delete all of the data. My workaround will be to create/insert into a table the keys that I want to delete and then use an input/output tool with SQL that performs the delete. Any other suggestions are welcome, but a dedicated tool would be best. A sketch of the workaround is below.
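For anyone attempting the same workaround in the meantime, one sketch of it (assuming the keys have already been written to My_Temp_Table by an Output Data tool) is to run the delete from that tool's Post Create SQL Statement option:

-- Post Create SQL Statement on the Output Data tool that loads the keys:
DELETE FROM Source_Data WHERE ID IN
(SELECT ID FROM My_Temp_Table WHERE FLAG = 'Y');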

 

Thanks,

Mark

I am pulling DB2 data into a workflow and have a lot of joins that slow it down considerably. For the SQL work I am doing, I could use In-DB tools to fix this, but In-DB is not yet supported for DB2, so the joins take a long time to come together. I also think that focusing on the In-DB tools in general would be a good thing. These tools really can be game changers when it comes to large datasets.

Hi,

 

Carlson Companies is moving to a Vertica environment and it would be great if that was supported with the In-database tools. That would definitely help and expand the use of Alteryx at our company!

 

Thanks,

 

Tyler Mittelstadt

  • In Database

Add in-database tools for SAP HANA.

Please star this idea so we can prioritize the request accordingly.

  • In Database

It would be really useful to be able to obtain the user name of someone running an app in the Gallery. This could be used, for instance, for row-level security when people run an app that produces a report and the data is considered sensitive. A sketch of that use is below.
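For illustration, if the Gallery user name were available to the workflow, row-level security could be implemented by pushing a filter like this into the report query (the table, column, and user value are hypothetical):

SELECT *
FROM "Sales_Report_Data"
WHERE "Authorized_User" = 'jsmith'   -- value supplied from the Gallery user name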

There is a need, when visualizing in-database workflows, to be able to view sorted data. This sorting could be done in one of two ways: in a Browse tool, or as a stand-alone Sort tool; either would address the need. Without such a tool, the only way to sort the data is to Data Stream Out and then visualize it in Alteryx. However, this violates the premise of the in-DB toolkit, which is to keep your data in the database and process it using the DB engine. Streaming out big data just to add a sort is not efficient.

 

Granted, the in-DB processing doesn't care whether data is sorted or not. However, when attempting to find extreme values after an aggregation, or when trying to identify something as simple as whether null values are present in a field, a sort becomes extremely useful and a necessary tool for human consumption of data (regardless of the database's processing needs). A sketch of the kind of query such a tool could issue is below.
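For illustration, a Sort In-DB or sorted Browse In-DB would only need to append an ORDER BY (and optionally a row limit) to whatever query the upstream tools have already built; the table and column names here are hypothetical:

SELECT "Region", SUM("Sales") AS "Sum_Sales"
FROM "Sales_Fact"
GROUP BY "Region"
ORDER BY "Sum_Sales" DESC   -- sorted for human inspection; a TOP/LIMIT could cap a Browse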

 

Thanks very much for hearing my idea!

Currently, in-database processing is not available for Google BigQuery. It would be great if that could be developed too.

 

Regards,

 

Hans

When converting data types in-DB, it would be really helpful if I could change the data type with the Select In-DB tool, in a similar manner to the Select tool. Currently, we have to use the Formula In-DB tool to write a CAST statement, like the example below.
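For reference, the current workaround is to add a field in the Formula In-DB tool whose expression is a CAST (a sketch; the field name and target type are hypothetical):

CAST("Order_Amount" AS DECIMAL(18,2))

A data-type dropdown on the Select In-DB tool could generate the same CAST behind the scenes.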

  • In Database

In-database processing enables large performance benefits on big datasets; it would be great to incorporate Multi-Row and Multi-Field formulas within the in-database functions for Redshift (see the sketch below).
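For illustration, a Multi-Row style expression that references the previous row's value maps naturally onto a window function in Redshift SQL; the table and column names below are hypothetical:

SELECT
  "Order_Date",
  "Sales",
  LAG("Sales", 1) OVER (ORDER BY "Order_Date") AS "Prev_Row_Sales"
FROM "Daily_Sales";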

It would be nice if Alteryx had the ability to run a Teradata stored procedure and/or macro, with the ability to accept input parameters. This ability appears to exist for MS SQL Server. It seems odd that I can issue a SQL statement to the database via a pre- or post-processing command on an input or output, but can't call a stored procedure or execute a macro. The only way we can seem to call a stored procedure is by creating a Teradata BTEQ script and using the Run Command tool to execute that script (a sketch of that workaround is below). It works, but it's a bit messy and doesn't quite fit the no-coding theme of Alteryx.
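For anyone stuck with the same workaround, a minimal sketch of the BTEQ route; the server, credentials, and procedure name are hypothetical, and the script would be launched from the Run Command tool (e.g. bteq < call_proc.bteq > call_proc.log):

.LOGON tdprod/alteryx_svc,mypassword
CALL sandbox_db.refresh_sales_summary('2018-06-30');   -- hypothetical stored procedure and parameter
.IF ERRORCODE <> 0 THEN .QUIT 8
.LOGOFF
.QUIT 0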

As a security enhancement, the default password setting should be "Encrypt for User". Although this is critical for security, my users have overlooked it even with training. They truly aren't culpable if they forget. If it were the default, they would have to consciously change it to a less secure setting.

 

From a security perspective, the current default setting is backwards.

Grant Hansen

It is very difficult moving from Alteryx functions to SQL in-database as a business user; I need to learn a whole new language.

 

In the short term, Alteryx should provide a simple function reference, as similar as possible to the Formula tool, for building formulas in the in-database tools.

 

Longer term, I'd like there to be a parser from Alteryx formulas to SQL, so I can just write my favourite Alteryx formula (or a subset thereof) and Alteryx handles the conversion to SQL. A sketch of the kind of translation I mean is below.
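For illustration, the sort of translation such a parser would perform, shown with a hypothetical field; the Alteryx expression is in the comment and the generated SQL follows:

-- Alteryx formula: IIF([Sales] > 100, "High", "Low")
CASE WHEN "Sales" > 100 THEN 'High' ELSE 'Low' END AS "Sales_Band"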

Preface: I have only used the in-DB tools with Teradata, so I am unsure whether this applies to other supported databases.

 

When building a fairly sophisticated workflow using in-DB tools, the workflow may sometimes fail because the underlying queries run up against CPU/memory limits. This is most common when doing several joins back to back, as Alteryx sends this as one big query with various nested subqueries. When working with datasets in the hundreds of millions and billions of records, this can be extremely taxing for the DB to run as one huge query. (It is possible to get around this by using an in-DB write out to a temporary table as an intermediate step in the workflow.)

 

When a routine does hit an in-DB resource limit and the DB kills the query, Alteryx immediately fails the workflow run. Any "temporary" tables Alteryx creates are in reality permanent tables that Alteryx usually just drops at the end of a successful run. If the run does not end successfully because it hit a resource limit, these "temporary" (permanent) tables are not dropped. I only noticed this after building out a workflow and running up against a few resource limits; I then started getting database out-of-space errors. Upon looking into it, I found all the previously created "temporary" tables were still there, taking up many TBs of space.

 

My proposed solution is for Alteryx's in-DB tools to drop any "temporary" tables they have created when a run ends, regardless of whether the entire module finished successfully. In the meantime, the sketch below is how leftovers can be hunted down manually.
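A sketch of how leftover tables can be found for manual cleanup on Teradata, assuming they are created in a known staging database; the "AYX" prefix is only a placeholder for whatever naming pattern your runs actually produce:

SELECT DatabaseName, TableName, CreateTimeStamp, CreatorName
FROM DBC.TablesV
WHERE DatabaseName = 'MY_STAGING_DB'   -- hypothetical staging database
  AND TableName LIKE 'AYX%'            -- placeholder prefix; check what your environment creates
ORDER BY CreateTimeStamp;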

 

 

Thanks,

Ryan

I think it would be extremely helpful to have an in-DB Detour, so that you could filter on a user's selection without having to pull the data out of the DB and then put it back in for more processing. This would be useful when you have a large dataset and don't want to pull the entire dataset out of the DB because it would take a long time, for example when filtering a large dataset by a specific state or region chosen by the user. The Detour in the Developer tools actually seems like it would do the job; it just needs to connect to the In-DB tools.
