Dev Space

bryanbumgardner · ‎04-11-2018

Bringing this to the Developer Community because it has stumped everyone else.

I am attempting to create Gallery applications on a Microsoft Azure virtual machine that's hosting our Gallery. These Gallery applications will contain In-DB tools that connect to an Amazon Redshift database and operate some in-db queries. This entire stack will give my team fast, dynamic queries of a massive 5 TB database.

On small queries, the apps work fine. However, on large queries, the Apps seize up and the Redshift cancels the queries around 20 minutes. I have identified a threshold - there are some queries that will run successfully, taking up to an hour and fifteen minutes to complete. However, any query above this threshold in potential data size will automatically cancel after about 20 minutes of operation, regardless of size above the threshold. Our sortkey is a date column, and the threshold I've identified is basically more than a month of data.

Details about these apps:

- DB Connections are done via system-wide connection string set up on the Gallery ODBC Admininstrator.

- There are no query limits established on the Redshift, and it has enough working memory to complete queries above the threshold. I have successfully completed above-threshold queries from other machines, it only fails when done on the Gallery or the Designer installed on the same Azure.

- There are no clear limits on the data bandwidth allowed via Azure, but my experience with Azure is limited.

- The error code reads that "the server terminated the connection."

- The apps run about 8-10 In-Database tools including Summarizes.

Research on other forums:

- the AWS forum is not helpful at all, and StackOverflow knows nothing about Alteryx.

- A similar post was created in the general discussion, and I thought it was solved. However, increasing the size of the database did not actually solve the issue.

Below is a screengrab of the In-DB tools where the workflows get stuck. This combination of tools works wonderfully on my local computer.

These apps failing is making me look really bad at my job. I promised these apps weeks ago and they are all operational locally but fail on the Gallery. If operational, they would revolutionize how our account pulls data. Please, if there is anyone at Alteryx who can help with this, I would love to start an extended conversation beyond the forum to solve this issue.

Thank you.

bryanbumgardner · ‎04-11-2018

A followup: the advice I received from my local team is that I should create materialized tables - however, if these queries are failing because of size there's no reason they wouldn't also fail on materialized tables after a certain threshold.

I would rather completely fix the problem than simply apply a band-aid.

MichaelCh · ‎04-11-2018

Have you tried using temporary tables?

Has @LindaT looked at your issue when you posted it elsewhere on community?

TashaA · ‎04-12-2018

Adding some of our Gallery/Server experts

@MattH @MattB @JulieM @KevinP

RyanSw · ‎04-12-2018

You might take a look at this:

https://discuss.pivotal.io/hc/en-us/articles/115005583008-Azure-Networking-Connection-Idle-for-more-...

> Cause

> After lots of investigation with involvements from SCS, Ecosystem, Diego, BOSH, Garden, and Microsoft Azure team members, the underlying issue is that any resource with a public IP endpoint on Azure, such as an Azure Load Balancer (ALB), has a default idle connection timeout of 4 minutes. When Azure detects that a connection has been idle for more than 4 minutes, it closes the connection without sending a TCP RESET to inform the client side that the connection has been closed.

That article lists a solution for azure network drops involving increasing idle timeouts on network portions of the azure system, might be worth a quick look.

bryanbumgardner · ‎04-17-2018

Hey Ryan,

Thanks for this, I will apply and see if it works. I'll be back with outcomes. Cheers!

kothapallihari · ‎04-21-2018

Hi Bryan,

Did you find a resolution?

Thanks

Hari

bryanbumgardner · ‎04-25-2018

Hey everyone,

After a lot of back and forth with IT, the admin says he "changed the registry" to solve the issue. He changed the following values:

KeepAliveInterval = 30

KeepAliveTime = 120

TcpMaxDataRetransmissions = 8

He declined to change the load balancer issue, saying that increasing the timeout window is not good practice.

The changes he made above did not solve the issue. I will return with more updates later.

Coxta45 · ‎04-25-2018

Why not try adjusting the ALB setting, per @RyanSw's suggestion (even if only temporarily) to see if solves the problem? Especially given how critical this seems..

Hope you can get this resolved and curious about your overall Azure impression - our company keeps wanting to "dip their toes" in the Azure waters to save money...but I'm not sold, and doubt I ever will be after using Google's Cloud Platform - which I find to be awesome. I mean, Microsoft can't even demo their cloud platform technologies using their own browser..

Good luck! Wish I had some actual advice for you..

Dev Space

Gallery/Azure/Redshift stack: failing on large queries