
Engine Works Blog

Under the hood of Alteryx: tips, tricks and how-to's.

Alteryx Server has become increasingly popular as analytics leaders look to scale out Alteryx to tackle bigger projects and larger datasets, and to put self-service data analytics into the hands of more decision makers. Organizations ranging from small retailers and niche data providers to large corporations such as Southwest Airlines, Chick-fil-A and Western Union have upgraded to Alteryx Server to improve analyst productivity and decision making.

 

As a line-of-business analyst, you probably know that you need to make your analytics practice enterprise ready, and maybe you’ve begun to look at Alteryx Server and think about next steps. This is the point at which we start getting more questions about best practices for deploying Alteryx Server, how it scales, governance and much more. So I recently sat down with our leading Alteryx Server experts (Kory Cunningham, Senior Product Manager for Alteryx Server, and Gary Schwartz and Steve Ahlgren, Development Leads for Alteryx Server) to get the answers to some of our most frequently asked questions (FAQ). Here’s the feedback we got from the team.

 

Thanks for reading,

Josh Howard, Sr. Product Marketing Manager

 

What are some of the key Alteryx Server features that someone might be interested in?

 

KORY CUNNINGHAM - There are quite a few things, but one of the most compelling features is the ability to leverage server hardware so you can deploy your analytics at scale and support concurrent users. We’ve seen Alteryx Server deployed everywhere from small, five-person boutique analytics shops to multinational companies with hundreds of users. The second is scheduling. The Scheduler gives you the ability to take workflows and analytical processes and schedule them to run in the future for more timely reporting, and to manage daily, weekly and monthly reporting in an automated fashion. The third is the Gallery. The Gallery is a web interface where you can publish Alteryx workflows to share with colleagues and business users, who get that same analytical processing without having Alteryx installed on their desktop. For example, someone out in the field could log on through a web browser, access the Gallery and run their reporting without Alteryx Designer. Within the Gallery platform you also have collaboration and version control on workflows. You can share and modify workflows, and keep track of each change by user and by when it took place. This is all part of the Alteryx Server platform.

 

Tell me about the deployment configurations supported in Alteryx Server. Can it be deployed in a virtual environment?

 

GARY SCHWARTZ – Regarding the deployment configuration, we are on a Microsoft stack, so it is all Windows. We support Windows Server 2008 R2 and later, and you can see the recommended technical specifications on our TechSpecs page. Regarding the deployment, we certainly recommend that you run the Server on bare metal, but you can run the Server in virtualized environments too. In fact, we run our own Gallery in a virtualized environment on Amazon. We do test on some virtual environments, but we don’t technically certify the Server on those environments. So the Alteryx Server performs well in both physical and virtual environments. The thing you need to think about is resource contention on the physical box: if the Server is running on a physical host that is shared by many virtual machines, you can get some resource contention. You also have to consider CPUs vs. vCPUs. For example, in Amazon you might have 4, 8, or 16 CPUs, but what they are really referring to are vCPUs, each of which is about half of a real CPU, and our specs are based on physical CPUs. So that’s something you have to keep in mind.
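
(Editor’s Note: The CPU vs. vCPU point is easy to get wrong when sizing a cloud instance, so here is a minimal Python sketch of the arithmetic. It assumes Gary’s rule of thumb that one vCPU is roughly half of a physical CPU core; the function names are hypothetical and the ratio may differ on your platform.)

```python
import math

# Assumption taken from the answer above: one vCPU is about half of a
# physical core. Adjust this ratio if your platform documents otherwise.
VCPUS_PER_PHYSICAL_CORE = 2


def physical_core_equivalent(vcpus: int) -> float:
    """Approximate number of physical cores represented by a vCPU count."""
    return vcpus / VCPUS_PER_PHYSICAL_CORE


def vcpus_needed(required_physical_cores: int) -> int:
    """vCPUs to request so a virtual instance matches a physical-core spec."""
    return math.ceil(required_physical_cores * VCPUS_PER_PHYSICAL_CORE)


if __name__ == "__main__":
    for vcpus in (4, 8, 16):
        cores = physical_core_equivalent(vcpus)
        print(f"{vcpus} vCPUs ~ {cores:.0f} physical cores")
    print(f"A four-core spec needs about {vcpus_needed(4)} vCPUs")
```

Under this assumption, an instance advertised with 8 "CPUs" should be compared against a four-core physical spec, not an eight-core one.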

 

What types of authentication are provided? Is SSO available?

 

KORY CUNNINGHAM - The Gallery supports two main forms of authentication. One is built-in authentication, which is just an email and password: users sign in with their own account, which they or the administrator creates. But we also support Windows authentication, including native support for NTLM and Kerberos. This allows users of the Gallery to log on using the same credentials that they likely use to log on to their own machine, and lets you leverage your existing Active Directory for authentication and user management. And with 10.5, we have also introduced the ability to run workflows as the user, so a user can run a workflow using their own credentials rather than a global server admin account, and only get access to the data that they’ve been permissioned for.

 

How many users or apps can a deployment handle?

 

STEVE AHLGREN – The short answer is that we can handle an unlimited number of users with some upper bound, and we have clients with hundreds of users on the Alteryx Server. But, rather than the number of users, perhaps it’s more important to consider the number and types of apps that can be handled. The good thing about the Server is that it scales linearly, so you can throw more resources at it to handle an increased load. If you have hardware or virtual hardware available, we can fill that instance to its capacity depending on what your expected load is. The number of concurrent users is limited by the front-end services provided by your internal or cloud infrastructure. If you have a load balancer, we can scale to handle both front-end and back-end loads.

 

How does the Alteryx Server Scale?

 

STEVE AHLGREN – There are three main points of scalability, and I’ll list them in order of likelihood:

  • Adding workflow processing capabilities to a server instance. We call these Queue Workers or Render Workers, which are instances of physical or virtual hardware that process Alteryx workflows and render map tiles, respectively. The Queue Workers run the same Alteryx Engine that runs underneath the Alteryx Designer product. There are two main ways to scale these. One is scaling out horizontally, which is adding more physical or virtual workers to a server instance. The second is scaling up vertically, which is adding more worker capacity or more hardware to an existing physical node. We emphasize that most performance gains will be made by scaling out horizontally, adding more physical hardware and physical workers to an existing server instance.
  • The second way to scale is by scaling the backend database, which in our case is MongoDB. MongoDB has several kinds of scalability options for redundancy and performance, and one of those is Replica Sets. Replica sets are the way Mongo achieves data reliability and redundancy, essentially by writing data to multiple nodes so that there is at least one copy of your data in existence at all times. If a node drops, another node will take its place and the server should be able to stay up and running. You can also scale the database through sharding. Sharding is more performant, and it potentially enables the use of smaller disk sizes, which is important if you are using virtualized hardware in an Amazon-type infrastructure.
  • The third way to scale is through the Gallery. The Gallery is the front-end service that handles user requests from the client or web browser. This is the least likely way to scale because it doesn’t tend to be a bottleneck, and it depends on the type of users and apps that are running, but it is possible to scale using a load balancer. We actually use all three of these scaling methods on our own instance of the Alteryx Public Gallery.
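
(Editor’s Note: To make the first point concrete, here is a rough Python sketch for estimating how many Queue Workers a given load might call for. It leans on Steve’s statement that the Server scales linearly. Every input, including the utilization target and the one-job-at-a-time default, is a hypothetical number you would replace with your own measurements. It is not an official sizing formula.)

```python
import math


def workers_needed(jobs_per_hour: float,
                   avg_job_minutes: float,
                   target_utilization: float = 0.7,
                   concurrent_jobs_per_worker: int = 1) -> int:
    """Estimate the worker count for a workload, assuming linear scaling.

    target_utilization leaves headroom so the execution queue does not
    back up when jobs arrive in bursts (0.7 is an illustrative default).
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if concurrent_jobs_per_worker < 1:
        raise ValueError("concurrent_jobs_per_worker must be at least 1")

    # Hours of processing that arrive per hour of wall-clock time.
    busy_worker_hours = jobs_per_hour * avg_job_minutes / 60.0
    # Share of that work one worker can absorb at the target utilization.
    capacity_per_worker = concurrent_jobs_per_worker * target_utilization
    return max(1, math.ceil(busy_worker_hours / capacity_per_worker))


if __name__ == "__main__":
    print(workers_needed(jobs_per_hour=30, avg_job_minutes=6))
```

If the estimate keeps growing, that is the signal to scale out horizontally by adding workers rather than adding capacity to a single node.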

 

What type of hardware or network specifications are required?

 

KORY CUNNINGHAM – You can find all of our technical documentation at http://downloads.alteryx.com/ , but for a typical installation, we recommend a quad-core, single-CPU machine at 2.5GHz with at least 16GB of RAM and a solid state drive (SSD). The SSD actually makes a pretty big difference in how fast data can be read or written. The other thing to consider is the network topology as it relates to the dataset location. For example, if you are purchasing our 3rd party data (e.g. Experian), it can be installed either on the same machine as the Server software or in a network location. Installing it on the Server will be your best option for performance, because it keeps the data as close to the execution engine or compute space as possible. We do allow the data to be stored in a remote network location, which is convenient when multiple machines need to access and update that data, but it puts a significant strain on performance and you will likely see degradation. As far as the network hops or links in a multi-node deployment, you want to ensure those are kept to a minimum. So try to keep the machines as close together as possible to avoid latency.

 

GARY SCHWARTZ – Another thing to think about with regard to the network and the server that the Gallery is running on is to configure an appropriate domain name that users can access and that points to the right server. We’ve seen a lot of cases where the DNS was incorrect and people couldn’t figure out why they couldn’t reach their Gallery. Additionally, with the Gallery we do recommend that you use SSL or TLS, which means you need to install a certificate on the server that is hosting the Gallery unless you are working behind a load balancer. Work with your IT team to get the SSL certificate installed, which can be IT intensive. The last thing is that the domain the Gallery is running on needs the appropriate level of trust policy set with the domains that the other users will be working on, so that Active Directory can resolve users and determine permissions.
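
(Editor’s Note: The DNS and certificate problems Gary describes can be checked from any client machine before users start reporting them. The Python sketch below uses only the standard library. The function names are hypothetical and the Gallery hostname is whatever you configured, so treat it as a starting point rather than an Alteryx-provided tool.)

```python
import socket
import ssl


def resolve_host(hostname: str) -> list[str]:
    """Return the IP addresses a hostname resolves to, or [] if DNS fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    return sorted({info[4][0] for info in infos})


def tls_certificate_ok(hostname: str, port: int = 443,
                       timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake succeeds with a certificate that the
    local trust store accepts and that matches the hostname."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except OSError:  # covers DNS failures, timeouts and ssl.SSLError
        return False
```

Calling `resolve_host("gallery.example.com")` (a placeholder name) and comparing the result with the address of the machine hosting the Gallery catches the "DNS points at the wrong server" case; `tls_certificate_ok` returning False suggests the certificate is missing, expired, untrusted or issued for a different name.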

 

Are there any third-party web server or database requirements?

 

KORY CUNNINGHAM – The Server packages up all of the necessary components (e.g. web server and database) in the installation itself, so everything can be installed and configured from the Server installer. For a single-node machine, there are no additional dependencies, but you might start seeing some 3rd party requirements in a multi-node environment if you scale out. If you scale out the Gallery, for example, you will need to set up a load balancer to handle all of the web traffic. At Alteryx we use Amazon’s Elastic Load Balancer, but you can use something like F5 Networks or whatever your IT department uses. As for the database, as mentioned above, we do package up an embedded version of MongoDB. But if you start taking advantage of Mongo’s replica sets for high availability, redundancy and reliability, then you would need to manage your own Mongo database nodes, and we recommend you take a look at Mongo’s specs, installation and best practices to set that up. But other than that, there are no 3rd party requirements.

 

GARY SCHWARTZ – I would also mention that we do get a lot of questions about the web server. While we do package up the web server and database in the architecture, the Gallery is a self-hosted web service. It’s not using anything like Tomcat or Apache; it registers the configured base address and listens for web requests at that location.

 

 

 

How can I manage high-availability and data backups?

 

GARY SCHWARTZ – High availability is about redundancy in the architecture, and since the architecture is designed to scale horizontally, the platform supports it. From the Gallery standpoint, you could scale to have multiple Gallery nodes behind a load balancer. If one node fails, the remaining nodes will still take traffic while you recover the failed one and get it back up and running. Second, back to what Steve said about gaining additional workflow throughput by scaling out your worker nodes: the more worker nodes you have, the more availability you have to run workflows. So you want to monitor the usage of your system. If your worker nodes are continually running at full capacity, then you should add more worker nodes. We certainly recommend that you back up your data. The first step in doing that is scaling out from the embedded Mongo database to a Mongo replica set with members on separate nodes, which will give you immediate backup and redundancy. We also recommend you follow Mongo’s recommendations for backups. There are several strategies for this. We take incremental backups at regular intervals, so we have a base backup with incremental snapshots along the way without having to do full backups all the time. Then we occasionally do full backups.
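
(Editor’s Note: The "base backup plus incremental snapshots" strategy Gary describes can be sketched in a few lines of Python. The weekly interval for full backups and the function names are assumptions for illustration; this shows the scheduling logic only and does not call any MongoDB backup tooling.)

```python
from datetime import date


def backup_kind(day: date, last_full: date, full_every_days: int = 7) -> str:
    """Return 'full' once the last full backup is too old, else 'incremental'."""
    if (day - last_full).days >= full_every_days:
        return "full"
    return "incremental"


def restore_chain(backups: list[tuple[date, str]],
                  target: date) -> list[tuple[date, str]]:
    """Backups needed to restore to `target`: the latest full backup on or
    before that date, followed by every incremental taken after it."""
    eligible = sorted(b for b in backups if b[0] <= target)
    full_indexes = [i for i, b in enumerate(eligible) if b[1] == "full"]
    if not full_indexes:
        raise ValueError("no full backup on or before the target date")
    return eligible[full_indexes[-1]:]
```

The trade-off is visible in `restore_chain`: incrementals are cheap to take, but a restore has to replay every one since the last full backup, which is one reason to take an occasional full backup.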

 

KORY CUNNINGHAM – We do have a backup and restore facility available for basic backup capabilities in the embedded MongoDB instance. But if you are looking for a high availability solution that is highly redundant, then we do recommend you go down the path of a user-managed, multi-node MongoDB deployment.

 

 

STEVE AHLGREN – I would also add that if you are running in AWS, I would set up regional distribution. Amazon does have outages. Even in our own Public Gallery we’ve seen outages, and having that regional distribution ensures that we have near 100% uptime. In terms of backups, setting up EBS snapshots is easy if you are running in EC2. You can schedule those EBS snapshots and full backups every week or every couple of days.

 

 

 

How do you monitor a server deployment?

 

STEVE AHLGREN – There are several different ways to monitor deployments. Every component in the Server has its own logging infrastructure. The front-end Gallery writes its own logs, the back end has its own log structure using syslog levels, and the Mongo database has its own logging structure. So there is extensive logging in every tier of the Server architecture. We suggest everyone collect these logs and analyze them with Alteryx; that is how we process our own logs. We also siphon the logs off into Amazon S3, where they can be imported into 3rd party tools like Logstash and Loggly, but Alteryx is probably all the log analysis tooling you’ll need for this. In terms of other monitoring capabilities, we also recommend using services like Amazon CloudWatch to monitor system health metrics like CPU and memory, which will give you early indication of problems. We also recommend backend database monitoring with MongoDB Cloud Manager for things like slow queries and data access with extensive paging, both of which could indicate an issue. Another 3rd party monitoring tool we use is New Relic, which we use to monitor the front-end nodes in a multi-tiered server architecture. We use New Relic in combination with Amazon CloudWatch to monitor machine health for things like key Gallery processes, heavy CPU load or network traffic. We also package up a Server Usage Reporting and Monitoring app that runs within Alteryx on the Server to monitor things like the number of current users, the average number of jobs waiting in the execution queue and the length of time it takes jobs to run, so basic monitoring that would be useful in managing the Server.
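
(Editor’s Note: If you want a quick first pass over collected logs before building a workflow for them, a short script can count log lines by severity. The Python sketch below is only an illustration. The directory, the file pattern and the severity keywords are all assumptions, since the exact log formats differ between the Gallery, the back end and MongoDB.)

```python
from collections import Counter
from pathlib import Path

# Hypothetical severity keywords, checked in this order. Adjust them to
# match the format of the logs you are actually collecting.
LEVELS = ("ERROR", "WARN", "INFO")


def count_levels(log_dir: str, pattern: str = "*.log") -> Counter:
    """Count log lines per severity keyword across files in a directory."""
    counts: Counter = Counter()
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                upper = line.upper()
                for level in LEVELS:
                    if level in upper:
                        counts[level] += 1
                        break  # count each line once, at its first match
    return counts
```

A sudden jump in the ERROR count between two collection runs is a cheap early-warning signal that complements the CPU and memory monitoring described above.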

 

What user permission and data access controls are available?

 

KORY CUNNINGHAM - Inside the Gallery you can manage users and provision them with various levels of access through different user roles. These range from the basic Viewer, which has base-level, view-only access to the Gallery and can just run workflows, to what we call the Artisan, a user who contributes content by uploading workflows and creating apps in the Gallery to share with other users, and finally to what we call the Curator, which is the administration role that manages the Gallery. For those Artisans that are creating content, workflows are stored in what we call “Studios”. The Studios are basically restricted project folders where multiple analysts can share, collaborate on and publish workflows that only they can see. From there, they can share those workflows to other Studios or with other users, which provides a further way to control data access. With Alteryx Server, we respect the rules and permission levels that you set at the database level. So we give you the ability to set global permissions that all workflows run under. But in the 10.5 release, we provided further flexibility by giving you the ability to set permissions at the individual level that workflows run under as well. We are continuing to expand our capabilities here, so stay tuned for future releases! (Editor’s Note: Check out Kory’s video on Respecting Data Governance with Self-Service Data Analytics for more information.)
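
(Editor’s Note: As a mental model of the three roles Kory describes, here is a small Python sketch. The capability names are hypothetical, and the assumption that each role also includes the capabilities of the one before it is ours. This is not how the Gallery implements permissions internally.)

```python
from enum import Enum


class Role(Enum):
    VIEWER = "Viewer"
    ARTISAN = "Artisan"
    CURATOR = "Curator"


# Hypothetical capability names paraphrasing the role descriptions above.
# Treating the roles as cumulative is an assumption made for illustration.
CAPABILITIES = {
    Role.VIEWER: {"run_workflow"},
    Role.ARTISAN: {"run_workflow", "publish_workflow"},
    Role.CURATOR: {"run_workflow", "publish_workflow", "administer_gallery"},
}


def can(role: Role, capability: str) -> bool:
    """Return True if the role includes the named capability."""
    return capability in CAPABILITIES[role]


if __name__ == "__main__":
    for role in Role:
        print(role.value, sorted(CAPABILITIES[role]))
```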

 

THAT’S IT FOR NOW. THANKS!

 

Big thanks to Kory, Gary and Steve for sitting down with me and answering some of the questions we get on Alteryx Server. If you have more questions, feel free to ask us here on the community or download the report, "Alteryx Server: Scaling Self-Service Data Analytics for the Enterprise".

 

Josh Howard

 

 

 

 

 

 

 

 

 

 

 

Josh Howard
Sr. Director, Product Management

Josh Howard is a technology product veteran covering trends in information management and based in the Denver/Boulder, CO area. He has more than 20 years of experience in developing product and go-to-market strategies across a wide variety of data technologies including business intelligence, analytics, data integration, and database development. He is currently the Sr. Director of Product Management at Alteryx. You can follow Josh on Twitter at @Joshoward


Comments
Creative Director

Great post, guys! Remember to tune in to the hangout tomorrow to learn more about Alteryx Server!

http://community.alteryx.com/t5/Alteryx-Hangout/Alteryx-Hangout-The-Server-Sessions/td-p/21270

Alteryx Partner

Fantastic resource ... I am banking it for definite use later.

Alteryx Partner

I presume Server stores:

 

  1. workflow xml
  2. maybe input data for the run and
  3. results of the completed run

Can we store data gathered from multiple internal or external sources in order for gallery users to connect to the server and either download to desktop or utilize on the server's MongoDB?

 

For example:

  • we have Point of Interest, ESRI and similar spatial data that I'd like to share through a common Alteryx Server
  • similarly, I'd like to download sample CRM tables and anonymize them, to give a set of business users a sandbox to play in with anonymized data.

Can we do this on Alteryx Server MongoDB?

 

Best 

Alteryx Alumni (Retired)

Hey @Atabarezz - you're correct on the items that are stored in MongoDB, amongst other things. As far as leveraging that database to store and make available additional datasets or content, we don't support anything like that today. To deploy shared resources or datasets for all of your users, you have a few options, everything from the most simple option of storing the data on the server and accessing it via a shared network location, to spinning up S3, Redshift or other database instances to house the data and make connections available there. In the option of housing the data in a given datasource, you can also create a macro that can act as the input tool for that particular dataset, and distribute that macro to your users. This will allow them to access the datasets that you want them to access, and to do so in a sort of controlled environment, with you providing the exact functionality for how the end users will ultimately consume the data.

Alteryx Partner

 

Quick questions;

 

1) If there is a community edition MongoDB on Alteryx Server

 

  • Can we upload tables/data/documents from Alteryx Designer by utilising the Output Data HDFS connection?
  • Can we connect to that data using the Input Data HDFS connection?

2) If we happen to install our own MongoDB (user-managed option) and run Server on top of it, can we then input and output data from the server?

 

 

Alteryx Partner

Our intention is this;

Because Alteryx is easily scalable with very little help from IT, a few clients have started asking, "Can we create a sandbox for corporate innovation, where we can dump some big data that we cannot utilize in the production environment, but on which we definitely want to test some of our ideas/hypotheses?"

If we can do that, Alteryx becomes not only a regular analytics product but also a nice test bed for bright new ideas...

 

Example data sets to be analyzed include:

  • Historical POS and credit card transaction data for the last year for a large retail bank
  • Itinerary data of a global-scale airline
  • CDR data for a mobile telco with around 70 million subscribers

Best

Alteryx Partner

Thank you guys for the extensive and really helpful explanations.

 

Steve, you said 'The short answer is that we can handle an unlimited number of users with some upper bound...'. I wanted to point out that unless the upper bound is empty, this statement is a contradiction.

Meteor

Hi, sorry to post this message here, but Alteryx Server is not an enterprise-ready product. I work on multiple BI tools as an administrator, and it is my personal opinion that Alteryx Server lacks a lot of things that an enterprise looks for.

 

Here are the pros and cons

 

Cons / things that would be really nice to have

 

  • No proper data source connections for all types of sources (just Oracle and SQL; the rest is ODBC).
  • No LDAP connectivity, no SSO or SAML.
  • It lacks the basic function of detailed logs for workflow failures.
  • No enterprise-standard security (cannot delete users, cannot create groups, cannot assign groups of users to a collection).
  • No proper documentation for workflow or app failure error codes.
  • Cannot have a user in 2 studios. Cannot secure the public gallery with basic authentication.
  • Unable to get the run status of a workflow or tool and pass it to other tools or workflows, or to run workflows in series/parallel (based on dependency).
  • Have to mimic whatever was done on the desktop again on the server.
  • Security is hard to maintain, and there are no user groups.
  • Cannot migrate workflows from QA to prod in a straightforward way.

 

Pros

I agree that the desktop product is perfect, but when it comes to the server there are a lot of issues that concern us.

Awesome Analytics Tool for desktop users.

 

I think that Alteryx will take this in a good way and fix these as needed.

 

-Ranjith

 

Alteryx Partner

Important points Ranjith;

 

Quick comment, there should have been some logs regarding "detailed logs for workflow failure". Have you checked;

 

  • C:\ProgramData\Alteryx\Service\AlteryxServiceLog.log
  • C:\ProgramData\Alteryx\Gallery\Logs\alteryx-YYYY-MM-DD.csv includes errors in Analytic App
  • Alteryx_Log_[number].log and C:\ProgramData\Alteryx\ErrorLogs...

Meteor

Yes, I did. I am aware of all the logs; I’m a server admin. I struggled for 3 days with an issue that turned out to be a wrong password for a Gallery alias SQL connection, which was not logged in the server logs. Even Alteryx support was not able to detect that immediately from the logs.

Alteryx Partner

So you have found the hot spots, good job...

 

May I suggest splitting the issues MECE-style ("Mutually Exclusive, Collectively Exhaustive") and adding them as ideas?

Then we can support these great feature addition ideas...

 

Best

Alteryx Partner

If you have added any of these aspects as ideas I would love to star them and give you support... best

Alteryx Alumni (Retired)

Hi @reddy520 - thank you for posting all of your feedback around Server.  We have a lot of plans for this space over the course of the next year to make many of the improvements you've listed above.  That being said, I'd love the opportunity to talk with you directly to dig into some of these things a bit more so we can use this feedback to guide the efforts we have on the road map for 2018.  Would it be alright for me to contact you directly to discuss?  Thank you again for the feedback!  

Hi @JulieM,

 

Would you be able to share a roadmap for Alteryx Server? It has been 4 months since you posted, so maybe some of what you mentioned has been completed? We currently only use Alteryx for Desktop; however, we have a large initiative on my team that we were hoping to use Server for. That is what led me to this thread, but the cons that @Ranjith520 listed are very alarming and would steer me away from looking at Alteryx Server as our solution. We are a big Tableau shop and looking to expand on the Alteryx side to gain efficiencies.

 

Meteor

@ryanschlig, Alteryx has a roadmap that they showed us (presented to our client by Ashley Cramer @alteryx) which will fix most of our issues. You should still pursue moving to Server, as that would really benefit your org. We are a large Tableau shop too, but Alteryx Server makes a lot of things easier, like publishing as TDE or Hyper to the server, which eliminates a lot of issues and manual work. We could not survive without it now. Please don’t take my suggestions to Alteryx as an overall drawback. Instead, go for Server; you will only benefit when you move to it. I have posted them as ideas on the Alteryx ideas forum and we were promised that they will be released soon. Thanks, Ranjith

Alteryx Partner

Any updates on the issue you mentioned Ranjith?

Meteoroid

My organization is scoping out Alteryx Server, any further update if the cons previously listed by Ranjith have been addressed?

Meteor

Status update?
