Free Trial

Alteryx Server Discussions

Find answers, ask questions, and share expertise about Alteryx Server.
SOLVED

Connecting to Google Big Query for Alteryx Server

caltang
17 - Castor
17 - Castor

Another loaded question on the Server piece. 

The current Google Big Query (GBQ) Input / Output has been deprecated. There are two ways to connect now: 

1. Input Tool

2. In-DB connection

 

If we are to use the Input Tool, assuming that the GBQ connection is in place via ODBC, I have a few questions pertaining to data storage. 

Number 1: 

A connection must be established to GBQ to call the relevant data in with SQL. Now, whatever that is called in with SQL - is that data then stored on the Server or is it streamed to the Server? Meaning to say that the input will not be stored. 

Is that the case? 

Number 2:

I plan to make an interface on top of the input tool for end users to choose SQL scripts pre-written by me so that they can self serve without the need for SQL knowledge. 

Will I need to embed the connection on the Server for it to work? Will any inputs from Interface tools on the server be stored on MongoDB as well? 

If yes, will the 1-day purge cleanse all the data stored? 

Number 3:

Will it be easier then to use the In-DB tool which streams the data into the workflow? What are the pros of doing so vs an Input Tool calling from the GBQ DB? 

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
5 REPLIES 5
jNunez
8 - Asteroid

Question 1:

Every time the workflow runs, each instance of the query will execute and download the data into memory for whatever computer (be it a computer running designer or server). If you do not have an output data tool, it will not be saved anywhere, nor will it be cached. So in your phrasing, I guess that means it's streamed.

Question 2:

There are a few ways to setup the connection for users. You can set up a gallery connection for the data source. If you can RDP into the server running the gallery, then you can setup a datasource for the machine. In the latter case, this connection profile will not be saved to the workflows in an easily reproducible manner. You'll have to create a windows batch script to easily share the DSN with users in your organization. Also, they'll need any GBQ drivers before running it. The gallery connection is the easier to deploy.

Question 3:

In-DB tools only stream to your computer's memory once an in IN-DB stream tool is used. Depending on your query sizes, it is usually better to do any joins filtering on db side (IN-DB) before downloading it into your computer.

caltang
17 - Castor
17 - Castor

Thank you @jNunez !

 

I have some follow-ups from that.

 

Re: Question 1

 

I do plan to have an output tool for the end user to export their relevant data / insights. However, the tricky part right now is that we have an agreement internally to not store any data on our Server even for a moment in time. I am trying to find a way for my end users to even use the Server in that scenario.

 

Connected to Re: Question 1 is Re: Question 2:

 

Now, due to those current limitations, I was thinking of setting up the GBQ connection permanently on the Server, so that when end users interact with the Server on the browser, they do not need any login credentials whatsoever. All they need to do is select the SQL Query of their choice, and then click on the Transformation they want, then export out. I'm not sure if I still need to set up for each individual because that workflow will only be stored in a Collection with AD User Groups only accessing it. Not sure on your response, can you clarify further?

 

Re: Question 3

Yes, I am still scoping that out with the end users. I do not think it will be too large after the SQL queries, usually about 500k records or so. Rarely hits 3m records.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
caltang
17 - Castor
17 - Castor

@jNunez @jrgo on a separate note than my other posting, may I trouble the both of you on this as well?

 

 

cc @fmvizcaino 

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
jrgo
14 - Magnetar
  1. Only way to ensure that data is not persisted in Mongo is to have your output tools write directly out to some network folder where someone can retrieve the file from. If you design your output to make the file downloadable from the result page, that file will be stored in MongoDB.

  2. I'm not familiar with GBC or what type of connection strings it allows. If it allows you to embed the credentials needed for authentication within the Input tool itself then all you may need to do is configure an ODBC DSN on the server that has the other connection settings that are NOT stored in the input tool.

    Re: inputs for interface tools... depends on whether you packaged the input the interface tools needs to load the options when you published to gallery or not. If you did, then yes, that will persist in Mongo. If you configured to read from an external source (i.e. file in a network share or query a table) it won't persist everything. I believe that in any case, the value that is selected from an interface question is persisted in Mongo.

    From the Alteryx help docs on this data source, your best bet will be to leverage DCM (not Server/Gallery Data Connections) to manage your authentication creds between your local machine and your server. You can set who's allowed to authenticate with your DCM credential when they run an app you've shared with them on Server.

  3. Probably easier to use the normal Input tool. Otherwise, you'll have to worry about managing an In-DB connection setting as well.
caltang
17 - Castor
17 - Castor

Yes, agreed on Input Tools @jrgo 

 

If I connect using Input Tool like so with GBQ ODBC:

image.png

 

My output is writing back to the GBQ DB. No output for end users on Server.

 

I am confused on your response 2 and 3. Can it be simplified further? I am not following entirely.

 

Let me rephrase:

 

Question 1: If I use Input Tool and Output Tool both with the config above to read from DB and write to DB, is data stored on Server/MongoDB?

 

Question 2: If I use the credentials from my on-prem PC and upload it to Server, do I need to package all my assets also? Wouldn't this persist in MongoDB?

 

Overall Question: I just want to use the GBQ data and do my ETL on Alteryx Server for scheduling purposes and not have the fact data stored anywhere on the Server/MongoDB at all. Metadata is completely fine to be stored, or even actions/logs, but nothing sensitive.How can I do that?

 

Thank you @jrgo 

 

cc @jNunez @fmvizcaino 

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/