Another loaded question on the Server piece.
The current Google Big Query (GBQ) Input / Output has been deprecated. There are two ways to connect now:
1. Input Tool
2. In-DB connection
If we are to use the Input Tool, assuming that the GBQ connection is in place via ODBC, I have a few questions pertaining to data storage.
Number 1:
A connection must be established to GBQ to call the relevant data in with SQL. Now, whatever that is called in with SQL - is that data then stored on the Server or is it streamed to the Server? Meaning to say that the input will not be stored.
Is that the case?
Number 2:
I plan to make an interface on top of the input tool for end users to choose SQL scripts pre-written by me so that they can self serve without the need for SQL knowledge.
Will I need to embed the connection on the Server for it to work? Will any inputs from Interface tools on the server be stored on MongoDB as well?
If yes, will the 1-day purge cleanse all the data stored?
Number 3:
Will it be easier then to use the In-DB tool which streams the data into the workflow? What are the pros of doing so vs an Input Tool calling from the GBQ DB?
Solved! Go to Solution.
Question 1:
Every time the workflow runs, each instance of the query will execute and download the data into memory for whatever computer (be it a computer running designer or server). If you do not have an output data tool, it will not be saved anywhere, nor will it be cached. So in your phrasing, I guess that means it's streamed.
Question 2:
There are a few ways to setup the connection for users. You can set up a gallery connection for the data source. If you can RDP into the server running the gallery, then you can setup a datasource for the machine. In the latter case, this connection profile will not be saved to the workflows in an easily reproducible manner. You'll have to create a windows batch script to easily share the DSN with users in your organization. Also, they'll need any GBQ drivers before running it. The gallery connection is the easier to deploy.
Question 3:
In-DB tools only stream to your computer's memory once an in IN-DB stream tool is used. Depending on your query sizes, it is usually better to do any joins filtering on db side (IN-DB) before downloading it into your computer.
Thank you @jNunez !
I have some follow-ups from that.
Re: Question 1
I do plan to have an output tool for the end user to export their relevant data / insights. However, the tricky part right now is that we have an agreement internally to not store any data on our Server even for a moment in time. I am trying to find a way for my end users to even use the Server in that scenario.
Connected to Re: Question 1 is Re: Question 2:
Now, due to those current limitations, I was thinking of setting up the GBQ connection permanently on the Server, so that when end users interact with the Server on the browser, they do not need any login credentials whatsoever. All they need to do is select the SQL Query of their choice, and then click on the Transformation they want, then export out. I'm not sure if I still need to set up for each individual because that workflow will only be stored in a Collection with AD User Groups only accessing it. Not sure on your response, can you clarify further?
Re: Question 3
Yes, I am still scoping that out with the end users. I do not think it will be too large after the SQL queries, usually about 500k records or so. Rarely hits 3m records.
@jNunez @jrgo on a separate note than my other posting, may I trouble the both of you on this as well?
cc @fmvizcaino
Yes, agreed on Input Tools @jrgo
If I connect using Input Tool like so with GBQ ODBC:
My output is writing back to the GBQ DB. No output for end users on Server.
I am confused on your response 2 and 3. Can it be simplified further? I am not following entirely.
Let me rephrase:
Question 1: If I use Input Tool and Output Tool both with the config above to read from DB and write to DB, is data stored on Server/MongoDB?
Question 2: If I use the credentials from my on-prem PC and upload it to Server, do I need to package all my assets also? Wouldn't this persist in MongoDB?
Overall Question: I just want to use the GBQ data and do my ETL on Alteryx Server for scheduling purposes and not have the fact data stored anywhere on the Server/MongoDB at all. Metadata is completely fine to be stored, or even actions/logs, but nothing sensitive.How can I do that?
Thank you @jrgo