community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

Getting Data to Big Query

Alteryx
Alteryx
Created on

NOTE: This article was written by Paul Houghton and was published originally here All credit goes to Paul Houghton and his group at the Information Lab.  Thought this was such a great solution; wanted to make sure it was made available on the Alteryx Community as well.

 

The Problem

I have recently been working with a client who wanted to get their data into a Google Big Query table with Alteryx. It sounded like a simple problem and should have been pretty easy for me to get working. 

 

Unfortunately that was not to be the case. Connecting to a big query table with the Simba ODBC driver is pretty easy, but that driver is just read only. Dead end there.

 

So off to the Big Query API documentation.

 

I found that uploading a single record was relatively easy but slow, uploading thousands of records should be easier. 

 

Deeper digging I went. Using the web interface I can upload a csv to Google Cloud Storage, then load that file into Big Query, the challenge I had was to automate the process.

 

Step 1 - Throw it into the (Google) Cloud

So I was quite lucky that my colleagues Johanthan MacDonald and Craig Bloodworth had already made a functioning cloud uploader using the cURL program. All I had to do was doctor the URL to work for me. Ideally I wanted to make the entire process native to Alteryx. 

 

… (T)he next phase of the automation, (is) importing the csv into big query.

 

Step 2 - Big Query your Data

So the second step was to get the data into big query. I found this a bit of a humdinger and it really stretched my API foo. So what did I have to do?

 

What query do I run?

I struggled with this for a while until I attended the Tableau User Group held at the Google Town Hall in London, I was able to get in touch with Reza Rokni who pointed me at the example builder at the bottom of the Jobs List Page where you could build an example of the query to send.

 

The biggest problem with this is getting the JSON file with the table schema right. I got it eventually, but there were a fair few challenges.

 

Step 3 - Get it to the Cloud with Alteryx

Once I had the whole process working I wanted to go back to the problem of uploading a file with Alteryx. I knew it was surely possible to upload a file with the download tool. I just couldn't work out how to get the file in a row to upload.

 

 BigQuery.jpg

What settings do I use?

 

Enter the Blob Family

What made this the upload work was finding what the blob tools do and how they work. So what is a blob? It's a Binary Large Object, basically a file without the extension.

 

 BigQuery2.png

Blob Family

 

So now using the blob input tool I'm able to read in a file to a single field. This is exactly what I needed. It’s relatively simple from here to read the file into a single cell and use the normal download tool for the upload process.

 

The Final Furlong

Now to pull this all together the last step is to put in the details needed to do the upload. So what are the settings? 

 

Well on the 'Basic' we simply want to set the target URL (check the API documentation to work out what that would be), and what we want to do with the response from the server.

 

On the headers page we need to define the 'Authorization' parameter and the 'Content Type' parameter that is needed by Google's API.

 

The last set of configuration is the payload, and that is where we define the use of a POST command and what column contains the Blob.

 

Success!

And after all that I have managed to have success. By combining these two processes the records are uploaded as csv, then imported into a pre-defined Big Query table.

 

Paul has uploaded this to his Alteryx Gallery Page (v10.5) and would love to get feedback from anyone who uses it in its current configuration. He would appreciate any feedback on how it works as it is and how easy it is to configure. He would also love to get some ideas on how to improve the tool. He already have ideas to automate the table schema file creation, develop separate up-loaders for Big Query and Google Cloud, see if changing the upload query will make it more robust, possibly build in compression, and who knows what else would be useful.  Please post your comments here.

 

Comments

Interesting. Such a comprehensive article. Thanks for providing all the information.