Engine Works

BenMoss · ‎09-03-2019

Let’s put this brutally: our personal data is hugely valuable to companies, and heck, even governments are now using our data to win elections.

Though it’s disturbing, I also find it oddly fascinating (probably because I have a limited social media presence), how companies use our data for things like targeted advertising and cross-selling of their products.

I’m a huge fan of recommender algorithms too; my ‘discover weekly’ on Spotify is near on perfect Monday afternoon listening, whilst Netflix hardly ever recommends me something I’d class as trash.

Let’s do a technical dive into how we can use Alteryx to connect to and make use of, the Twitter Developer API.

In the use-case I will go through, I want to understand the ‘trends’ that are cropping up in towns across the UK.

From a business perspective there are a lot of insights I could potentially glean from this information, here are a few examples:

Does my business name, or any of our companies’ products feature in trends?
Are there different towns across the UK that always seem to have similar trending topics?
What are the most talked about food ingredients/flavourings?

In order to understand what type of data you can get from the Twitter API, the ‘API documentation’ is always a good start.

Developer Sign-up

In order to connect to the Twitter API, we must first (this is a one-off step) apply for a developer account. Once this has been approved, create an application--this process can be done by signing up here.

Authentication

Most APIs nowadays will have some method of authenticating developers for a couple of reasons. Firstly, monitoring; they don’t want you putting so much stress on their service that you cause the whole thing to crash, so they will monitor your use and often throttle your responses in order to prevent this from happening.

Secondly, privacy; APIs can contain very sensitive information, and it’s important only the right people have access to the right information.

The Twitter API has a series of methodologies for authenticating against their API, depending on the tasks you are trying to perform.

For this use-case we need to use ‘Application-only’ authentication, which allows us to perform requests which do not include any user content. The process for this authentication is relatively straight forward and requires making a single ‘POST’ request to the Twitter API, which includes your application credentials, which then returns a ‘token’ that can be used to make subsequent requests for actual data.

Even though this is a simple process, there is no need to reinvent the wheel. Alteryx has already built a macro that performs the authentication process for us and at present this tool can be found in the following directory:

C:\Program Files\Alteryx\bin\RuntimeData\Macros\Social Media\Supporting_Macros

The macro we want to use is the ‘twitter_oauth_macro’ and it’s probably worth copying this into your macro directory so you have easy access to it from within the Designer platform.

It is also worth noting that this macro forms part of a now deprecated tool, which is why it’s worth copying and storing in your own macro directory, as it could be removed from the product at any point.

Let’s take this macro and bring it onto our blank Alteryx canvas.

By looking at the configuration pane for this tool (and by reading the authentication documentation), you will see that we need three key pieces of information. We need our ‘Consumer Key’, our ‘Consumer Secret’, and our ‘Application Name’; this information can be found on the Twitter Developer website.

I’m going to place this detail into a text input, and connect it to the Twitter Authentication macro, configuring the drop downs appropriately.

Get Trends

Once we have authenticated, we can start going wild and get our data from an appropriate ‘endpoint’ (I like to say endpoints are like tables of data, which we can then query information from through further filtering).

In the API documentation I can find two endpoints related to trend data.

Each of these serves a different purpose and provides different information.

The endpoint, ‘Get trends near a location’ sounds quite like what I need, but a bit of reading tells me that in order to ‘Get trends near a location’, I must first have the WOEID for the location I want to query. A ‘WOEID’ or a ‘Yahoo! Where on Earth ID’, is a unique identifier for different locations globally that I could probably find online somewhere. However, before I start the search, I take a look at the second endpoint under the ‘Trends’ header.

The ‘Get locations with trending topics’ endpoint allows me to get a complete list of locations with trending topics information. The ‘with trending topics information’ is actually quite important here, as it means I don’t waste time requesting against locations that do not have this information.

A bit of further reading on this endpoint shows me that this second endpoint returns me with the WOEIDs for active locations, so no static table required!

The best route to go then seems to be:

Use the ‘Get locations with trending topics’ endpoint to gather me a list of all WOEIDs with active trend data
Take this list of WOEIDs and then query the ‘Get trends near a location’ endpoint to get our trend information, with the WOEID forming part of the query string
Parse the response information into a structured table!

So, let’s start with step 1.

In order to make the request for this data, we need two things; we need the URL for the endpoint (commonly referred to as the ‘Resource URL’ in the Twitter API documentation) and we need our bearer token (which is one of the response values from the Twitter Authentication macro and forms our ‘Authorization’ header for our call).

To create this information, I have added a formula tool to our workflow and created two fields, one titled ‘Resource URL’ and the second titled ‘Authorization’.

For the ‘Authorization’ header to be in the correct format, we must prefix the value with the term ‘Bearer‘ which you can see I have done in the first formula.

In the second formula, I have simply defined the resource URL as in the API documentation.

Now it’s the Download tool’s turn to go and get this information. The configuration is simple, and we’ve already created all of the info we need in our previous formula step.

On the first tab, we simply need to specify the field that contains the URL we wish to query and what we wish to do with the download data. In this case, since we want to parse the data within our existing workflow, we should choose the ‘to a field’ option and keep it set to ‘String’.

On the ‘Headers’ tab, this is where we must specify our ‘Authorization’ field which contains our token.

As you are denoting a ‘Name:Value’ pair when you work with header information, in the case of ‘And values from these fields’, the field name acts as the ‘Name’, so it must be exactly the same as what your API expects.

Whilst the ‘Payload’ and ‘Connection’ tabs can remain untouched, it’s worth checking that the ‘HTTP Action’ (on the Payload tab) is set to GET (which is the default), as we are retrieving or GETTING data from the API.

Once you have set this up, let’s run our workflow. If we have everything configured correctly, we should get the response ‘HTTP/1.1 200 OK’ in the ‘Download Headers’ field and we should also see some geographic type data in our ‘Download Data’ field, which we will now parse into a structured table.

To parse this data, I’m first going to use the JSON parse tool (as it’s JSON format), and then use a bit of RegEx to pull out the key information from our ‘JSON Name’ field.

The structure is simple, the digit which proceeds the full stop is a ‘Line ID’ for each WOEID location, whilst all the detail after that can be used as a column name. Therefore, we can use the following RegEx statement to retrieve this information:

(\d+)\.(.+)

This essentially means, create us one field, which contains the first numbers found succeeded by a full stop, and then create us a second field with everything after the full stop.

The image below highlights the resulting data after this parsing, and the configuration of the RegEx tool to achieve this result.

This process is then proceeded by a series of steps to bring this data into a structured table with nice and clean column headers. The process is fairly generic and this blog provides a detailed explanation, including the configuration of the cross-tab tool (for reference, the complete workflow used in this example is linked to later in this post).

In this example, I also keep the ‘Authorization’ field as a ‘key field’ through our cross-tab so that we don’t have to get another authentication token to perform our second API call later on.

Therefore, after some basic cleaning and filtering to only include towns in the UK, my data stream now looks like the data in the image below, including our extremely important ‘WOEID’ field.

So, let’s make our second API call.

This time, we don’t need to remake our authorization header, as we already have this, but we do need to make a new Resource URL.

The documentation shows we must append an ‘id’ parameter to our resource URL which is our ‘WOEID’; the documentation also demonstrates how we can exclude responses from our next API call, which is optional, and we will not do it in this example.

Again, we will use a formula tool to build our Resource URL, this time, appending our ‘Location WOEID’ field to the URL.

Once we’ve done this, it’s a case of getting the data from the API and parsing the result. The Download tool can be configured exactly as before, and though this endpoint is returning different data to our previous one, the steps to bring this into a structured table are identical. We’ll start by using RegEx to identify the appropriate line number and column headers before cross-tabbing it into a true data table.

Now we have a data table with which we can begin to answer those business questions and we can also capture this data historically in order to track the trends of our trends.

One really important note when working with this API is that the endpoint has a request limit, as outlined usefully in the documentation.

If we exceed this rate limit, then we will no longer get a ‘HTTP/1.1 200 OK’ response, but instead receive a ‘HTTP/1.1 429 Too Many Requests’ response.

In my use-case, this is not too much of an issue. Providing I don’t schedule my workflow to run more than once in a 15-minute period, I should be fine as the number of towns in the UK is quite low.

In the US, this is likely a different story, and you may need to build in a mechanism which prioritises which towns to download data for.

One quick win we could apply is use the ‘Message’ tool to write an error to the log indicated in the WOEIDs and Location names with which information was not fetched for.

The complete workflow can be found here, though in order to run the workflow you must place your Twitter Application credentials in the text input at the beginning of the workflow.

Building your internal social network

Before signing off, I just want to demonstrate another example use-case for how using the Twitter API can be impactful for your business; this time using the ‘Get Friends’ and ‘Get Followers’ endpoints.

I have pre-developed macros that allow you to get this information using the API, which can be found on the public Alteryx Gallery.

Get Twitter Followers

Get Twitter Friends

Internally, we have used these macros alongside a list of employees’ Twitter handles (we use Twitter in order to publish and share content that we think will be useful to our peers), to identify ‘gaps’ in our network. By that I mean, identify individuals who are not following one another, and nudge them to do so (via an Alteryx workflow which emails employees with a list of people they don’t follow).

And on that note, this blog is done!

Ben, out.

Engine Works

GET Trends; an Example of Working With the Twitter API