Like so many tools, the idea for this sprang from a conversation in the Solutions Center, at Inspire 2016. A user was regularly receiving emails that contained attachments. He wanted to use them in his analytic workflows and encountered a lot of manual work related to getting the files into an accessible location. I had a similar situation in a previous job, so it was easy for me to relate to his frustration. Using tools that the Alteryx team had previously built, like the Google OAuth Macro and Google Sheets Tools, we were able to build a POC during our conversation. The following walks through how this was accomplished and packaged as the Gmail Attachment Input tool.
At a high-level, the tool makes three Gmail API requests that build on the prior requests. The first request returns a list of email messages that match the search parameters specified. The second request iterates through the message list and returns details (date, from, subject, attachment name, and attachmentId) for each message. The final request returns the message attachment data.
Prerequisites
To access the account's email, the Gmail Attachment Input tool uses a Client ID, Client Secret, and Refresh Token associated with the Gmail API and account. The Alteryx Knowledge Base article, How to Create Google API Credentials, walks you through how to obtain these. To fit our use case, in Step 6 you'll want to use Gmail API and in Step 16 you'll want to use https://www.googleapis.com/auth/gmail.readonly.
The initial step of any tool or macro is to gather user inputs. For this tool, there are only four pieces of input that need to be gathered from the user: Client ID, Client Secret, and Refresh Token (for authentication) and search parameters. The search parameters are used to identify which emails we want to download attachments from. An HTML frontend was used to capture this input from users. Using an HTML frontend allowed me to easily integrate search parameter tips and a link to the Gmail Help page, Search operators you can use with Gmail. Creating an HTML frontend won't be covered in this blog post but you can find more information on the HTML SDK at Inspiring Ingenuity (our CTO @Ned 's blog.)
The process begins with refreshing the Authentication token needed for accessing the Google APIs. We'll gloss over it here because a previous blog post I wrote, OAuth Macro for Google API Authentication, covers it in detail.
The first Gmail specific API call made is to get a list of messages that match the search parameters specified. Using the Download tool, a GET request from https://www.googleapis.com/gmail/v1/users/me/messages is made.
The URL is made up of three parts:
The request produces a JSON response which is parsed, using the JSON Parse tool, to produce a list of message_id's that match our search parameters:
Using the list of ids from the previous step, an API call is made for each message_id. Each call is a GET request from https://www.googleapis.com/gmail/v1/users/me/messages with the message_id appended.
The URL is made up of two parts:
The JSON response has many different parts and requires extensive cleanup:
Key pieces of data that we want to parse out of the messages response are: Date, From, Subject, Attachment name, and most importantly attachmentId. AttachmentId will be used to identify which attachment to download in the next step.
The last step creates a final API call, using all of the information gathered in the previous steps, that returns the raw attachment data.
A GET request is made for each attachmentId from https://www.googleapis.com/gmail/v1/users/me/messages/[messageId]/attachments/[attachmentId]?fields=....
The URL is made up of multiple parts:
Each attachment data response is a Base64 encoded string:
Using two Blob Convert tools, it's easy to convert the response into usable data. The Base64 encoded string is first converted into a Blob and then converted from a Blob to a UTF-8 string. The final step is to remove unnecessary fields and output the message details and attachment data using a Macro Output tool.
The FINAL PRODUCT
To demonstrate this tool, I configured it to search for all emails from steve.wagner@alteryx.com that have an attachment with a .txt or .csv extension. The Gmail Attachment Input tool returned four files attached to three emails.
Unfinished Business
A great addition that wasn't implemented in v1.0 of this tool is an automatic parsing/transform of the attachment data. Right now, the attachment data is returned in a single record, requiring the user to build out the workflow to transform the data into a table. Also, I haven't been able to explore which attachment file types are supported.
Two Challenges
1. Download the tool and implement automatic parsing/transform of the attachment data based on file type. Publish your attempt to the Alteryx Gallery and link to it in the comments.
2. Discover which file types are supported. Post any new finds in the comments.
Click to download the Gmail Attachment Input tool
Note that GmailAttachmentInput.yxi requires Alteryx Designer v10.1+