One of the greatest strengths of modern web APIs is their flexible, developer-friendly nature, which provides numerous options for both the provider and the user. However, this flexibility can make it more intimidating for business users to deal with the various data formats that these APIs provide. The purpose of this article is to familiarize you with the main data formats used by the vast majority of web APIs, and provide the basic knowledge that will allow you to confidently process the data they return into a typical tabular format.
We will cover parsing data in the three most common web API formats:
XML (Extensible Markup Language)
Delimiter Separated Values (CSV, TSV, etc)
This knowledge will allow you to not only utilize web APIs within their workflows, but to also build reusable macros that allow you and other users you share with to fetch web data with minimal technical knowledge.
Obtaining the API Response
The first step in parsing an API response is to know where it is. The response from a given API can be accessed from the output of the Download tool. The response will contain the actual data provided by the API (in the “DownloadData” field) as well as the HTTP response parameters in the “DownloadHeaders” field. It should be noted that, regardless of the type of data format used by the API, the response data contained in the “DownloadData” field will be a single field of text for each response.
In the example below, the “DownloadData” field contains the entire contents of the response from a JSON API. Notice how in examples below for XML and CSV, this field contains the entire response for the respective format.
The next steps will demonstrate parsing the response data for various formats with Alteryx tools.
API Response Types
The type and structure of API responses can vary greatly depending on how the provider designed the service, but in general, web API responses fall into 3 general categories of data formats:
Many web APIs return results in a delimited format such as CSV (Comma Separated Values).
This approach is less common than JSON or XML response types.
Parsing the API Response
Alteryx comes with an excellent tool designed to process JSON data. The JSON Parse tool is found in the Developer category. Here, we first connect the output of the Download tool to the input of the JSON Parse tool.
Next, we select the field which contains the JSON formatted text. In this case, it is the “DownloadData” field. Notice we are keeping the “Output values into single string field” option checked. This will ensure that the output of the tool is as simple as possible. For more information on this setting and the JSON Parse tool's specific functions, go here.
After running the workflow, the output of the JSON Parse tool can be seen below. Notice that the output is in a key-value format, which is not typically compatible with the tabular data most business users utilize. We will take care of that in the next step. Another thing to notice is that many of the values in the “JSON_Name” field have a “word.otherword” naming format. This is a naming convention that is automatically generated by the JSON Parse tool to deal with the very common nested data that occurs in JSON.
For example, in the data below, there are fields such as “currently.time” and “currently.summary”. In the original JSON, these fields were depicted as follows:
Depending on the type and structure of data a JSON API is producing, sometimes it is preferable to use the Text To Columns tool to further split these “word.otherword” fields. In this particular scenario, we'll ignore them due to the simplicity of the returned JSON. Now, let's continue to the next step, where we convert the key-value format to a more friendly tabular format.
The Cross Tab tool is well suited for converting the key-value structure of the JSON Parse tool's output into a tabular format. First, connect the output of the JSON Parse tool to the input of a Cross Tab tool.
The standard column outputs of the JSON Parse tool will be “JSON_Name” for each Key, and “JSON_ValueString” for each associated Value. We will set the “New Column Headers” drop-down to the “JSON_Name” column, and the “Value for New Columns” drop-down to the “JSON_ValueString” column.
In the example below, notice that we are using a “Group Data by these Values” selection of a “request url”. This is optional, but is necessary in any case where there are multiple records in the JSON data. In any case where multiple records are present in a JSON response, the field(s) which are used as a unique identifier for each record must be selected to properly parse the data.
Below is the resulting output, which is in the desired tabular format.
Alteryx provides an equally powerful tool for dealing with XML data. Below, we can see an example of raw XML data being returned from the Download tool.
To process this raw XML data, use the XML Parse tool, which can be found in the Parse tool category. Connect the output of the Download tool to the input of the XML Parse tool.
Similar to what we did for parsing the raw JSON data, here we will next select the field in the Download tool's output which contains the raw data to parse. In this case, the field has the standard Download tool output name of “DownloadData”. The default options “Auto Detect Child” and “Return Child Values” are selected as well. For detailed information on these settings and the capabilities they provide, go here.
After running the workflow with the XML Parse tool configured as above, the raw XML data has been converted into the tabular data below. It is important to note that the column names are taken from the XML tags, and will be dependent on the specific XML format, or schema, used by the web API provider.
Delimited data, commonly known by its specific categories such as Comma Separated Value (CSV) or Tab Separated Value (TSV), is a particular easy data format to process from a web service. Although the majority of web API providers do not provide data in delimited formats, a growing number of modern API providers offer a delimited option.
Below, we see a form of delimited data (CSV in this case) being returned by the Download tool. Notice that, similar to the JSON and XML data, this data is all captured in a single row of a single field.
The first step to processing this data is to split the text into separate rows, since they are currently joined together. We do this by using the Text To Columns tool, which is found in the Parse tools category.
Below, notice that we have connected the output of the Download tool to the input of the Text To Columns tool, and have once again selected the “DownloadData” field for processing in the “Field to Split” drop-down menu.
Since our goal for the first step is to split the data into separate rows, we've selected the “Split to Rows” option. In the “Delimiters” field, we've put a “ ” entry, which represents “new line” to tell the Text To Columns tool to split the data whenever a line ends.
After running the Text To Columns tool configured for Split to Rows, we get the output below.
The next step for processing the delimited data is take the data, which has now been split into separate rows, and proceed to split each row into its separate columns. We do this using a second Text To Columns tool, with a different configuration than the first one used to split the data into rows.
Below, we have attached the output of the first Text To Columns tool to the input of a second Text To Columns tool. The second tool is configured to split the “DownloadData” field (notice that it is now many rows instead of only one) and will split it into columns.
To do this, we have specified the delimiter field with the “,” value. We know to do this because each column in the original raw data is separated from the other by a comma. Additionally, we have specified “14” for the “# of Columns” field. This is an important detail, because this cannot be automatically detected. It is up to the user to look at the raw data and determine how many columns are present. The easiest way to do this is to look at the first row of data (in this case, it contains the column names) and count them. As you may have guessed, there were 14 columns present in this data set.
For additional information on utilizing the Text To Columns tool, see the documentation here.
Below, we are now seeing a desirable output in a tabular format. Next, we will use a few Alteryx tools to change the column names to those returned in the first row of data.
Below, we will use two tools to get the data in its final, perfect format. First, we will add a Select tool to the output of the second Text To Columns tool.
Select only the desired columns. In this case, we are only selecting the columns which represent actual data returned from the API, rather than the columns which were present in the original request or raw response.
Here we see the output of the Select tool. Next, we will replace the current “DownloadDataX” column names with the values present in the first row.
Alteryx contains a tool called Dynamic Rename, which can be found in the Developer tool category. It can be used for numerous very powerful, developer-oriented capabilities described here, but it offers a very simple operation that is self-explanatory for normal users.
First, get the Dynamic Rename tool onto the canvas and connect the Select tool output to the Dynamic Rename tool's “L” input. (The “R” input will not be used for this exercise. If you are curious about what the Dynamic Rename tool is capable of, consult the documentation in the link in the paragraph above.)
Next, in the Dynamic Rename tool's configuration menu, select the “Take Field Names from First Row of Data” option.
Below, we have the output of the Dynamic Rename tool. The columns have been renamed to their respective values from the first record of returned data. We've now successfully parsed the delimited output from a web API.
In the previous steps, we covered how to properly convert the output of a web API to usable, tabular data that conforms with the needs of typical Alteryx users. We covered the three most common data formats used by API providers:
XML (Extensible Markup Language)
Delimiter Separated Values (CSV, TSV, etc)
With these elements, you can proceed to the next step: building an automated tool that fetches data from web sources into a reusable macro for embedding in your own and your colleagues' workflows.