Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
le_luu
7 - Meteor

At the end of October 2024, Alteryx Designer updated the latest version to 2024.2. If you haven't updated it yet, click on this link to download and update Alteryx Designer to the latest version. Many of the updated features in this latest version are very helpful in preparing data. Alteryx released a new tool called Rank tool, adding the Group By function in the Record ID tool, and many new updates ... You can check Alteryx Designer 2024.2 Release Notes here.

 

Extracting data by web-scraping data from web pages and retrieving data from API are my favorite topics. In this latest version, Alteryx Designer updated some helpful features in the JSON Parse tool. Those new features save so much time when I need to parse JSON after retrieving data from API, flatten nested objects, and arrays without typing any lines of code. In this blog, I will walk through:

 

  1. JSON structure
  2. Flatten nested objects and nested arrays with the JSON Parse tool (Using Unnest JSON field and Flatten Array options)
  3. Output values into data type specific values
  4. Output values into single string field

 

Ah! Remember to update your Alteryx Designer version to 2024.2 to follow the steps I will show later in this blog.

 

1. JSON Structure

 

JSON object contains key-value pair(s). The key and value are separated by a colon mark ":". If there are more than one key-value pair, each pair will be separated by a comma. The data type of each object could be strings, numbers, Booleans, and null.

 

In some cases, an object contains multiple nested objects or nested arrays. The object is surrounded by a curly braces {}. An array is surrounded by a square bracket [].

 

json_1.png

Fig. 1: Example of a JSON structure

 

In the example above, there is a big object which contains keys:

  • title (the value is a string)
  • name (the value is a string)
  • role (the value is a string)
  • company (the value is a string)
  • skills (this is an array when the value is surrounded by a square bracket [], each value in this array is a string and separated by a comma )
  • blog_post (is also an array with 3 nested objects because there are 3 curly braces pairs. For each object, there are 3 key-value pairs inside. For example, in Object 1 in Fig. 1, tableau, Alteryx, and SQL are an array when their values are listed in a square bracket and each value is separated by a comma)

 

If I sketch it in a tree diagram, it would look like this:

 

json_2.png

Fig. 2: JSON structure is displayed under a tree diagram

 

2. Flatten Nested Objects and Nested Arrays with the JSON Parse Tool

 

In this session, I will demo how to send the request to extract data from API and parse JSON with the updated JSON Parse Tool.

 

The API I use is from the Star Wars API (SWAPI). You can read the API documentation page here. The API link that I use is: https://swapi.dev/api/planets/

 

The data is about the planets that appeared in the Star Wars series including the name of the planet, rotation period, diameter, climate, gravity, films...

 

json_3.png

Fig. 3: JSON structure after retrieving data from API

 

If I display that JSON structure under the tree diagram, you will see the structure clearly with nested objects and nested arrays.

 

json_4.png

Fig. 4: JSON structure displayed in a tree diagram

 

Let's open Alteryx Designer 2024.2!

 

Drag and drop the Text Input tool into the canvas and copy the API link above to a cell in the Text Input tool.

 

json_5.png

Fig. 5: Copy the API link into the Text Input tool

 

Then, I drag and drop the Download tool from the Developer palette to connect to the Text Input tool. The Download tool will help us to send the request to the server to retrieve data from API. The API we are using is free and public, so I don't need to set the Headers. If you are using API with the access token or API token, you should set it in the Headers tab in the Download tool.

 

In the Configuration window of the Download tool, I unchecked the box Use Data Connection Manager (DCM) and ensure the Field is selecting the field url.

 

json_6.png

Fig. 6: Connect the Download tool to the Text Input tool to send the request to the server

 

If your request to the server succeeds, you will get the response in the Result window. The code in DownloadHeaders will be 200 OK. Your retrieved data will be in the Download Data column. If you use the Browse tool to connect to the output of the Download tool, you will see the JSON structure as Fig. 3 above.

 

Next, I drag and drop the JSON Parse tool (also from the Developer palette) to connect to the output anchor of the Download tool. Some options in the JSON Parse Tool require running with AMP. By default, Alteryx Designer will turn on AMP for you. If you are not sure, you can check by clicking on the blank canvas. Then, select the Runtime tab, and make sure that the box Use AMP Engine is checked.

 

json_7.png

Fig. 7: AMP Engine is on

 

Unnest JSON field (AMP Only)

 

This option only works with AMP on. This option will help to unnest JSON objects (One level deeper).

 

As I mentioned in the first part, the JSON object is surrounded by curly braces {}. If you look at the value in the DownloadData column, the leftmost mark is a curly brace "{" which represents an object.

 

json_8.png

Fig 8: Choose the Unnest JSON field (AMP only) to flatten nested objects

 

When choosing the Unnest JSON field, it will flatten the object to a deeper level. In Fig. 8, the tree diagram on the right shows the keys in the object after flattening.

 

Run the workflow, the result will return 4 keys in 4 separate columns. The DownloadData will be concatenated at the front of the key (Fig. 9).

 

json_9.png

Fig. 9: The result after flattening a JSON object

 

As you know from the tree diagram in Fig. 4, the results key is an array. The leftmost mark of the value in the DownloadData.results is a square bracket "[".

 

Flatten array (AMP Only)

 

Next, I want to flatten the array. The JSON Parse tool has a feature to flatten the array only. It also requires AMP on.

 

I connect a new JSON Parse tool to connect to the previous JSON Parse tool. In the Configuration window, at the JSON field, I selected DownloadData.results and chose the option Flatten Array (AMP Only) as I wanted to flatten the Results array.

 

json_10.png

Fig. 10: Use the JSON Parse Tool to flatten the nested array

 

From the tree diagram, in the results array, there are 10 nested objects. In the Results window in Fig. 10, there are 2 columns for results. The DownloadData.results_flatten will store 10 nested objects from the results array. The DownloadData.results_idx will store the index of each nested object after flattening sorted by ascending.

 

Repeat the step to unnest the object and flatten the array

 

If you notice the value in the DownloadData.results_flatten, the leftmost mark is a curly brace "{" which is an object. From here, you can use the Unnest JSON field (AMP Only) option from another JSON Parse tool to unnest the object (Fig. 11a).

 

json_11a.png

Fig. 11a: Flatten a nested Object using the Unnest JSON field option from the JSON Parse tool

 

json_11b.png

Fig. 11b: The result after unnesting the results flatten objects displayed in the tree diagram

 

After unnesting the DownloadData.results_flatten field, I got some new columns which are keys of the nested object in the result array. From the tree diagram, you notice that we have 2 nested arrays (residents and films). In the result window, you also see the leftmost mark value of those columns is a square bracket "[".

 

Splitting the branch

 

Currently, we have 10 rows that contain 10 nested objects from the result array. For each nested object, there is an array for residents and an array for films.

 

json_12.png

 

 

Fig. 12: Residents array and Films array

 

In Fig. 12, that is an example in the first result object containing the resident 1 and film 1. The Resident 1 contains 10 values. The film 1 contains 5 values.

 

If you flatten either one array (Residents or Films) first and flatten the other array after that, it will explode your dataset. Let me show in the sketch.

 

json_13.png

Fig. 13: The result if flatten either one array first and flatten the other after then

 

In Fig. 13, I flattened the Residents array first, and then flattened the Film array after. I assumed that the number of elements in Residents is the same for all Residents and the number of values in Films is the same for all Films.

 

The number of rows if I do that = 10 Residents total * 10 residents for each big Resident * 10 films array for each resident value * 5 films for each film array = 5000 rows.

 

However, we only want each value in the Residents_1 array to match with the Films_1 array (Fig. 14). By doing this, the number of rows is less than 10 times and we don't need to prepare the data again.

 

json_14.png

Fig. 14: Match the index result of the Resident and Film

 

Alteryx already created the result index column. Based on this index, we can match the index of the Resident value with the index of the Film value (Fig. 15).

 

json_15.png

Fig. 15: The Results index column after flattening the Results array

 

Come back to our workflow, to return the result as I want in Fig. 14, I need to split the main branch into 2 small branches.

 

json_16.png

Fig. 16: Flatten each array in a separate branch

 

Now, I only join those 2 results back with the key is the result index from Fig. 15.

 

json_17.png

Fig. 17: Join 2 branches together with the result index, and remove unnecessary columns

 

If I sort the DownloadData.result_idx by ascending, you will see each residents_idx in the residents array will match with 5 films value from the films array.

 

json_18.png

Fig. 18: Flatten all nested objects and nested arrays into a separate row

 

Then, you can rename the column name, and remove unnecessary columns with the Select tool.

 

json_19.png

Fig. 19: Final Result Table

 

3. Output values into data type specific fields

 

This option does not require AMP. This option will organize data in each separate column (depending on the data type of the value).

 

json_20.png

Fig. 20: Flatten all levels and output the value in separate columns depending on the data type

 

4. Output values into single string field

 

This option also doesn't require AMP. This option will flatten all levels in your JSON structure.

 

In the Results window, each nested object will be returned in each separate row. If there are many values in the same object or array, Alteryx will automatically create a unique index id at the end.

 

json_21.png

Fig. 21: Flatten all levels and return the value as string values

 

One thing to notice is that all values in the JSON_ValueString are in the string data type.

 

In this blog, I shared the JSON structure after retrieving data from API. In the second part, I introduced the latest features from JSON Parse tool to flatten nested objects, arrays into each row by using 2 options (Unnest JSON field and Flatten Array) with AMP. The third and fourth part is 2 options that you can flatten all levels in the JSON structure into a separate row.

 

This new feature for the JSON Parse tool is very helpful; especially in the complex JSON structure. The user doesn't need to understand much about the JSON structure or have to count how many curly braces, and square brackets. It's also useful for those who don't have a programming background. Users only need to connect the tools and select options in the JSON Parse tool. Quickly, the JSON structure will transform to a data table. I really like this new feature from Alteryx.

 

I hope this blog is helpful to you when you need to parse JSON and prepare data. See you soon in the next blog!

 

This blog was originally published on theinformationlab.co.uk