In Part 3 we review the Word-Relevance Summary and the visualization data. The summary returns the two metrics mentioned previously: relevance and saliency.
Saliency
Saliency helps us identify the words that are most informative for distinguishing topics within documents. A higher saliency value indicates that a word is more useful in identifying a specific topic.
Saliency is never negative and has no maximum. It is designed to show how specific a word is relative to the whole set of documents we are analyzing; a value of 0 indicates that a word is present in all topics, so it does not help tell them apart.
Relevance
Relevance is a metric used to rank words within a topic. It helps us identify the most appropriate words for each topic and reflects how strongly a word belongs to that topic. The higher the value for a given topic, the more important the word is for that topic.
Both metrics show relative values that we can use to describe and understand a specific topic. For a deeper dive, see Getting to the Point with Topic Modeling - Interpreting the Results.
![Garabujo7_0-1628694803119.png Garabujo7_0-1628694803119.png](/t5/image/serverpage/image-id/197642iE219D1AD7C1E2B3F/image-dimensions/567x219?v=v2)
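To make the two metrics more concrete, here is a minimal Python sketch. It assumes the tool follows the definitions commonly used in LDAvis-style visualizations (saliency as the word probability times the divergence between the topic distribution for that word and the overall topic distribution; relevance as a weighted mix of the log probability and the log lift). This article does not state the exact formulas, so treat the probabilities, the weight `lam=0.6`, and all names below as illustrative assumptions.

```python
import math

# Hypothetical toy inputs: p(word | topic) for two topics, plus topic weights p(topic).
p_word_given_topic = {
    "topic_1": {"price": 0.6, "service": 0.3, "the": 0.1},
    "topic_2": {"price": 0.1, "service": 0.8, "the": 0.1},
}
p_topic = {"topic_1": 0.5, "topic_2": 0.5}


def marginal(word):
    """Overall probability of a word across all topics: p(w)."""
    return sum(p_topic[t] * p_word_given_topic[t][word] for t in p_topic)


def saliency(word):
    """p(w) times the KL divergence between p(topic | w) and p(topic)."""
    p_w = marginal(word)
    total = 0.0
    for t in p_topic:
        p_t_given_w = p_topic[t] * p_word_given_topic[t][word] / p_w
        if p_t_given_w > 0:
            total += p_t_given_w * math.log(p_t_given_w / p_topic[t])
    return p_w * total


def relevance(word, topic, lam=0.6):
    """lam * log p(w | t) + (1 - lam) * log(p(w | t) / p(w)); lam is an assumed weight."""
    p_wt = p_word_given_topic[topic][word]
    return lam * math.log(p_wt) + (1 - lam) * math.log(p_wt / marginal(word))
```

In this toy data, "the" is equally likely in both topics, so its saliency is 0, while "price" gets a positive saliency and a higher relevance for topic 1 than for topic 2.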
Assigning Tags to Topics
Assigning tags to topics allows us to label documents for categorization. Connect a Formula tool to the R output of the Topic Modeling tool to extract the topic to which each word belongs.
![Garabujo7_1-1628694819711.png Garabujo7_1-1628694819711.png](/t5/image/serverpage/image-id/197643i037A3A5FA1B3D9B2/image-size/medium?v=v2&px=400)
![Garabujo7_2-1628694826068.png Garabujo7_2-1628694826068.png](/t5/image/serverpage/image-id/197644i791750F7A0B5D499/image-size/medium?v=v2&px=400)
![Garabujo7_3-1628694832164.png Garabujo7_3-1628694832164.png](/t5/image/serverpage/image-id/197645iE227E3EE2ABACABD/image-dimensions/499x91?v=v2)
The MaxIDX function returns the position of the largest value among the three relevance fields, not the value itself. The result is a zero-based integer, so we add 1 at the end to match the topic numbers. In this way we will have assigned a topic to each word, along with its relevance.
![Garabujo7_4-1628694848562.png Garabujo7_4-1628694848562.png](/t5/image/serverpage/image-id/197646i420DB42D2DB61EF5/image-dimensions/558x191?v=v2)
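For readers who think in code, the same logic can be sketched in Python. This is only an illustration of what the formula does, not part of the workflow; the field names and scores are hypothetical, and the tie-breaking rule (first field wins) is an assumption of this sketch.

```python
def max_idx(*values):
    """Zero-based position of the largest value (first one wins on ties in this sketch)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical relevance scores of one word for topics 1, 2 and 3.
word_relevance = {"word": "delivery", "topic_1": -4.2, "topic_2": -1.3, "topic_3": -6.8}

# Add 1 so the zero-based position lines up with the topic number.
topic = max_idx(
    word_relevance["topic_1"],
    word_relevance["topic_2"],
    word_relevance["topic_3"],
) + 1
```

Here the second field holds the largest value, so the position is 1 and the assigned topic is 2.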
The next step is to add a Sample tool to select only the first N words of each topic we create.
![Garabujo7_5-1628694866542.png Garabujo7_5-1628694866542.png](/t5/image/serverpage/image-id/197647iF79486A30533C98B/image-size/medium?v=v2&px=400)
![Garabujo7_6-1628694877135.png Garabujo7_6-1628694877135.png](/t5/image/serverpage/image-id/197648iCD2395E3D20A83BA/image-size/medium?v=v2&px=400)
We get the 3 most prominent and relevant words:
![Garabujo7_7-1628694887757.png Garabujo7_7-1628694887757.png](/t5/image/serverpage/image-id/197649i7F29C9702B9C976B/image-dimensions/600x156?v=v2)
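The "first N per group" step can be sketched in Python as follows. The rows are made-up examples, and the sketch sorts each group by relevance before taking the first N, which assumes that is the order we want the words in.

```python
from collections import defaultdict

# Hypothetical rows: (word, assigned topic, relevance for that topic).
rows = [
    ("price", 1, -0.5), ("refund", 1, -2.5), ("cost", 1, -0.9), ("cheap", 1, -1.2),
    ("service", 2, -0.4), ("wait", 2, -3.0), ("staff", 2, -0.8), ("support", 2, -1.1),
]


def top_n_per_topic(rows, n=3):
    """Group words by topic, sort each group by relevance (highest first), keep n."""
    grouped = defaultdict(list)
    for word, topic, relevance in rows:
        grouped[topic].append((word, relevance))
    return {
        topic: [w for w, _ in sorted(items, key=lambda x: x[1], reverse=True)[:n]]
        for topic, items in grouped.items()
    }


top_words = top_n_per_topic(rows, n=3)
```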
The next step is to create the tags based on the topic terms. To make it dynamic, use a Summarize tool to create a concatenated field with the three words to serve as a label for the topic.
![Garabujo7_8-1628694892366.png Garabujo7_8-1628694892366.png](/t5/image/serverpage/image-id/197650iF1A6C5610AF36570/image-size/medium?v=v2&px=400)
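As a rough Python equivalent of the concatenation, assuming a comma as the separator and using hypothetical words:

```python
# Hypothetical top words per topic, already limited to three each.
top_words = {
    1: ["price", "cost", "cheap"],
    2: ["service", "staff", "support"],
}

# One concatenated label per topic; the ", " separator is an assumption.
labels = {topic: ", ".join(words) for topic, words in top_words.items()}
```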
Using a Find and Replace tool we can change the topic numbers to text labels that make more sense for business users consuming this analysis.
![Garabujo7_9-1628694908738.png Garabujo7_9-1628694908738.png](/t5/image/serverpage/image-id/197651iC76FBF3643C8D48D/image-size/medium?v=v2&px=400)
![Garabujo7_10-1628694911931.png Garabujo7_10-1628694911931.png](/t5/image/serverpage/image-id/197652iEA9980B917DB8605/image-size/medium?v=v2&px=400)
![Garabujo7_11-1628694920888.png Garabujo7_11-1628694920888.png](/t5/image/serverpage/image-id/197653i3FDD4A7E469D7758/image-dimensions/520x152?v=v2)
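Conceptually, this replacement is a lookup from topic number to label. A minimal Python sketch, with hypothetical document IDs, labels, and an assumed fallback value for topics that have no label:

```python
# Hypothetical lookup table built from the concatenated topic words.
labels = {1: "price, cost, cheap", 2: "service, staff, support"}

# Hypothetical documents, each with the topic number assigned to it.
documents = [
    {"doc_id": 101, "topic": 2},
    {"doc_id": 102, "topic": 1},
    {"doc_id": 103, "topic": 3},  # no label defined for this topic
]

# Attach the text label; "unlabeled" is an assumed fallback for missing topics.
tagged = [
    {**doc, "topic_label": labels.get(doc["topic"], "unlabeled")}
    for doc in documents
]
```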
Now each document is tagged with the topic it belongs to. We can then summarize by topic to count how many documents fall into each category.
![Garabujo7_12-1628694937885.png Garabujo7_12-1628694937885.png](/t5/image/serverpage/image-id/197654iF1A3378342292FE6/image-dimensions/491x81?v=v2)
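The count per category is a simple group-and-count. In Python it could look like this, using hypothetical tagged documents:

```python
from collections import Counter

# Hypothetical documents that already carry a topic label.
tagged = [
    {"doc_id": 101, "topic_label": "service, staff, support"},
    {"doc_id": 102, "topic_label": "price, cost, cheap"},
    {"doc_id": 103, "topic_label": "service, staff, support"},
]

# Number of documents per topic label.
counts = Counter(doc["topic_label"] for doc in tagged)
```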
Visualizing Each Topic in a Custom Word Cloud
To categorize each document within its topic, we will use a similar process. Connect a Formula tool to the D output of the Topic Modeling tool and use the MaxIDX() function to obtain the most relevant topic for each document.
![Garabujo7_13-1628694946458.png Garabujo7_13-1628694946458.png](/t5/image/serverpage/image-id/197655i64CE67855B72B004/image-size/medium?v=v2&px=400)
Filter each topic to view it independently.
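A Python sketch of these two steps, assigning each document its dominant topic and then splitting the documents by topic, might look like this. The per-topic scores and document IDs are hypothetical.

```python
# Hypothetical per-document scores for topics 1, 2 and 3.
documents = [
    {"doc_id": 101, "scores": [0.10, 0.75, 0.15]},
    {"doc_id": 102, "scores": [0.80, 0.15, 0.05]},
    {"doc_id": 103, "scores": [0.20, 0.60, 0.20]},
]

# Dominant topic: zero-based position of the highest score, plus 1.
for doc in documents:
    scores = doc["scores"]
    doc["topic"] = scores.index(max(scores)) + 1

# "Filter" step: collect the document IDs belonging to each topic.
by_topic = {}
for doc in documents:
    by_topic.setdefault(doc["topic"], []).append(doc["doc_id"])
```

Each entry of `by_topic` then plays the role of one filtered stream feeding its own word cloud.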
Using the Word Cloud tool, we will set up the visualization.
![Garabujo7_14-1628694958557.png Garabujo7_14-1628694958557.png](/t5/image/serverpage/image-id/197656i91F7DF695D4FE5A6/image-size/medium?v=v2&px=400)
First, select the field that we want to visualize. To customize the word cloud, select the corresponding option.
![Garabujo7_15-1628694971783.png Garabujo7_15-1628694971783.png](/t5/image/serverpage/image-id/197657i12338E4AF26BBC1E/image-dimensions/633x146?v=v2)
There are several options for customization:
- Choose a background color.
- Set the maximum number of words to evaluate.
- Resize the word cloud.
- Apply a mask to define the shape of the word cloud.
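To give an intuition for the maximum-number-of-words option, here is a small Python sketch that counts word frequencies in the documents of one topic and keeps only the most frequent ones, which is roughly the information a word cloud uses to size its words. The function name, the simple tokenization, and the sample texts are assumptions for illustration; the sketch does not draw anything and does not remove stop words.

```python
import re
from collections import Counter


def word_frequencies(texts, max_words=5):
    """Count lowercase words across texts and keep the max_words most frequent."""
    words = []
    for text in texts:
        words.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(words).most_common(max_words)


# Hypothetical documents belonging to a single topic.
topic_docs = [
    "Great service and friendly staff",
    "Service was slow but staff helped",
    "Service again",
]

top = word_frequencies(topic_docs, max_words=2)
```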
![Garabujo7_16-1628694991493.png Garabujo7_16-1628694991493.png](/t5/image/serverpage/image-id/197658iB0DBF1B86AC3089F/image-size/medium?v=v2&px=400)
To use an image as a template, add a Blob Input tool (Blob stands for binary large object) from the Developer tool category and select the path where the file is located.
![Garabujo7_17-1628695006439.png Garabujo7_17-1628695006439.png](/t5/image/serverpage/image-id/197659i2F702E840D749B01/image-size/medium?v=v2&px=400)
![Garabujo7_18-1628695023373.png Garabujo7_18-1628695023373.png](/t5/image/serverpage/image-id/197660iA050FAA843F5C472/image-size/medium?v=v2&px=400)
Once this is done, the Blob option appears under the mask option in the Word Cloud configuration. Run the workflow to render the word cloud. In this case I used the Twitter logo to shape the report.
![Garabujo7_19-1628695054830.png Garabujo7_19-1628695054830.png](/t5/image/serverpage/image-id/197661i874A8FCAF2E1493C/image-size/medium?v=v2&px=400)
![Garabujo7_20-1628695069489.png Garabujo7_20-1628695069489.png](/t5/image/serverpage/image-id/197662i8096B539317B09FB/image-size/medium?v=v2&px=400)
The last part in this series will demonstrate how to export a trained topic model to score new items and speed up the process.