Text Analytics

williamsa
7 - Meteor

Hello,
I am new to Text Analytics so any help would be greatly appreciated.
I have a data set where each row of data is a paragraph of comments from people outlining a safety topic they discussed with a colleague.
I essentially want to extract categories/themes/topics from these paragraphs of comments (e.g. PPE, job briefs, qualifications, situational awareness etc.). The end goal is to be able to derive which types of safety topics are the most/least discussed via the employees' comments field, without having to manually read each row of text to categorize it.
I have read some similar requests on this board, and sentiment analysis, topic modelling or word clouds seem to be the answer; however, I don't know where to start. PS: I have the Text Mining and Machine Learning suites already installed on my Alteryx Designer.
I started playing around with a workflow, which I have attached. Any help would be most appreciated!

2 REPLIES
PhilippK
Alteryx Alumni (Retired)

Hello @williamsa,

There are two ways you can tackle this:

a) Unsupervised Machine Learning, e.g. with Alteryx's Topic Modeling:

With this method, you do not know at the outset which safety topics exist, so you also do not tell the Topic Modeling tool which safety topics are possible. You only tell it how many topics it should identify:

screenshot.png

As a result, in your example you get 15 additional columns, one per topic. The value in each row is the probability that this row belongs to the respective topic. For example, your first row has a probability of 77% of belonging to topic 13:

screenshot2.PNG

In the end, you need to investigate what the rows assigned to each topic have in common (e.g. with a word cloud or word frequencies), because the tool only numbers the topics and does not name them.
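To make that last step a bit more concrete, here is a minimal Python sketch of the idea (for example for a Python tool or a script outside Designer). It is not what the Topic Modeling tool does internally. The sample rows, the three topics (instead of 15), the stopword list and the function names are all made up for illustration. Each row is assigned to its most probable topic, and then the most frequent words per topic are counted:

```python
from collections import Counter, defaultdict
import re

# Hypothetical topic model output: the comment text plus one probability
# per topic (3 topics here instead of 15, to keep the example short).
rows = [
    {"text": "Discussed wearing gloves and safety glasses", "probs": [0.80, 0.15, 0.05]},
    {"text": "Reminded colleague to wear gloves and a hard hat", "probs": [0.70, 0.20, 0.10]},
    {"text": "Reviewed the job brief before starting work", "probs": [0.10, 0.85, 0.05]},
    {"text": "Job brief was missing the hazard section", "probs": [0.20, 0.75, 0.05]},
]

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "a", "to", "was", "before"}


def dominant_topic(probs):
    """Return the 1-based number of the topic with the highest probability."""
    return max(range(len(probs)), key=lambda i: probs[i]) + 1


def top_words_per_topic(rows, n=3):
    """Count words over the rows assigned to each topic; keep the n most frequent."""
    counts = defaultdict(Counter)
    for row in rows:
        topic = dominant_topic(row["probs"])
        words = re.findall(r"[a-z]+", row["text"].lower())
        counts[topic].update(w for w in words if w not in STOPWORDS)
    return {topic: [w for w, _ in c.most_common(n)] for topic, c in counts.items()}


# How many rows fall into each topic (most/least discussed).
rows_per_topic = Counter(dominant_topic(r["probs"]) for r in rows)
# The most frequent words per topic, as a hint for naming the topic.
top_words = top_words_per_topic(rows)
```

With the made-up rows above, "gloves" is the most frequent word in topic 1 and "job"/"brief" in topic 2, which suggests names such as "PPE" and "Job briefs" for these topics.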

Please read the following article for more details:
https://community.alteryx.com/t5/Data-Science/Getting-to-the-Point-with-Topic-Modeling-Part-1-What-i...

b) Supervised Machine Learning

With this method, you already know at the outset which safety topics exist.
Therefore, you can label a number of rows by hand (hard to say how many you need; the more the better). Create a new column for the label with the Formula tool. The labeled rows are your training data set for a supervised machine learning algorithm. You can use the Machine Learning tool category (Assisted Modeling tool) for that.
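
To illustrate the principle only, here is a minimal Python sketch of supervised text classification. It uses a simple Naive Bayes classifier written by hand. This is my own choice for the example and not necessarily what Assisted Modeling uses. The labeled comments, the labels and the function names are invented:

```python
from collections import Counter, defaultdict
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_rows):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_rows:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_rows = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Prior: the share of training rows that carry this label.
        score = math.log(label_counts[label] / total_rows)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled training rows (the new label column).
labeled = [
    ("Wear gloves and safety glasses", "PPE"),
    ("Hard hat required on site", "PPE"),
    ("Reviewed the job brief before work", "Job brief"),
    ("Job brief did not cover hazards", "Job brief"),
]

model = train(labeled)
prediction = predict("Forgot safety glasses and gloves", model)
```

Here the unlabeled comment is assigned to "PPE" because it shares words with the PPE training rows. With only four training rows this is a toy; on real data you would label many more rows and check the accuracy on rows that were held back from training.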

Let me know whether this helps you.

Best regards
Phil

williamsa
7 - Meteor

@PhilippK Thank you so much for your guidance. This is a big help and I really appreciate it! I will start looking into your suggestions this week. Thanks again.
