
SOLVED

Converting text description into DTM for Clustering

adriantanjy
6 - Meteoroid

Hi everyone,

I am very new to Alteryx and am trying to use it to analyze unstructured data. I have a column of descriptions in text form and I intend to use the K-Means Clustering tool for topic modelling. For K-means to work on text, I need to convert the text into a Document Term Matrix (DTM) so that the terms appear as continuous variables to the clustering tool. However, I am struggling to find a way to convert my text to a DTM.

Does anyone know a way to do so? I am currently looking at the R tool but am not exactly sure how to start. Hoping that all of you experts here can help me out!

I have looked through past questions and knowledge posts on text analysis and found that most fall back on the Microsoft Azure ML Text Analysis Macro. However, I would like to avoid the macro (so that I am not restricted to a limited number of runs each month, which hurts scalability) and instead use tools that are available in Alteryx.

Thanks to everyone in advance!

mnitin3
7 - Meteor

I would need to try this in the tool, but I think the R tool would be the best and fastest way to do this.

You can pass the data to an R script (see https://help.alteryx.com/9.5/R.htm#Code_Options) and then have the script return the DTM as a data frame.

I will keep you posted if I am able to create an example of this.

Let me know if you need help writing the R script to get the DTM.

adriantanjy
6 - Meteoroid

Hi, thanks for your reply. I'm trying to write an R script to get a DTM but am unable to get a sensible output, so something must have gone wrong somewhere. Here is my R code:

run_data <- read.Alteryx("#1", mode="data.frame")

library(NLP)
library(tm)
library(stats)

detailed_desc <- run_data[,2]
desc_corpus   <- Corpus(VectorSource(detailed_desc)) 
dtm_desc <- DocumentTermMatrix(desc_corpus)



write.Alteryx(dtm_desc, 1)

which gave me the following output example:

[image.png: screenshot of the output]

Can I have some help to see where I went wrong?

mnitin3
7 - Meteor

One thing that may be causing the issue is the class of the dtm_desc object.

I believe the object is not a data frame, so you need to convert it into one to match what the Alteryx output function requires.

Conversion command:

dtm_desc <- as.data.frame(dtm_desc)

adriantanjy
6 - Meteoroid

I tried using the as.data.frame command but it threw back this error:

[image.png: screenshot of the error message]

mnitin3
7 - Meteor

OK. Can you try the two options below?

1. library(tidytext); dtm_desc <- tidy(dtm_desc)

2. dtm_desc <- data.frame(dtm_desc)

adriantanjy
6 - Meteoroid

Hi mnitin3,

Thanks for your replies, but neither option works:

1) There is no package called 'tidytext'.

2) It threw the same error as the as.data.frame command.
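
A likely reason both conversions fail is that a tm DocumentTermMatrix is stored in a sparse triplet format, which as.data.frame() and data.frame() probably cannot coerce directly. Expanding it with as.matrix() first usually gets around this. The following is only a minimal sketch: it assumes the tm package loads in the R tool (as the library(tm) call in the earlier script suggests), and the three-row detailed_desc vector is made-up sample data standing in for the column read with read.Alteryx.

```r
library(tm)

# Made-up sample data; in the R tool this would be a column of the
# data frame returned by read.Alteryx("#1", mode = "data.frame")
detailed_desc <- c("pump failure in unit one",
                   "unit two pump leak",
                   "leak detected near valve")

desc_corpus <- Corpus(VectorSource(detailed_desc))
dtm_desc    <- DocumentTermMatrix(desc_corpus)

# The DTM is sparse; as.matrix() expands it to a dense
# documents-by-terms matrix that as.data.frame() can handle
dtm_df <- as.data.frame(as.matrix(dtm_desc))

# Inside the Alteryx R tool the final step would be:
# write.Alteryx(dtm_df, 1)
```

The dense result has one column per distinct term, so for a large corpus it may be worth dropping rare terms first, for example with tm's removeSparseTerms(), before converting.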

mnitin3
7 - Meteor

I am able to use the tidytext package, so maybe you need to install it. Please install tidytext and run the commands again. Let me know if that works.

adriantanjy
6 - Meteoroid

Hi, it still doesn't work. I am not able to install R packages. I believe only a fixed set of packages works in Alteryx.

Did you manage to install the package in the Alteryx workflow? If you did, can you share your code please?

mnitin3
7 - Meteor

I am sorry. Actually, I don't have Alteryx at the moment, so I couldn't try this in the tool.

Yes, it is possible that only a fixed set of packages works in the R tool of Alteryx. Let me find a way to do the conversion using base packages and let you know.
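
One way a base-only approach could look is sketched below. It builds the term counts with strsplit() and table() and needs no add-on packages. This is only an illustration under stated assumptions: the detailed_desc vector is made-up sample data, the tokenizer is a naive split on non-alphanumeric characters, and nothing here has been run inside Alteryx.

```r
# Made-up sample data; in the R tool this would come from read.Alteryx("#1")
detailed_desc <- c("Pump failure in unit one",
                   "Unit two pump leak",
                   "Leak detected near valve")

# Lower-case, then split on anything that is not a letter or digit
tokens <- strsplit(tolower(detailed_desc), "[^a-z0-9]+")
tokens <- lapply(tokens, function(x) x[nzchar(x)])

# Vocabulary: every distinct term, sorted
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary term per document; factor() with fixed levels
# keeps the column order identical for every document
counts <- t(vapply(tokens,
                   function(x) as.integer(table(factor(x, levels = vocab))),
                   integer(length(vocab))))
colnames(counts) <- vocab

# One row per document, one column per term
dtm_df <- as.data.frame(counts)

# Inside the Alteryx R tool the final step would be:
# write.Alteryx(dtm_df, 1)
```

Unlike tm, this sketch does no stop-word removal or stemming and keeps very short words such as "in", so the resulting columns would likely need pruning before clustering.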
