Alteryx Designer Desktop Discussions

SOLVED

Creating a workflow to analyze topics in PDFs

zmazur
5 - Atom

Hi all - 

 

I'm trying to create a workflow that analyzes PDFs and returns key topics around certain words. I'm working with about 200 PDFs, so manual analysis isn't practical. For example, when the word "revenue" is used in a sentence, I want to know which other words are frequently used in the same sentence.

 

I've already tried the Topic Modeling tool, but I had a hard time grasping what each topic represented and didn't get the level of insight I'm seeking.

 

Best,

Zach

2 REPLIES
echuong1
Alteryx Alumni (Retired)

If you're mainly looking for words, you can do analysis based on frequency. This will work for VERY general topics and keywords like the ones you outlined. The Topic Modeling tool is better, though, if you're doing analysis on actual topics, since it takes into consideration how frequently words occur with other words (thus forming topics).
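
For the "revenue" example in the question, the frequency idea can be narrowed to sentences that contain the keyword. Below is a minimal sketch in plain Python (not an Alteryx tool configuration), assuming the PDF text has already been extracted to a string. The function name, the tiny stop-word list, and the sample text are all made up for illustration, and the sentence split is naive (it will break on abbreviations such as "Inc.").

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "by", "for", "on"}

def cooccurring_words(text, keyword):
    """Count words that appear in the same sentence as `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    counts = Counter()
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        if keyword in words:
            # Count every other non-stop word in this sentence.
            counts.update(w for w in words if w != keyword and w not in STOP_WORDS)
    return counts

# Made-up sample text standing in for the extracted PDF contents.
sample = (
    "Revenue grew in the third quarter. "
    "Costs were flat. "
    "Quarter over quarter, revenue growth slowed."
)

for word, n in cooccurring_words(sample, "revenue").most_common(3):
    print(word, n)
```

Sentences that don't mention the keyword (like "Costs were flat." above) are skipped entirely, which is what keeps the counts focused on the keyword's context.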

 

It sounds like you have access to the Intelligence Suite. I'd start by using the Text Pre-processing tool to remove punctuation and stop words and to convert each word to its root form. From there, you can split each word onto a separate row using the Text To Columns tool with a space as the delimiter. Once you've removed all of the "noise" from your data, you can group by word and take a count. You may be able to decipher topics based on frequency.
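
For anyone who wants to see the same steps outside the Designer canvas, here is a rough plain-Python sketch of the clean / split / count sequence described above. It is not what the Alteryx tools do internally: the stop-word list is a short hypothetical one, the sample documents are invented, and the root-word (lemmatization) step is left out because it would need an NLP library.

```python
import string
from collections import Counter

# Hypothetical short stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "for", "on"}

# Translation table that deletes every ASCII punctuation character.
REMOVE_PUNCT = str.maketrans("", "", string.punctuation)

def word_frequencies(documents):
    """Return a Counter of word -> count across all documents."""
    counts = Counter()
    for text in documents:
        # Step 1: lowercase and strip punctuation (the "pre-process" step).
        cleaned = text.lower().translate(REMOVE_PUNCT)
        # Step 2: split on whitespace, one word per "row".
        # Step 3: drop stop words, then count each word (the "group by" step).
        counts.update(w for w in cleaned.split() if w not in STOP_WORDS)
    return counts

# Invented sample documents standing in for the text of each PDF.
docs = [
    "Revenue increased, and revenue guidance was raised.",
    "Guidance for margins was unchanged.",
]

for word, n in word_frequencies(docs).most_common(5):
    print(word, n)
```

Because no root-word conversion is applied here, "increase" and "increased" would be counted separately, which is the kind of noise the pre-processing step is meant to reduce.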

 

See attached for an example. Hope this helps!

 

[Attached image: echuong1_0-1600825364488.png]

 

zmazur
5 - Atom

Thank you so much for your help!

 
