Alteryx Designer

SOLVED

What function is needed to find popular keywords in a string without using the Split method?

Alteryx Partner

Hi all. I received this question from a client.

What function or method can I use to find the most popular keywords in a set of strings without using the Split method?

I have searched and tried various methods, and all of them involve the Split method.

 

Example: 

AAA trading enterprise

BBB trading enterprise

CCC solutions enterprise

DDD solutions

 

Trading - 2

Solutions - 2

Enterprise - 3

Alteryx Certified Partner

Hello @hisyam10,

 

You could use the Multi-Row Formula tool with an IF statement along the lines of "if the row contains the keyword, then the previous row's value + 1", but I don't think that's very efficient, since you would need to create one for each keyword. In the end it would mean a lot of work for little gain. If you don't want the result split into rows, you can always transpose (or summarize) at the end so everything appears on the same row.

 

If this post helps, please consider accepting it as the solution to help other members find it more quickly.

Regards

17 - Castor

Hi @hisyam10 

 

When performing lexical analysis, the accepted first step is to tokenize your input. This is usually done with some method analogous to splitting the sentences in the input into words and then transposing the words into rows.

 

Sentence number | Word number | Word
1 | 1 | When
1 | 2 | performing
1 | 3 | lexical
1 | 4 | analysis,
1 | 5 | the
1 | 6 | accepted
1 | 7 | first
1 | 8 | step
1 | 9 | is

 

You can then analyze each word, extract summary statistics, etc.
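
As a rough sketch of that tokenize-then-count step outside of Alteryx, the following plain Python builds the same (sentence number, word number, word) rows and then counts the words. The helper names `tokenize_rows` and `word_counts` are made up for illustration, and lower-casing plus stripping punctuation is an assumption about how the words should be compared.

```python
from collections import Counter


def tokenize_rows(sentences):
    """Return (sentence_number, word_number, word) rows, numbered from 1."""
    rows = []
    for s_num, sentence in enumerate(sentences, start=1):
        for w_num, word in enumerate(sentence.split(), start=1):
            rows.append((s_num, w_num, word))
    return rows


def word_counts(rows):
    """Count words case-insensitively, ignoring surrounding punctuation."""
    cleaned = (word.strip(".,;:!?\"'()").lower() for _, _, word in rows)
    return Counter(w for w in cleaned if w)


# Sample data taken from the question
sentences = [
    "AAA trading enterprise",
    "BBB trading enterprise",
    "CCC solutions enterprise",
    "DDD solutions",
]

rows = tokenize_rows(sentences)
counts = word_counts(rows)
print(counts["trading"], counts["solutions"], counts["enterprise"])  # 2 2 3
```

Keeping the row table separate from the counting step means the same rows can feed other summaries later, such as words per sentence.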

 

Maybe there is a way to perform this without splitting to words, but it would probably be enormously complex.

 

Why does the client have an aversion to splitting?

 

Dan 

 

 

 

 

Alteryx Certified Partner

Hi @hisyam10 ,

 

Agreeing with @danilang, tokenizing would be your standard first step when performing text analysis. But if your client has an aversion to the splitting approach, would they be okay with using a Python script? Below is a sample solution to help you get a word frequency count.

 

It is somewhat of a cheat approach, only from the standpoint that I am not using standard Alteryx tools, but it gets the work done.

 

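from ayx import Alteryx
import pandas as pd

# Read the incoming data from input anchor #1
data = Alteryx.read("#1")

# Break every string into words (whitespace-separated) and collect them all,
# so that repeated rows still contribute to the counts
words = [w for text in data['Concat_Field1'].dropna() for w in str(text).split()]

# Count how often each word occurs, most frequent first
data_series = pd.Series(words).value_counts()

# Convert the Series to a DataFrame with named columns and write it to output anchor 1
df = data_series.rename_axis('Word').reset_index(name='Count')
Alteryx.write(df, 1)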

Please let us know if this solution is acceptable (if yes, please mark it as accepted).

Alteryx Partner

Thank you all for your time and hard work in helping me with this question. I really appreciate it. I have a solution, but it is not very dynamic, as the workflow needs to be altered whenever there is a new keyword. However, no Split method was used in this workflow.

 

I have attached the workflow for reference.

Alteryx Partner

Hi @hisyam10

If that is the approach you prefer, I'd recommend creating a simple macro.

 

DiegoParker_0-1582887125847.png

 

DiegoParker_2-1582887194067.png

 

 

I've added a text input where you can select which words to count, so if you want to update it then you just need to write the word there instead of going into the formulas.
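
For anyone who wants to see the same idea in script form, here is a minimal sketch in Python. It is not a transcription of the attached workflow; it only assumes the general pattern of a keyword list that drives the counting, with no split step. The function name `count_keywords` and the sample lists are hypothetical, and whole-word, case-insensitive matching is an assumption about the desired behaviour.

```python
import re


def count_keywords(strings, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    totals = {}
    for keyword in keywords:
        # re.escape keeps any special characters in the keyword literal;
        # \b restricts matches to whole words (assumes the keyword starts
        # and ends with a letter, digit, or underscore)
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        totals[keyword] = sum(len(pattern.findall(s)) for s in strings)
    return totals


# Sample data taken from the question
company_names = [
    "AAA trading enterprise",
    "BBB trading enterprise",
    "CCC solutions enterprise",
    "DDD solutions",
]

# Plays the role of the Text Input: edit this list to count different words
keyword_list = ["Trading", "Solutions", "Enterprise"]

result = count_keywords(company_names, keyword_list)
print(result)  # {'Trading': 2, 'Solutions': 2, 'Enterprise': 3}
```

Adding a new keyword then only means adding an entry to the list, with no change to the matching logic.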

 

DiegoParker_1-1582887154881.png

 

Please find the workbook attached.

 

Hope this helps. If it does, can I ask you to mark it as a solution? This will help other users find it and will allow us to close the thread. Many thanks!


Best,
Diego

 

 

 

 
