Alteryx Designer Desktop Discussions

hisyam10 · ‎02-24-2020

Hi all. I received this question from a client.

What function or method do I need to use to find the most popular keywords in a string, without using the Split method?

I have searched and tried various methods, and all of them involve the Split method.

Example:

AAA trading enterprise

BBB trading enterprise

CCC solutions enterprise

DDD solutions

Expected counts:

Trading - 2

Solutions - 2

Enterprise - 3

afv2688 · ‎02-25-2020

Hello @hisyam10,

You could use the Multi-Row Formula tool with an IF statement along the lines of "if the row contains the keyword, then the previous row's value + 1". I don't think that's very efficient, though, since you would need to create one formula for each keyword. In the end it would mean a lot of coding for little gain. If you don't want the result split across rows, you could always transpose (or summarize) at the end so that everything shows on the same row.

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
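For anyone who finds it easier to read as code, the running-count logic could be sketched in Python roughly like this. It is only an illustration of the idea, not an Alteryx expression: the function name `running_keyword_count` is made up for the example, and a case-insensitive whole-word regular expression is my assumption for the "row contains the keyword" test.

```python
import re

def running_keyword_count(rows, keyword):
    """Return a running count of rows that contain `keyword` as a whole word.

    Mirrors the Multi-Row Formula idea: if the row contains the keyword,
    take the previous row's value + 1, otherwise carry the previous value.
    """
    # Assumption: match whole words only, ignoring case
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    counts = []
    previous = 0
    for text in rows:
        if pattern.search(text):
            previous += 1
        counts.append(previous)
    return counts

# Sample rows taken from the question
rows = [
    "AAA trading enterprise",
    "BBB trading enterprise",
    "CCC solutions enterprise",
    "DDD solutions",
]
print(running_keyword_count(rows, "enterprise"))  # [1, 2, 3, 3]
```

The last value of each running count is the total for that keyword, and one such pass is needed per keyword, which is why this gets tedious quickly.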

Regards

danilang · ‎02-25-2020

Hi @hisyam10

When performing lexical analysis, the accepted first step is to tokenize your input. This is usually performed by some method analogous to splitting the sentences in the input into words and then transposing the words into rows.

Sentence number	Word number	Word
1	1	When
1	2	performing
1	3	lexical
1	4	analysis,
1	5	the
1	6	accepted
1	7	first
1	8	step
1	9	is

You can then analyze each word, extract summary statistics, etc.
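As a rough illustration of tokenizing into rows and then summarizing, here is a minimal Python sketch. The function name `tokenize` and the lowercasing before counting are assumptions made for the example, and it does use a plain whitespace split, since that is the standard approach being described.

```python
from collections import Counter

def tokenize(sentences):
    """Yield (sentence_number, word_number, word) rows, both numbered from 1."""
    for s_num, sentence in enumerate(sentences, start=1):
        for w_num, word in enumerate(sentence.split(), start=1):
            yield s_num, w_num, word

# Sample input taken from the question
sentences = [
    "AAA trading enterprise",
    "BBB trading enterprise",
    "CCC solutions enterprise",
    "DDD solutions",
]

rows = list(tokenize(sentences))

# Summary statistic: how often each word appears (case-insensitive)
word_counts = Counter(word.lower() for _, _, word in rows)

print(rows[:3])
print(word_counts["enterprise"], word_counts["trading"], word_counts["solutions"])
```

Once the words are in rows like this, any further analysis is just a group-and-count on the word column.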

Maybe there is a way to perform this without splitting to words, but it would probably be enormously complex.

Why does the client have an aversion to splitting?

Dan

AbhilashR · ‎02-25-2020

Hi @hisyam10 ,

Agreeing with @danilang, tokenizing would be your standard first step when performing text analysis. But if your client has an aversion to the splitting approach, would they be okay with using a Python script? Attached is a sample solution to help you get a word frequency count.

It is somewhat of a cheat approach, only from the standpoint that I am not using standard Alteryx tools, but it gets the work done.

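from ayx import Alteryx
import pandas as pd
import numpy as np

# Read the data from the first input anchor of the Python tool
data = Alteryx.read("#1")
# Break every string into words on whitespace.
# Working on the column directly (rather than on nltk.FreqDist of the whole strings)
# ensures that words in duplicate rows are counted each time they appear.
words = np.concatenate([str(x).split() for x in data['Concat_Field1']])
# Count how often each word occurs
data_series = pd.Series(words).value_counts()
# Convert the Series to a DataFrame with named columns
df = data_series.rename_axis('Word').reset_index(name='Count')
# Write the result to the first output anchor
Alteryx.write(df, 1)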

Please let us know if this solution is acceptable (if yes, please mark it as Accept).

hisyam10 · ‎02-27-2020

Thank you all for your time and hard work in helping me with this question. I really appreciate it. I have a solution, but it is not very dynamic, as the workflow needs to be altered whenever there is a new keyword. However, no Split method was used in this workflow.

I have attached the workflow I built for reference.

DiegoParker · ‎02-28-2020

Hi @hisyam10

If that is the approach you prefer, I'd recommend creating a simple macro.

I've added a text input where you can select which words to count, so if you want to update it then you just need to write the word there instead of going into the formulas.
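The attachment itself can't be shown inline, so the following Python sketch is only a guess at the general idea of driving the count from a list of words. The function name `count_keywords` and the whole-word, case-insensitive matching are assumptions made for the example; it does not reproduce the macro's actual tools or formulas.

```python
import re

def count_keywords(rows, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword across all rows."""
    counts = {}
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        counts[keyword] = sum(len(pattern.findall(text)) for text in rows)
    return counts

# Sample rows taken from the question
rows = [
    "AAA trading enterprise",
    "BBB trading enterprise",
    "CCC solutions enterprise",
    "DDD solutions",
]

# The keyword list plays the role of the text input: edit it to change what gets counted
keywords = ["Trading", "Solutions", "Enterprise"]

print(count_keywords(rows, keywords))
# {'Trading': 2, 'Solutions': 2, 'Enterprise': 3}
```

Adding a new keyword then means adding one entry to the list, with no change to the counting logic.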

Please find the workbook attached.

Hope this helps. If it does, can I ask you to mark it as a solution? This will help other users find it and will allow us to close the thread. Many thanks!

Best,
Diego

What function is needed to study popular keywords in a string without using the Split method?