Hi all. I received this question from a client.
What function or method can I use to find the most common keywords in a set of strings without using the Split method?
I have searched and tried various methods, and all of them involve the Split method.
Example input:
AAA trading enterprise
BBB trading enterprise
CCC solutions enterprise
DDD solutions
Expected output:
Trading - 2
Solutions - 2
Enterprise - 3
Hello @hisyam10,
You could use the Multi-Row Formula tool with an IF statement along the lines of: if the row contains the keyword, then the previous row's value + 1. I don't think that is very efficient, though, since you would need to create one formula for each keyword, so it ends up being a lot of configuration for little benefit. If you don't want the result split into rows, you can always transpose (or summarize) your answer at the end so that everything appears on the same row.
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
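To make the running-count idea concrete outside Alteryx, here is a minimal Python sketch. It is not the Multi-Row Formula tool itself; the function name `running_count` and the whole-word regular-expression match are assumptions made for illustration. Each row adds 1 to the previous row's value when the keyword is present, so the last value is the keyword's total.

```python
import re

rows = [
    "AAA trading enterprise",
    "BBB trading enterprise",
    "CCC solutions enterprise",
    "DDD solutions",
]

def running_count(rows, keyword):
    """Running total of rows that contain `keyword` as a whole word (hypothetical helper)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    totals = []
    previous = 0  # plays the role of the previous row's value
    for text in rows:
        if pattern.search(text):
            previous += 1
        totals.append(previous)
    return totals

# One call per keyword, which mirrors the "one formula per keyword" drawback.
trading_totals = running_count(rows, "trading")
enterprise_totals = running_count(rows, "enterprise")
print(trading_totals[-1], enterprise_totals[-1])
```

Note that every new keyword needs another call (or another formula in the workflow), which is the maintenance cost mentioned above.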
Regards
Hi @hisyam10
When performing lexical analysis, the accepted first step is to tokenize your input. This is usually performed by some method analogous to splitting the sentences in the input into words and then transposing the words into rows.
Sentence number | Word number | Word |
1 | 1 | When |
1 | 2 | performing |
1 | 3 | lexical |
1 | 4 | analysis, |
1 | 5 | the |
1 | 6 | accepted |
1 | 7 | first |
1 | 8 | step |
1 | 9 | is |
You can then analyze each word, extract summary statistics, etc.
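As a rough illustration of that tokenize-then-analyze flow, here is a short Python sketch. The `tokenize` helper and the sample strings are assumptions for illustration; it produces (sentence number, word number, word) rows like the table above and then counts the words.

```python
from collections import Counter

sentences = [
    "AAA trading enterprise",
    "BBB trading enterprise",
    "CCC solutions enterprise",
    "DDD solutions",
]

def tokenize(sentences):
    """Return one (sentence number, word number, word) tuple per word (hypothetical helper)."""
    rows = []
    for sentence_no, sentence in enumerate(sentences, start=1):
        # Whitespace splitting is the simplest tokenizer; real text may need more care.
        for word_no, word in enumerate(sentence.split(), start=1):
            rows.append((sentence_no, word_no, word))
    return rows

token_rows = tokenize(sentences)

# Summary statistic: how often each word occurs, ignoring case.
word_counts = Counter(word.lower() for _, _, word in token_rows)
print(word_counts.most_common(3))
```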
Maybe there is a way to perform this without splitting to words, but it would probably be enormously complex.
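For what it is worth, outside Alteryx one way to avoid calling a split function is to scan each string with a regular expression and count the matches as they are found. The sketch below assumes that a "word" is a run of letters or digits; `keyword_counts` is a hypothetical helper. It still tokenizes the text conceptually, it just never materializes the split rows.

```python
import re
from collections import Counter

names = [
    "AAA trading enterprise",
    "BBB trading enterprise",
    "CCC solutions enterprise",
    "DDD solutions",
]

# Assumption: a keyword is any run of ASCII letters or digits.
WORD = re.compile(r"[A-Za-z0-9]+")

def keyword_counts(strings):
    """Count every word found by the regex scan, case-insensitively (hypothetical helper)."""
    counts = Counter()
    for text in strings:
        counts.update(match.group().lower() for match in WORD.finditer(text))
    return counts

counts = keyword_counts(names)

# Keep only words that occur more than once, i.e. the "popular" keywords.
repeated = {word: n for word, n in counts.items() if n > 1}
print(repeated)
```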
Why does the client have an aversion to splitting?
Dan
Hi @hisyam10 ,
Agreeing with @danilang, tokenizing would be your standard first step when performing text analysis. But if your client has an aversion to the splitting approach, would they be okay with using a Python script? Attached is a sample solution to help you get a word frequency count.
It is somewhat of a cheat approach, only from the standpoint that I am not using standard Alteryx tools, but it gets the work done.
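from ayx import Alteryx
import pandas as pd
import nltk
import numpy as np

# Read the data from input connection #1 of the Python tool
data = Alteryx.read("#1")
# FreqDist counts each distinct string in the field; iterating over it yields those distinct strings
word_dist = nltk.FreqDist(data['Concat_Field1'])
# Break each distinct string into words and count how often each word occurs
data_series = pd.Series(np.concatenate([x.split() for x in word_dist])).value_counts()
# Convert the Series to a DataFrame and write it to output anchor 1
df = data_series.to_frame().reset_index()
Alteryx.write(df, 1)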
Please let us know if this solution is acceptable (if yes, please mark it as Accept).
Thank you all for your time and hard work in helping me with this question. I really appreciate it. I have a solution, but it is not very dynamic, because the workflow needs to be altered whenever there is a new keyword. However, no Split method is used in this workflow.
I have attached my workflow for reference.
Hi @hisyam10
If that is the approach you prefer, I'd recommend creating a simple macro.
I've added a text input where you can select which words to count, so if you want to update it then you just need to write the word there instead of going into the formulas.
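The same idea, keeping the keyword list as data rather than hard-coding it into formulas, can be sketched in Python as follows. This is not the attached macro; `count_keywords` and the sample inputs are assumptions for illustration. Adding a keyword means editing only the list.

```python
import re

strings = [
    "AAA trading enterprise",
    "BBB trading enterprise",
    "CCC solutions enterprise",
    "DDD solutions",
]

# Plays the role of the text input: edit this list to change what is counted.
keywords = ["Trading", "Solutions", "Enterprise"]

def count_keywords(strings, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword (hypothetical helper)."""
    result = {}
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        result[keyword] = sum(len(pattern.findall(text)) for text in strings)
    return result

totals = count_keywords(strings, keywords)
for keyword, total in totals.items():
    print(f"{keyword} - {total}")
```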
Please find the workbook attached.
Hope this helps. If it does, can I ask you to mark it as a solution? This will help other users find it and will allow us to close the thread. Many thanks!
Best,
Diego