Happy 8th birthday to the Maveryx Community! Take a walk down memory lane in our birthday blog, and don't miss out on the awesome birthday present that all Maveryx Community members get to take advantage of!
1) Lowercase and ReplaceCharacter formula: lowercase(replacechar([lyric],'",.:;?!()',"")): I used Data Cleansing tool at first to remove leading whitespace, punctuation and modify upper case but did not work exactly as expected. I assume the discrepancy is from single quote punctuation getting removed by Data Cleansing tool but not by formula. 2) Text to Column: learned using Delimiter \s to represent space and split to rows.I used Reg Ex to tokenize entire words to rows but as a result some abbreviated words containing single quotes, such as "that's", ended up splitting into two words, "that" and "s". So when counting the top words the result included single letter words like "s", "ll", etc. 3) Summarize: count distinct -useful when counting unique values. The Unique tool returned a different result. 4) Join: left: lyric and right stop_words: this is so smart because the left output shows all lyrics without stop words which is what we want and can proceed to next steps, joined output shows lyrics matching stop words and right output shows stop words "leftovers" that don't match any lyric so these are the data we don't need. 5) Again: Sort -- two sorting orders and using dictionary order; Sample -- first n rows and grouping by column; Summarize -- concatenate string. add space in concat_string -- replacechar([Concat_lyric],","," ").