Last week's solution can be found HERE.
This week's challenge, by its title, sounds like a simple task. However, a quick look at the data will show that it's going to take more than just a Summarize tool. If you have been meaning to finally learn the RegEx tool, this exercise will be a great first foray with a relatively simple expression. In this exercise, we will take a set of very ugly data, parse it to find hashtags within a text field, and find out how many times each hashtag was written and which users used it.
I'm not sure the results provided are correct as they show a #followme result against id_str_13, but that text does not contain a hashtag, unless I'm missing something?
That aside, my solution is as follows:
Of course I wanted to solve this without the use of RegEx (just being difficult). I don't agree that id_str_13 should have any counts.
I have attached my work for a deep dive. Fun stuff!
I would like to know how the expression #.*?\> works. I was also trying to solve the challenge using RegEx tool but can't seem to figure out the correct expression to use. Appreciate your feedback on this :-)
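In regex, the . matches any character, the * repeats it zero or more times, and the trailing ? makes the match lazy, so the .*? part means "take as few characters as possible until the rest of the expression can match". From the dropdown you'll see a list of RegEx options that let you construct expressions like the one in the example.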
The # here, followed by the .*? simply means take everything from the # symbol to...
I then used the \> which means "the end of the word".
So, in speech terms, it literally translates to "take everything from the # symbol to the end of the word".
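If it helps to see the same idea outside Alteryx, here is a rough Python sketch. Two assumptions to flag: Python's re module has no \> token, so I've swapped in (?<=\w)(?!\w) to stand for "end of the word", and the rows below are made-up sample data, not the challenge file.

```python
import re
from collections import defaultdict

# Python's re has no \> token, so (?<=\w)(?!\w) stands in for "end of word".
HASHTAG = re.compile(r"#.*?(?<=\w)(?!\w)")

# Hypothetical sample rows (user id, tweet text), not the challenge data.
rows = [
    ("id_str_1", "Loving the weather today #sunny #followme"),
    ("id_str_2", "No hashtags in this one"),
    ("id_str_3", "New post is up, #followme for more!"),
]

counts = defaultdict(int)   # hashtag -> number of times it was written
users = defaultdict(set)    # hashtag -> set of users who wrote it

for user, text in rows:
    for tag in HASHTAG.findall(text):
        tag = tag.lower()   # treat #FollowMe and #followme as the same tag
        counts[tag] += 1
        users[tag].add(user)

for tag in sorted(counts):
    print(tag, counts[tag], sorted(users[tag]))
# prints:
# #followme 2 ['id_str_1', 'id_str_3']
# #sunny 1 ['id_str_1']
```

One thing to watch with the lazy version: a stray # followed by a space will still run forward to the end of the next word. If that matters for your data, #\w+ is a stricter alternative.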
Regex really isn't as scary as it first appears. I myself had no Regex experience before using Alteryx but it's really easy once you get your head around it.
Try experimenting with the dropdown list.
Feel free to message me if you have any questions or if you have a specific issue with which you're struggling.