Challenge #56: Parsing and Counting Hashtags
Last week's solution can be found HERE.
This week's challenge, by its title, sounds like a simple task. However, a quick look at the data will show that a Summarize tool alone won't be enough. If you have been meaning to finally learn the RegEx tool, this exercise is a great first foray with a relatively simple expression. In this exercise, we will take a set of very ugly data, parse it to find hashtags within a text field, and find out how many times each hashtag was written and which users used it.
If you would like to learn more about our RegEx tool, check out our help on the tool. If you would like a simpler RegEx overview, check out this interactive lesson.
Good luck!
I'm not sure the results provided are correct as they show a #followme result against id_str_13, but that text does not contain a hashtag, unless I'm missing something?
That aside, my solution is as follows:
#.*?\>
This matches each hashtag, from the # symbol to the end of the attached word. I used the split to rows option to produce one row per hashtag. I then filtered out the null rows, summarised to get a count of hashtags per id_str, and used a Cross Tab to pivot the data.
I then replaced nulls with zeros using the "Data Cleansing" tool and created the total.
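For anyone who wants to see the same steps outside Designer, here is a rough Python sketch of the flow described above: tokenize to rows, count per id_str, pivot, fill zeros, and total. The sample rows, the `#\w+` pattern, the lower-casing, and the table orientation (hashtags as rows) are all assumptions for illustration; none of them come from the challenge file or the posted workflow.

```python
import re
from collections import Counter

# Hypothetical sample rows (id_str, text); NOT the challenge data.
rows = [
    ("id_str_1", "Great day! #followme #sunny"),
    ("id_str_2", "No tags in this one"),
    ("id_str_3", "#followme again, #FollowMe always"),
]

# Assumed pattern: a '#' followed by one or more word characters.
HASHTAG = re.compile(r"#\w+")

# "Split to rows": one (id_str, hashtag) record per match.
# Rows without a hashtag yield no records, which stands in for filtering nulls.
records = [
    (id_str, tag.lower())  # lower-casing is an assumption, so #FollowMe == #followme
    for id_str, text in rows
    for tag in HASHTAG.findall(text)
]

# "Summarise": count occurrences per (id_str, hashtag).
counts = Counter(records)

# "Cross Tab" plus zero-filling: a Counter returns 0 for missing keys.
ids = [id_str for id_str, _ in rows]
tags = sorted({tag for _, tag in records})
table = {tag: {i: counts[(i, tag)] for i in ids} for tag in tags}

# Add the total per hashtag.
for tag in tags:
    table[tag]["Total"] = sum(table[tag][i] for i in ids)

for tag, row in table.items():
    print(tag, row)
```

The zero-filling falls out of the `Counter` lookup here, whereas in the workflow it is a separate Data Cleansing step after the pivot.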
Good catch @mceleavey and @MarqueeCrew, I have updated the start file to have the correct expected results.
I used \#[a-zA-Z\d]+ to grab the hashtags and tokenize them into rows. The biggest difference is that I am using the Multi-Field Formula tool to rename the fields and to return binary results for the hashtag/string flags. This is in contrast to using a Formula tool and a Dynamic Rename. As always, there's more than one way to shine a penny!
I have attached my work for a deep dive. Fun stuff!
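As a quick illustration of what \#[a-zA-Z\d]+ picks up, here is a small Python sketch. The sample text and the list of hashtags to flag are made up, and the dictionary of 0/1 flags is only a loose stand-in for the binary results mentioned above, not a reproduction of the attached workflow.

```python
import re

# The expression from the post: '#' followed by one or more letters or digits.
PATTERN = re.compile(r"\#[a-zA-Z\d]+")

# Hypothetical sample text, not from the challenge data.
text = "Loving the #Alteryx weekly challenge #56 #regex_fun!"

# Tokenize: one list entry per hashtag found.
tokens = PATTERN.findall(text)
print(tokens)  # ['#Alteryx', '#56', '#regex']

# Binary flags: 1 if the hashtag appears in the text, otherwise 0.
# The list of hashtags to check is an assumption for illustration.
hashtags_to_flag = ["#Alteryx", "#followme"]
flags = {tag: int(tag in tokens) for tag in hashtags_to_flag}
print(flags)  # {'#Alteryx': 1, '#followme': 0}
```

Note that the character class stops at the underscore, so #regex_fun is captured as #regex. Whether that matters depends on the hashtags in your data.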
Hello,
I would like to know how the expression #.*?\> works. I was also trying to solve the challenge using the RegEx tool but can't seem to figure out the correct expression to use. I'd appreciate your feedback on this :-)
In regex, the .*? part means "match any characters, but as few as possible", so in practice it takes everything in between two other parts of the expression. From the dropdown you'll see a list of RegEx options that let you construct expressions like the one in my example.
The # here, followed by the .*?, means take everything from the # symbol to...
I then used \>, which means "the end of the word".
So, in speech terms, it literally translates to "take everything from the # symbol to the end of the word".
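If you want to try this idea outside Alteryx, note that Python's `re` module has no end-of-word token: it reads `\>` as a literal `>` character. The sketch below swaps in a lookaround pair as an assumed stand-in for "end of word", and uses a made-up sample sentence. It also shows why the `?` matters by comparing the lazy form with the greedy one.

```python
import re

# Assumed stand-in for the end-of-word token \>:
# the previous character is a word character and the next one is not.
END_OF_WORD = r"(?<=\w)(?!\w)"

# Lazy version: from '#', take as few characters as possible up to the end of a word.
lazy = re.compile(r"#.*?" + END_OF_WORD)

# Greedy version (no '?'): take as many characters as possible instead.
greedy = re.compile(r"#.*" + END_OF_WORD)

# Hypothetical sample text, not from the challenge data.
text = "Morning all #followme, see you at #Inspire2017!"

print(lazy.findall(text))    # ['#followme', '#Inspire2017']
print(greedy.findall(text))  # ['#followme, see you at #Inspire2017']
```

One caveat with this pattern: a stray # followed by a space will still match through to the end of the next word, because .*? accepts any character, spaces included.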
Regex really isn't as scary as it first appears. I myself had no Regex experience before using Alteryx but it's really easy once you get your head around it.
Try experimenting with the dropdown list:
Feel free to message me if you have any questions or if you have a specific issue with which you're struggling.
Thank you so much for the explanation. I will be sure to reach out to you all here in the forum if I need help with anything about Alteryx. Thanks!
Also, a great reference for experimentation is regexr.com. You can paste your own test text in the text box and get real-time feedback on how your regular expression evaluates on that text.