Do you have the skills to make it to the top? Subscribe to our weekly challenges. Try your best to solve the problem, share your solution, and see how others tackled the same problem. We share our answer too.
Weekly Challenge

Challenge #56: Parsing and Counting Hashtags

Director, Customer Enablement

Last week's solution can be found HERE.

This week's challenge, by its title, sounds like a simple task. However, a quick look at the data shows that a Summarize tool alone will not be enough. If you have been meaning to finally learn the RegEx tool, this exercise is a great first foray with a relatively simple expression. We will take a set of very ugly data, parse it to find hashtags within a text field, count how many times each hashtag was written, and identify which users used it.

 

If you would like to learn more about our RegEx Tool - check out our help on the tool. If you would like a simpler RegEx overview, check this interactive learning out.

 

Good luck!

Alteryx Certified Partner

I'm not sure the results provided are correct as they show a #followme result against id_str_13, but that text does not contain a hashtag, unless I'm missing something?

 

That aside, my solution is as follows:

 

Spoiler
I first created a RecordID field, then used Regex to parse the text field using the following:

#.*?\>

This matches each hashtag, from the # through to the end of the word attached to it. I used split to rows to produce one row per hashtag. I then filtered out the null rows, summarised to get a count of the hashtags per id_str, and used crosstab to pivot the data.
I then replaced nulls with zeros using the "Data Cleansing" tool and created the total.
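
For readers who think more easily in code, the steps above can be approximated outside Designer. The following Python sketch is not the poster's workflow: the sample rows, the function names, and the simplified `#\w+` pattern are all assumptions made for illustration, and the real challenge data is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical sample rows (id_str, text); NOT the challenge data.
rows = [
    ("id_str_1", "Loving the #data life, more #data"),
    ("id_str_2", "no tags here"),
    ("id_str_3", "#followme for more #data"),
]

# Simplified stand-in pattern: a '#' followed by one or more word characters.
HASHTAG = re.compile(r"#\w+")

def count_hashtags(rows):
    """Count occurrences of each hashtag per id_str (parse, split to rows, summarise)."""
    counts = Counter()
    for id_str, text in rows:
        for tag in HASHTAG.findall(text):  # rows without hashtags contribute nothing
            counts[(tag, id_str)] += 1
    return counts

def pivot(counts, ids):
    """Crosstab: one row per hashtag, one column per id_str, zeros for gaps, plus a total."""
    table = {}
    for tag in sorted({tag for tag, _ in counts}):
        row = {id_str: counts.get((tag, id_str), 0) for id_str in ids}
        row["Total"] = sum(row.values())
        table[tag] = row
    return table

table = pivot(count_hashtags(rows), [id_str for id_str, _ in rows])
for tag, row in table.items():
    print(tag, row)
```

Using `counts.get(..., 0)` plays the role of replacing nulls with zeros after the crosstab, and the `Total` column is computed last so it does not count itself.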

Solution.PNG


results.PNG


Alteryx Certified Partner

Of course, I wanted to solve this without the use of RegEx (just being difficult). I also don't think id_str_13 should have any counts.

 

Spoiler
Screen Shot 2017-02-06 at 12.18.02 PM.png
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Director, Customer Enablement

Good catch @mceleavey and @MarqueeCrew, I have updated the start file to have the correct expected results.

Meteor

Nice challenge, have a great week guys!

 

Spoiler
Capture.PNG

 

 

Alteryx Certified Partner

 

 

Spoiler
CompletedWorkbook.PNG

 

 


I used \#[a-zA-Z\d]+ to grab the hashtags and tokenize them into rows. The biggest difference is that I am using the multi-field formula tool to rename the fields and to return binary results for the hashtag/string flags, in contrast to a formula and dynamic rename. As always, there's more than one way to shine a penny!

I have attached my work for a deep dive. Fun stuff!
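
As a rough illustration of the tokenize-and-flag idea in this post, here is a small Python sketch. It reuses the pattern `\#[a-zA-Z\d]+` quoted above; the function names and sample text are hypothetical, and the binary flags only approximate what the multi-field formula tool would return.

```python
import re

# Pattern quoted in the post: '#' followed by one or more letters or digits.
TOKEN = re.compile(r"\#[a-zA-Z\d]+")

def tokenize(text):
    """Return every hashtag token found in the text, in order of appearance."""
    return TOKEN.findall(text)

def flags(text, tags):
    """Return a 1/0 flag per hashtag, indicating whether the text contains it."""
    found = set(tokenize(text))
    return {tag: int(tag in found) for tag in tags}

print(tokenize("go #Team1 and #win!"))
print(flags("what a #win", ["#win", "#lose"]))
```

Because the character class contains only letters and digits, a tag such as `#foo_bar` would be tokenized as `#foo`; whether that matters depends on the data.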

 

Hello,

 

I would like to know how the expression #.*?\> works. I was also trying to solve the challenge using RegEx tool but can't seem to figure out the correct expression to use. Appreciate your feedback on this :-)

Alteryx Certified Partner

In regex, the .*? part means "match any characters, but as few as possible", so in effect it takes everything in between. The dropdown in the RegEx tool shows a list of building blocks that let you construct expressions like the one in my example.

The # here, followed by the .*? simply means take everything from the # symbol to...

I then used the \> which means "the end of the word".

So, in speech terms, it literally translates to "take everything from the # symbol to the end of the word".
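
To experiment with this expression outside Designer, here is a hedged Python sketch. Python's `re` module has no `\>` token, so the end-of-word assertion is emulated with a lookbehind and a lookahead; that emulation, the constant names, and the sample strings are assumptions for illustration, not the Alteryx implementation.

```python
import re

# Emulates "end of word": the previous character is a word character
# and the next character is not (or the string ends).
END_OF_WORD = r"(?<=\w)(?!\w)"

# Python stand-in for  #.*?\>  : '#', then as few characters as possible,
# stopping at the first end-of-word position.
LAZY_HASHTAG = re.compile(r"#.*?" + END_OF_WORD)

def find_tags(text):
    """Return each span from a '#' to the end of the next word."""
    return LAZY_HASHTAG.findall(text)

print(find_tags("I love #data and #regex!"))  # ['#data', '#regex']
print(find_tags("# hello"))                   # ['# hello']
```

The second call shows a side effect of the lazy "everything up to the end of the word" reading in this stand-in: a lone `#` followed by a space still matches through the next word. A stricter pattern such as `#[a-zA-Z\d]+`, mentioned earlier in the thread, would not match that input.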

 

Regex really isn't as scary as it first appears. I myself had no Regex experience before using Alteryx but it's really easy once you get your head around it.

 

Try experimenting with the dropdown list:

 

screenshot.PNG

 

Feel free to message me if you have any questions or if you have a specific issue with which you're struggling.

 

Thank you so much for the explanation. I will surely reach out to you guys here in the forum should I need help on anything about Alteryx. Thanks

Alteryx Certified Partner

Also, a great reference for experimentation is regexr.com. You can paste your own test text in the text box and get real-time feedback on how your regular expression evaluates on that text.