Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!

Also available in | Français | Português | Español | 日本語
IDEAS WANTED

Want to get involved? We're always looking for ideas and content for Weekly Challenges.

SUBMIT YOUR IDEA

Challenge #205: Taynalysis

SeanAdams
17 - Castor
17 - Castor

Wow - this took much longer than I expected - mainly due to the specifics of how we deal with different types of replacement.

 

Did this in Alteryx; and then in Python - the line counts match but other than that the answers are different than the model solution.

(i.e. my answer is wrong - don't copy my exam)

 

Spoiler
Solution.pngPython1.pngPython2.png

#SnakingMyWayThruChallenges

 

 

Sudha8990AI
8 - Asteroid

Only for album Red and 1989, some words are different. It is Quiet challenging for the beginner. But I loved it.

 

Thanks,

Sudhakar S

RobertOdera
13 - Pulsar
Spoiler
Spoiler AlertSpoiler Alert
JohanRanschaert
8 - Asteroid

in attach my solution

JeremyGonzva
8 - Asteroid

here it is

deepaprash
8 - Asteroid

My numbers don't tie exactly either. Are we counting null lines? Also the answers seem different when you use "Remove unwanted characters - Punctuation" using Data Cleanse vs using the formula as provided in the solution. Anyways, here is my answer.

leehaozhong
8 - Asteroid

Challenge 205 Solution

Spoiler
Challenge 205_LHZ.JPG
NicoleJohnson
ACE Emeritus
ACE Emeritus

House Johnson is a BIG fan of Tay Tay... this was a fun one 🙂

 

Spoiler
... and extra fun since I figured out (after MUCH GOOGLE RESEARCH) how to do it both in Alteryx and Python! #SnakingMyWayThruChallenges 

WeeklyChallenge205.JPG

Here's the Python script I came up with (not elegant by any means, but hey, it works.)

#################################
# List all non-standard packages to be imported by your
# script here (only missing packages will be installed)
from ayx import Package
import pandas as pd
import numpy as np

#Package.installPackages(['pandas','numpy'])


#################################
# Input source data, creating a dataframe for lyrics and a list for stopwords.

from ayx import Alteryx
df_lyrics = Alteryx.read("#1")

stopwords = Alteryx.read("#2")['Stop_Words'].values.tolist()

df_lyrics[:3]


#################################
# Split lyrics, replace stopwords with None values, and filter out None values.

list_lyrics = df_lyrics.lyric.str.split().tolist()
list_lyrics = [[j.lower() for j in i] for i in list_lyrics]

# Replace specific characters in split words.

for i, x in enumerate(list_lyrics):
for j, a in enumerate(x):
list_lyrics[i][j] = a.replace(',', '') \
.replace('"','') \
.replace("(","") \
.replace(")","") \
.replace(".","") \
.replace("?","") \
.replace("!","")

# Replace stopwords in the list of lyrics, and convert to Series > DataFrame.

list_replaced = list()

for i in range(len(list_lyrics)):
list_iteration = [x for x in list_lyrics[i] if x not in stopwords]
list_replaced.append(list_iteration)

df_listlyrics = pd.DataFrame(pd.Series(list_replaced))


#################################
# Join new list of lyrics to original df_lyrics.

df_join = df_lyrics.join(df_listlyrics, how='left') \
.drop(["lyric"], axis = 1) \
.rename(columns = {0:'Lyrics'})

# Join lyrics back to album information, and cleanup/pivot data so that each lyric word is in a row.

df_join = df_join.Lyrics.apply(pd.Series) \
.merge(df_join, right_index = True, left_index = True) \
.drop(['Lyrics','artist'], axis = 1) \
.melt(id_vars = ['album','track_title','track_n','line','year'], value_name = 'lyric') \
.drop(['variable'], axis = 1)

# Assign value of 1 to each lyric for counting purposes.

df_join['value'] = 1

# Perform group by and aggregation functions to derive the list of top ten words.

df_count = df_join.drop(['track_title','track_n','line'], axis = 1) \
.groupby(['year','album','lyric'], as_index = False).sum() \

df_count.sort_values(by=['year','album','value'], inplace = True, ascending = False)

df_topten = df_count.groupby(['album'], as_index = True).head(10) \
.drop(['value'], axis = 1) \
.groupby(['year','album'], as_index = False).agg({'lyric': ' '.join})


#################################
# Count total lines in album.

lyrics_count = df_lyrics.groupby(['album'])['line'].count()


#################################
# Perform group by and aggregation functions to derive the count of unique lines.

unique_lyrics = pd.DataFrame(pd.Series(list_lyrics)) \
.rename(columns = {0:'lines'}) \
.join(df_lyrics['album'])

unique_lyrics['liststring'] = [' '.join(map(str,l)) for l in unique_lyrics['lines']]

unique_lyrics = unique_lyrics.drop(['lines'], axis = 1) \
.drop_duplicates() \
.groupby(['album'])['liststring'].count()


#################################
# Join different results together.

df_final = df_topten.merge(lyrics_count, on = 'album') \
.merge(unique_lyrics, on = 'album') \
.rename(columns = {'year':'Year', 'album':'Album', 'lyric':'Top 10 Lyrics', 'line':'Total Lines', 'liststring':'Unique Lines'})

df_final['Duplicate Lines'] = df_final['Total Lines'] - df_final['Unique Lines']
df_final['Repetitiveness Percentage'] = df_final['Duplicate Lines'] / df_final['Total Lines']


#################################
Alteryx.write(df_final,1)


#################################

Cheers!

NJ

balajilolla
8 - Asteroid
Spoiler
Solution Attached

balajilolla_0-1588876501045.png

 

jarrod
ACE Emeritus
ACE Emeritus

head scratcher at first.... but after reading the comments it started to make sense. Also added a python version (probably not efficient though) - that one was tough...

Spoiler
jarrod_0-1588946737271.png

 

#SnakingMyWayThruChallenges