
Weekly Challenges


Challenge #205: Taynalysis

nmacpherson
6 - Meteoroid
Spoiler
nmacpherson_0-1583179281972.png

 

patrick_digan
17 - Castor
Spoiler
patrick_digan_0-1583179922995.png

 

TonyA
Alteryx Alumni (Retired)

Getting some slight differences. I checked some of the other solutions and they seem to be seeing the same discrepancies.

 

 

Kenda
16 - Nebula
Spoiler
Capture.PNG

cgoodman3
14 - Magnetar

My answer differs slightly from the provided example.

 

Spoiler
Challenge 205.PNG
Chris
Check out my collaboration with fellow ACE Joshua Burkhow at AlterTricks.com
aanandkumar
8 - Asteroid

Here is my solution. I couldn't figure out how the expected numbers were calculated, so some of my numbers are off.

mbogusz
9 - Comet
Spoiler
2020-03-02 20_00_02-Greenshot.png
Some slight differences in expected vs. actual
sambitd
6 - Meteoroid

Hi all,

 

This is my very first weekly challenge response 🙂

I am excited to share the news that my paper "Workday Data Migration : How we saved over 2000 hours of manual effort" was chosen for the Excellence Award !!!!

 

For this weekly challenge, I used the summarize function with count and count distinct on the lyric field to get the total and distinct number of lines per album. From those two counts I derived the number of duplicate lines. The results matched for some albums but were off by one for a few. Attached is my workflow.
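For readers who prefer code, the count and count-distinct logic described above can be sketched in pandas. This is only an illustration under assumptions: the sample rows and the column names `album` and `lyric` are made up here and are not the challenge data or the attached workflow.

```python
import pandas as pd

# Hypothetical sample rows; assumed layout is one row per lyric line.
lyrics = pd.DataFrame({
    "album": ["A", "A", "A", "B", "B"],
    "lyric": ["la la", "la la", "oh no", "hey", "you"],
})

# Total lines (count) and distinct lines (nunique) per album.
lines = lyrics.groupby("album")["lyric"].agg(["count", "nunique"]).reset_index()

# Duplicate lines are the total minus the distinct count.
lines["duplicates"] = lines["count"] - lines["nunique"]

print(lines)
```

Note that pandas' `count` and `nunique` both skip missing values by default, which may not match how another tool treats blank lines.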

 

Regards

Sambit

 

 

cam_w
11 - Bolide
Spoiler
#################################
# List all non-standard packages to be imported by your 
# script here (only missing packages will be installed)
from ayx import Package
#Package.installPackages(['pandas','numpy'])


#################################
from ayx import Alteryx
from collections import Counter 

import re

dfExpected = Alteryx.read("#Output")
dfLyrics = Alteryx.read("#Lyrics")
dfStopwords = Alteryx.read("#Stopwords")


#################################
# Build a set of stopwords for fast membership tests
# (the first column of the stopwords input holds the words)
stopwords = {w[0] for w in dfStopwords.values.tolist()}


#################################
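# Concatenate all lyric lines of each album into a single string
dfTop10 = dfLyrics.groupby(['year','album'])['lyric'].apply(" ".join).reset_index()

def get_top_10_words(lyrics_text):
    """Return the 10 most frequent non-stopwords in lyrics_text, space-separated."""
    # Strip everything except letters, digits, whitespace and apostrophes
    lyrics_text = re.sub(r"[^a-zA-Z0-9\s']", '', lyrics_text)
    words = lyrics_text.split()
    # The stopword check is case-insensitive; the counting below is case-sensitive
    not_stop = [word for word in words if word.lower() not in stopwords]
    counter = Counter(not_stop)
    return " ".join(word for word, _ in counter.most_common(10))

dfTop10['lyric'] = dfTop10['lyric'].apply(get_top_10_words)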


#################################
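# Per album: distinct lyric lines (nunique) and total lyric lines (count)
dfLines = dfLyrics.groupby(['year','album'])['lyric'].agg(['nunique','count']).reset_index()

# Duplicates are the lines beyond the first occurrence of each distinct line
dfLines['dups'] = dfLines['count'] - dfLines['nunique']
dfLines['percent'] = dfLines['dups'] * 100 / dfLines['count']

# Reorder the columns to match the expected output layout
dfLines = dfLines[['year', 'album', 'nunique', 'dups', 'count', 'percent']]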


#################################
# Join on the shared keys, then rename columns to the output field names
dfOutput = dfTop10.merge(dfLines, on=['year', 'album']).rename(columns=
                                         {"year": "Album Year",
                                          "album": "Album Name",
                                          "lyric": "Top_10_Lyrics",
                                          "nunique": "Unique_Lines_Per_Album",
                                          "dups": "Duplicate_Lines_Per_Album",
                                          "count": "Total_Lines_Per_Album",
                                          "percent": "Repetativeness_Percentage"
                                         }
                                        )


#################################
Alteryx.write(dfOutput, 1)

I wanted to practice working with data frames on this one, so I used the Python tool. As others have mentioned, my results are very close to the expected output counts.
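One way to see exactly where results differ from the expected output is to join the two frames and subtract. The sketch below is an assumption-laden illustration, not part of the script above: `actual` and `expected` are hypothetical stand-ins for `dfOutput` and `dfExpected`, and the numbers are invented.

```python
import pandas as pd

# Hypothetical stand-ins for dfOutput and dfExpected (values are made up).
actual = pd.DataFrame({
    "Album Name": ["A", "B"],
    "Total_Lines_Per_Album": [100, 200],
})
expected = pd.DataFrame({
    "Album Name": ["A", "B"],
    "Total_Lines_Per_Album": [100, 201],
})

# Align on album name, keeping both versions of the numeric column.
both = actual.merge(expected, on="Album Name", suffixes=("_actual", "_expected"))

# Per-album difference; rows with a non-zero diff are the discrepancies.
both["diff"] = (
    both["Total_Lines_Per_Album_actual"] - both["Total_Lines_Per_Album_expected"]
)
mismatches = both[both["diff"] != 0]

print(mismatches)
```

The same pattern extends to the other numeric columns by computing one diff column per field.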

 

On a whim I also tried training an RNN (not attached) to generate new T-Swift songs, but after 30 epochs it was overfitting. Reducing the number of epochs produced incoherent lyrics. At approximately 33k words, there wasn't enough data to satisfy the network. We'll have to wait for more Taylor albums! 🙂

chris_ramsay_dup_425
8 - Asteroid

Thanks for the challenge! Here's my solution.