This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
The highly anticipated Alteryx Community tradition is back! We hope you'll join us!
Learn MoreHi there,
I've been trying - with absolutely no success until now... - to use the Fuzzy Match tool to join two data sources that contain different types of data. They both have a date column and a name column that should be able to be fuzzy matched in most cases - I think.
I'm working with marketing attribution in the music business (we can't actually track marketing conversions to Spotify for instance, so we have to figure out our own attribution models).
Data source 1 contains marketing data with three columns: Campaign name, date and daily spend.
Data source 2 contains streaming data with three columns: Track(Concatenated [track_artist]," - ",[track_title]), date and daily streams.
Everything should ideally be joined by name ("Campaign name" for source 1, "Track" for source 2) and date. Because the campaigns are setup manually, the formatting differs between where they were set up and who set them up, and they can be subject to spelling errors as well. That's why I can't just join on exact matches. But most of them will contain the artist and track title somewhere.
What I want is to join the data in a way so that each campaign for each track is matched to the specific track, showing how much was spent each day compared to how many streams the track had on the specific day. Then, I would be able to calculate the correlation coefficient to find out if the marketing activities done on a specific track made an impact on the amount of streams.
Examples:
Data source 1:
Campaign name | Date | Spend |
ABBA - Waterloo - Facebook | 2018-09-01 | 500 |
ABBA - Waterloo - Facebook | 2018-09-02 | 500 |
Abba waterloo youtube ads | 2018-09-02 | 1000 |
Abba waterloo youtube ads | 2018-09-03 | 1500 |
ABBA - waterlo tv campaign | 2018-09-04 | 10000 |
ABBA - Waterloo - Facebook | 2018-09-08 | 600 |
Data source 2:
Track | Date | Streams |
Abba - Waterloo | 2018-09-01 | 45000 |
Abba - Waterloo | 2018-09-02 | 60000 |
Abba - Waterloo | 2018-09-03 | 55000 |
Abba - Waterloo | 2018-09-04 | 75000 |
Abba - Waterloo | 2018-09-05 | 60000 |
Abba - Waterloo | 2018-09-06 | 54000 |
Abba - Waterloo | 2018-09-07 | 50000 |
Abba - Waterloo | 2018-09-08 | 53000 |
The goal is to have the output look like this:
Track | Date | Spend | Streams |
Abba - Waterloo | 2018-09-01 | 500 | 45000 |
Abba - Waterloo | 2018-09-02 | 1500 | 60000 |
Abba - Waterloo | 2018-09-03 | 1500 | 55000 |
Abba - Waterloo | 2018-09-04 | 10000 | 75000 |
Abba - Waterloo | 2018-09-05 | 0 | 60000 |
Abba - Waterloo | 2018-09-06 | 0 | 54000 |
Abba - Waterloo | 2018-09-07 | 0 | 50000 |
Abba - Waterloo | 2018-09-08 | 600 | 53000 |
As you can see, the spend should be summarized if there was more than one campaign for a certain date, and if there weren't any marketing activities for a certain date, I still need the row of that date, just with "0" in the spend column.
The above example is for one track (with several campaigns). Ideally I want data for the past two weeks, which means something like 20 different tracks with maybe 30-40 different campaigns in total. In the streaming database, there will probably be more than 1000 tracks that we didn't do any marketing for (so there won't be any campaigns for them to be matched to) - they should be omitted.
Hope this all makes sense and that there is an expert out there that can help.
Big thanks in advance!
This is a case where data exploration can take you on an interesting journey.
I would explore the campaign names and determine which words are important for matching by exclusion of other words. So words like Facebook or 'youtube ads' would be isolated from the campaign. What would remain would be the data that you want to match to the track information in data source 2. Using either a FIND REPLACE tool or by use of equal hygiene to the Campaign and Track fields (e.g. Uppercase, remove punctuation). We would find matches.
Once you go through the discovery and "equalize" the data, I think that you'll find that FUZZY MATCH wasn't the best tool for the job. My friend @chris_love may agree, but since he's an avid Justin Bieber fan he and I do disagree from time to time.
Cheers,
Mark