community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
#SANTALYTICS

The highly anticipated Alteryx Community tradition is back! We hope you'll join us!

Learn More

Fuzzy match two sources with different columns

Meteoroid

Hi there,

 

I've been trying - with absolutely no success until now... - to use the Fuzzy Match tool to join two data sources that contain different types of data. They both have a date column and a name column that should be able to be fuzzy matched in most cases - I think. 

 

I'm working with marketing attribution in the music business (we can't actually track marketing conversions to Spotify for instance, so we have to figure out our own attribution models). 

 

Data source 1 contains marketing data with three columns: Campaign name, date and daily spend. 

Data source 2 contains streaming data with three columns: Track(Concatenated [track_artist]," - ",[track_title]), date and daily streams.

Everything should ideally be joined by name ("Campaign name" for source 1, "Track" for source 2) and date. Because the campaigns are setup manually, the formatting differs between where they were set up and who set them up, and they can be subject to spelling errors as well. That's why I can't just join on exact matches. But most of them will contain the artist and track title somewhere. 

 

What I want is to join the data in a way so that each campaign for each track is matched to the specific track, showing how much was spent each day compared to how many streams the track had on the specific day. Then, I would be able to calculate the correlation coefficient to find out if the marketing activities done on a specific track made an impact on the amount of streams. 

 

Examples:

Data source 1:

Campaign nameDateSpend
ABBA - Waterloo - Facebook2018-09-01500
ABBA - Waterloo - Facebook2018-09-02500
Abba waterloo youtube ads2018-09-021000
Abba waterloo youtube ads2018-09-031500
ABBA - waterlo tv campaign2018-09-0410000
ABBA - Waterloo - Facebook2018-09-08600

 

Data source 2:

TrackDateStreams
Abba - Waterloo2018-09-0145000
Abba - Waterloo2018-09-0260000
Abba - Waterloo2018-09-0355000
Abba - Waterloo2018-09-0475000
Abba - Waterloo2018-09-0560000
Abba - Waterloo2018-09-0654000
Abba - Waterloo2018-09-0750000
Abba - Waterloo2018-09-0853000

 

The goal is to have the output look like this:

TrackDateSpendStreams
Abba - Waterloo2018-09-0150045000
Abba - Waterloo2018-09-02150060000
Abba - Waterloo2018-09-03150055000
Abba - Waterloo2018-09-041000075000
Abba - Waterloo2018-09-05060000
Abba - Waterloo2018-09-06054000
Abba - Waterloo2018-09-07050000
Abba - Waterloo2018-09-0860053000

 

As you can see, the spend should be summarized if there was more than one campaign for a certain date, and if there weren't any marketing activities for a certain date, I still need the row of that date, just with "0" in the spend column. 

 

The above example is for one track (with several campaigns).  Ideally I want data for the past two weeks, which means something like 20 different tracks with maybe 30-40 different campaigns in total. In the streaming database, there will probably be more than 1000 tracks that we didn't do any marketing for (so there won't be any campaigns for them to be matched to) - they should be omitted. 

 

Hope this all makes sense and that there is an expert out there that can help.

Big thanks in advance!

Alteryx Certified Partner
Alteryx Certified Partner

@andreasledet,

 

This is a case where data exploration can take you on an interesting journey.

 

Picture1.png

 

I would explore the campaign names and determine which words are important for matching by exclusion of other words.  So words like Facebook or 'youtube ads' would be isolated from the campaign.  What would remain would be the data that you want to match to the track information in data source 2.  Using either a FIND REPLACE tool or by use of equal hygiene to the Campaign and Track fields (e.g. Uppercase, remove punctuation).  We would find matches.

 

Once you go through the discovery and "equalize" the data, I think that you'll find that FUZZY MATCH wasn't the best tool for the job.  My friend @chris_love may agree, but since he's an avid Justin Bieber fan he and I do disagree from time to time.

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Labels