How to join based on closest date
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi all!
I'm having trouble joining my two datasets based on some date logic. I have one dataset with survey response data and another with visitation data. They both have the same unique respondent ID fields, however there are multiple visits and only one response date. I need to match the response data to the visit that happened directly prior to that date. In other words, I need to join the second dataset by respondent ID and the date that is closest (but before) the date in the first dataset.
Based on the similar post "How to Join on a Date Range" I thought maybe I could generate dates for the 7 days prior to the response rate and then do a standard join? I'm not sure how to generate these rows.
Thanks in advance for the help!!
Lauren
Solved! Go to Solution.
- Labels:
- Date Time
- Join
- Preparation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Lauren,
Would you be able to post some example data and I would be happy to put together a workflow for you.
Ben Carley - Keyrus UK
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
For sure!
Attached is a sample of each dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Lauren,
I am having trouble accessing the gallery to upload an example workflow but I have approached the task in a slightly different way. I have joined the two data sources on client ID with no other field. This means you will match every item to every other item and end up duplicating your data.
Once you have done this you will be able to filter all the results where the response data is before your visitation date (as I am assuming that you do not want responses on visits that haven't happened)
Once you have filtered out the invalid rows you want to pick just one of the rows for each pair, so sort by customer ID, Response Date, Visitation Date and the use the sample tool to pick the first row as grouped by Visitation ID and Customer ID.
By using the combination of Sort and Sample you should pick only the most recent response for each visitation. Does that make sense or am I misunderstanding your data? (or making a mistake!)
-Ben
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I think for a visit that is on the same day I need to combine the other metrics (dwell time and number of stores visited) so there is only one visit per day.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Great, you could put all of your visit results through a summarize and use a sum where you group on visitation date. I think that should work for you.
