Append IF [B_Date] Between DateTimeDiff([A_Date],-11,'month') and [A_Date]?
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
i've built a workflow where i need to join two tables, but the join is unusual.
table A has a list of dates, and i need to join table B onto it anywhere the date on table B is somewhere within the past 11 months of the date on each row of table A
i solved it by using the Append tool, and then following it with a filter tool to remove rows where the B_Date falls outside of the range i'm looking for
after i finish the append, i can then summarise the data the way i need it
i can get away with this because my tables are fairly small.. but it's still waaay bigger than it should be.. the append tool warns you if you go over 16 rows.. i'm appending 623,000 rows onto 200 rows... (meaning i wind up with 124,000,000 rows).. that's a manageable number.. But what if i needed to append 623,000 rows onto 44 million rows?
This solution works this time, but there has to be a more computationally efficient way to do this... right?
Solved! Go to Solution.
- Labels:
- Join
- Preparation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@Matthew
Maybe we can narrow it down to Year first?
A sample set of data will be better.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi,
I agree that posting some test data and workflow itself would be helpful.
Have you tried filtering the data before joining? eg. filter out the dates in B that are earlier than min date from A
regards
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@JarekSkudrzyk
that's pretty much exactly what i'm struggling with. i can't think of a way to properly filter it.. i need to keep (and join) all rows in B that are within the past 11 months of the date in A, but every row in A has a different date
i have managed to reduce the size of the tables by grouping by the month instead of the date, but i think that's as far as i can reduce it.
so now we're looking at 200 rows x 42,000 rows = 8,400,000 rows
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks for the idea, ive grouped the inputs by month. That has brought this down to 8 million instead of 124 million, but i still think this is an inefficient method. it'll only work with small data sets
Ive attached a snippet of my workflow with scrubbed input data
Ultimately the goal is to sum two values from B onto A, but i only know which values on table B to sum by first joining the tables together..
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I was able to reduce the time of appending fields from 1972ms to around 324ms by applying preliminary filter to set B before appending to set A.
By preliminary I mean: filtering out the dates that are not within the following range:
minimum month of set A for the given market/series minus 11 months
maximum month of set A for the given market/series
I used batch macro to achieve that - see attached workflow.
Hope this is helpful for you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Btw when solving your case I have learned that large text input tools tend to crash my Alteryx (when opening configuration window for them).
You may want to consider using e.g. yxmd input files and exporting the workflow together with input files into yxzp package.
regards
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@Matthew
I agree with @JarekSkudrzyk , the large text input tends to freeze, if not crashing.
I have replaced the Appending Tool with theJoin tool, which will not enlarge the data stream.
so the running time is reduced from 2.6s to 0.5s in my PC.
Hope this result is acceptable for you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@Qiu I'm impressed, that is a much more logical solution. thanks for the help, i'll definitely go with this solution!
Alteryx is a very useful program, but sometimes it's difficult to rewire the way you think about specific problems.
@JarekSkudrzyk i agree, i shouldn't have used the text input in my example, that was my bad.. the actual inputs are coming from a database stream
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Qiu - Brilliant idea, congrats!