My sample data looks like:
| article | Qty | Date       |
|---------|-----|------------|
| A       | 3   | 2019-10-11 |
| A       | 5   | 2019-10-08 |
| A       | 1   | 2019-10-05 |
| A       | 2   | 2019-09-15 |
| A       | 1   | 2019-09-09 |
| A       | 1   | 2019-09-01 |
| B       | 3   | 2019-10-11 |
| B       | 2   | 2019-10-08 |
| B       | 3   | 2019-10-05 |
| B       | 1   | 2019-09-15 |
| B       | 4   | 2019-09-09 |
| C       | 1   | 2019-10-11 |
| C       | 2   | 2019-10-08 |
| C       | 1   | 2019-10-05 |
| C       | 1   | 2019-09-15 |
| C       | 0   | 2019-09-09 |
| C       | 4   | 2019-09-01 |
| C       | 1   | 2019-08-28 |
and I want the following output:
| article | Date       | Qty | Sum_Qty_28_days |
|---------|------------|-----|-----------------|
| A       | 2019-09-01 | 1   | 1               |
| A       | 2019-09-09 | 1   | 2               |
| A       | 2019-09-15 | 2   | 4               |
| A       | 2019-10-05 | 1   | 4               |
| A       | 2019-10-08 | 5   | 8               |
| A       | 2019-10-11 | 3   | 11              |
| B       | 2019-09-09 | 4   | 4               |
| B       | 2019-09-15 | 1   | 5               |
| B       | 2019-10-05 | 3   | 8               |
| B       | 2019-10-08 | 2   | 6               |
| B       | 2019-10-11 | 3   | 9               |
| C       | 2019-08-28 | 1   | 1               |
| C       | 2019-09-01 | 4   | 5               |
| C       | 2019-09-09 | 0   | 5               |
| C       | 2019-09-15 | 1   | 6               |
| C       | 2019-10-05 | 1   | 2               |
| C       | 2019-10-08 | 2   | 4               |
| C       | 2019-10-11 | 1   | 5               |
The logic is: the column "Sum_Qty_28_days" should sum up the values of the "Qty" column for the same article over the last 28 days. E.g. the "Sum_Qty_28_days" value for "article" = 'A' and date = '2019-10-08' is 8, because it sums the "Qty" values for the dates falling within the previous 28 days, which for "article" = 'A' are 2019-09-15, 2019-10-05 and 2019-10-08.
Is this possible using an SQL window function? I tried it myself with the following code:
SUM("Qty") OVER (PARTITION BY "article", date_trunc('month',"Date") ORDER BY "Date")
But it is far from what I need: it sums up Qty for dates falling in the same month, whereas I need the sum of Qty over the last 28 days.
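What I think is needed is a date-based window frame rather than a month partition, roughly the sketch below. I don't know whether the Exasol driver accepts an interval offset in a RANGE frame (it works in e.g. PostgreSQL 11+), and "my_table" is just a placeholder for my real table:

-- Sketch only: rolling 28-day sum per article via a date-based window frame.
SELECT
    "article",
    "Date",
    "Qty",
    SUM("Qty") OVER (
        PARTITION BY "article"
        ORDER BY "Date"
        -- 27 days preceding plus the current day = a 28-day window
        RANGE BETWEEN INTERVAL '27 days' PRECEDING AND CURRENT ROW
    ) AS "Sum_Qty_28_days"
FROM my_table
ORDER BY "article", "Date";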
Note: I already have a solution using non in-database tools, but it takes a huge amount of execution time on the whole dataset. I am looking for a solution using in-database tools only, or a window function.
Thanks.
Hi @PKN, I don't have access to a database at the moment, but I've made a workflow schema which should do what you want.
Hope that helps,
Ollie
Hi @OllieClarke ,
Thanks for the solution. Unfortunately, the Filter In-DB tool gives an error when I run the workflow: "Error: Filter In-DB (3): Error SQLPrepare: [EXASOL][EXASolution driver] identifier Tool14_e791.article is ambiguous. (Session: 1652459517215047596)".
Regards,
PKN
@PKN hmmm, what if you have 2 connections (both connecting to the same table) and alias them?
@PKN I found a database. Looks like this approach works.
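For reference, in SQL terms the workflow amounts to a self-join with explicit aliases, which is also what keeps the column references unambiguous. This is only a sketch: "my_table" stands in for your real table, it assumes one row per article and date, and the date arithmetic may need adjusting for Exasol:

-- Join each row to all rows of the same article from the previous 28 days, then sum them.
SELECT
    t."article",
    t."Date",
    t."Qty",
    SUM(prev."Qty") AS "Sum_Qty_28_days"
FROM my_table t
JOIN my_table prev
  ON  prev."article" = t."article"
  AND prev."Date"   <= t."Date"
  AND prev."Date"    > t."Date" - 28   -- strictly after (date minus 28) = a 28-day window including the current day
GROUP BY t."article", t."Date", t."Qty"
ORDER BY t."article", t."Date";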
Yes, this works. Thanks a lot @OllieClarke
Just one doubt. I have millions of records in the original table, and when I do an inner join on just article, the join creates billions of records. Where are these records processed/stored during and after the join? On the Alteryx server, in the memory of the database I am reading from and writing to, or somewhere else? Right now the workflow runs fine; I am just worried that if the number of records grows in future, it could cause memory errors during workflow execution.
Thanks
Glad to hear it @PKN. If you're only using in-DB tools, then everything will happen in your database. The in-DB tools are really a way to construct SQL queries using the familiar Alteryx interface; unless you have a Browse tool or an output of some kind, they won't bring any data into your memory (I think I'm right in saying that they won't even execute the created query without a browse or output). There's more info on the in-DB tools here. In terms of scaling, you should be safe from memory limitations (unless you're streaming the data out into memory); the limitations, if they do exist, would be on the database side.
Hope that helps,
Ollie
Thanks @OllieClarke. Thanks a lot for all the help.