Hello,
there is something strange about a workflow (workflow pic in attachment).
Actually this workflow fail due the big amount of data, below the two SQL statement for each object.
1) select * from bigdata_snow.current_rde_incident
2) select * from bigdata_snow.current_rde_task_sla where assigned_group_oe_id_3 ='MBO'
When I change both SQL statements with more conditions, I receive an error in the filter object, but it look like it is related to the SQL statements, as no other modifications have been done (second screenshot).
Below the new SQL statement:
1) select * from bigdata_snow.current_rde_incident where assigned_grp_oe_id_n3 ='MBO' and sys_created_on >= '2022-01-01 00:00:00'
2) select * from bigdata_snow.current_rde_task_sla where assigned_group_oe_id_3 ='MBO' and sys_created_on >= '2022-01-01 00:00:00'
the same SQL statement work without problem when I use the input data tool.
someone have any idea why and who to solve it?
thanks in advance.
Error message: Error: Filter In-DB (5): Error SQLPrepare: [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] : Runtime Error: Query 9749773b3f515a84:a66ce6b700000000: 0% Complete (0 out of 103)
Memory limit exceeded: Failed to allocate row batch
EXCHANGE_NODE (id=4) could not allocate 32.00 KB without exceeding limit.
Error occurred on backend uslp2686185.am.hedani.net:22000
Memory left in process limit: 244.47 GB
Memory left in query limit: 24.40 KB
Query(9749773b3f515a84:a66ce6b700000000): Limit=725.39 MB Reservation=576.48 MB ReservationLimit=580.31 MB OtherMemory=148.89 MB Total=725.36 MB Peak=725.36 MB
Fragment 9749773b3f515a84:a66ce6b70000001c: Reservation=542.48 MB OtherMemory=135.02 MB Total=677.50 MB Peak=677.50 MB
HDFS_SCAN_NODE (id=1): Reservation=542.48 MB OtherMemory=108.95 MB Total=651.42 MB Peak=651.42 MB
Exprs: Total=12.00 KB Peak=12.00 KB
Queued Batches: Total=60.99 MB Peak=60.99 MB
KrpcDataStreamSender (dst_id=3): Total=2.67 MB Peak=3.27 MB
Fragment 9749773b3f515a84:a66ce6b70000007a: Reservation=
Use smaller data sets.
or drop unwanted rows and columns
Thank you for your reply, however when I modify the SQL statement I still face the same issue.....
for the first SQL I have set 4 different fields instead of *, and still same issue.
I have encountered something similar. From my experience there are a few things to think about from the perspective of exceeding memory limit:
1) In-DB tools that have multiple data streams pulling from one Connect Tool are more likely to hit a memory error. Consider increasing the number of connect tools or decreasing the amount of historic data you are pulling in.
2) More data = more likely to error. Think about how far back in time your data goes in addition to the number of fields.
3) Joins and Unions can be causing the problem, look at those carefully.
4) You can hit a memory limit error when Creating a new large table that you will NOT hit when Appending to that same table. Consider creating the table with a smaller amount of data and then appending to that.
Good luck!
S-Bon
Thank you Stella.
I think you are right about the size of the tables... that why I have tried to change the SQL statements putting more conditions.
however seems my change does not give me the outcome.
for me now the big question is: in the in-database tool the SQL why does not work as the input tool.
what is wrong in this 2 SQL statement for the in-database tool?
1) select * from bigdata_snow.current_rde_incident where assigned_grp_oe_id_n3 ='MBO' and sys_created_on >= '2022-01-01 00:00:00'
2) select * from bigdata_snow.current_rde_task_sla where assigned_group_oe_id_3 ='MBO' and sys_created_on >= '2022-01-01 00:00:00'
if I will fix this issue, the problem of the memory will disappear.... this is the key of the problem.
I can't see anything wrong there. But when you put a SQL statement in the Connect tool, that in itself might be the issue. Can you create a new table in your database using that SQL statement? And then pull from that? Another thing to think about is the Type. Are you asking Alteryx to pull in LOB data on column assigned_grp_oe_id_n3? These tools are quirky. You could also talk to a database admin to get their opinion.
Also, two more thoughts:
1)If you are pulling from these two tables at the same time, consider doing them separately. You could use a macro to schedule the second one to run after the first one has finished.
2) Are you writing to the same tables that you are pulling from? If yes, 7know that Alteryx begins prepping the write-out while your initial data is still being pulled in. This can cause an error as the load on the table is doubled. Make sure the write out is done in a separate step that does not start until the first step (Pulling the data in) completes.
Actually I have recreated the workflow with the IN/OUT tools, same logic, same conditions, same.... and it work with those SQL statements.
I wonder if the problem is about the in-database tools.... because I don't see any other explanations. screenshot in attachment with the new workflow.
Yes, it is more likely the way the tools act on the database, not the construction of your sql statement.
I can tell because the errors reference memory.
Pretend you are the database and think about what you are being asked to do and in what order. Something is overloading the working memory and by adjusting the tools or the timing or both you can find a work-around.
(Sorry I can't open your image because I'm on my work computer)
Best of luck!
S-Bon
A little optimization : you should put your filter BEFORE the join if the two tables are heavy.