Bring your best ideas to the AI Use Case Contest! Enter to win 40 hours of expert engineering support and bring your vision to life using the powerful combination of Alteryx + AI. Learn more now, or go straight to the submission form.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

In-DB: How to split data in batches to reduce the high DB temp space consumption?

gelvira
7 - Meteor

Hello! 

 

I am new to Alteryx and any help/suggestion is appreciated.

 

Problem: 

I need to processing billions of data points for my analysis. If I take all data together then Alteryx runs for hours. There is no way I can pull this data outside db. 

I am doing everything in-DB and using summarize to get more insights. 

 

Is there a way to split the initial (Billions of record) into smaller batches in Alteryx without pulling it outside DB?  

Would this reduce my temp DB consumption? 

 

Thanks!

4 REPLIES 4
Sebastiaandb
12 - Quasar

Hi @gelvira ,

 

Are you using the in-database tool palette In-Database | Alteryx Help

Use the Connect-indb to make a connection to your database. It's way more efficient as it doesn't pull the records to your local instance of designer. 

You can use the query in the connect-indb tool or filter-in-db tool to take a subset of date before transferring the data out with the data stream out tool (converting it to a normal workflow/bringing it in locally).

 

Sebastiaandb_0-1645105418951.png

 

Hope this helps!

 

Greetings,

 

Seb

gelvira
7 - Meteor

Thank you for your response, Seb.  

 

Yes, I am only using the in-database tool palette  In-Database | Alteryx Help

Also, connect-indb is used to make a connection. I have filtered down the data as much as I can in the initial query itself which is - "1 day's data" for testing. 

Yet the data is huge to pull it outside-db. So I am using  In-Database | Alteryx Help  formulas, filters and summarizes. 

 

I need a way to split this 1 day's data into batches ( for instance divide the data into 4 equal parts) and I can use containers to run one specific batch with the same workflow. I tried to create a function to do the split but nothing worked for me. 

apathetichell
20 - Arcturus

what kind of database are you using btw?

Which one are you trying to do:

1) Are you trying to process billions of instances In-DB?

2) Are you trying to bring billions of records out of In-DB and into the standard canvas?

 

responses:

1): This should be fine. No issue. You do your processing In-DB and get down a few hundred thousand or whatever and use datastream out or write In-DB as needed.  You do all of your processing before datastream out.

2) Don't do this. Go back to 1. Alteryx can handle tens of millions of records. Maybe hundreds of millions. Maybe billions - depending upon your system memory - but getting that data from your DB source into in-memory computation will be bad.

 

 

 

apathetichell
20 - Arcturus

can you split by adding:

 

something like:

SELECT ROW_NUMBER() OVER(ORDER BY - use your date/time or other field here) AS RecId,
RecID/COUNT(ROW_NUMBER) AS Bucket

 

Labels
Top Solution Authors