Hello folks,
I am trying to figure out is this possible to do with Alteryx or not?
So, this is the case: I have two tables, their JOIN based on the "id" column and let's say that I have 5 records after this JOIN.
Next step is to bring some additional data/values based on the same "id" from another table that has more than 20 million records (my example has 20 records).
Now, I do not want to load/import all 20 million of records from this table, I would like to load only (5 records) that I got from the first JOIN.
Therefore, is it possible to use these 5 records as some kind of WHERE clause while importing the data from this huge table?
All tables are on Teradata, but I am attaching an example with the excel files.
Thank you in advance.
Can you write a single SELECT statement to pull the data from Teradata?
Something like (pseudo-SQL-code):
SELECT field3 from Table3 where ID in
( SELECT Table1.field1, Table2.field2 from Table1 left join Table2 on FieldC )
For more complicated SQL, I've used the inDB tools to upload a data stream to a Temp table, and used the Temp table in subsequent SQL. Required a new permission to create Temp tables, from the DBA.
Chris
Hi @cito ,
you haven't included any data in your workflow, but the answer is yes. You could either use in-db functionality where you can apply the joins within the in-db section, which effectively creates a WHERE clause in teh join without having to script it.
Alternatively, you can build the clause, wrap it in a macro and pass through the joined values to overwrite the part of the script that lists the variables of the WHERE clause. You would then pass these values into the macro so they would be dynamic.
M.
@mceleavey could you please share an image of the flow how that would look like. I haven't used IN-DB functionality so far.
Thanks.
Could someone help me with the IN-DB and how to do this, since I am not familiar with the IN-DB?
Thanks.
So, I started playing with the IN-DB connections, but can't figure out what is the problem here.
I've tried both options - first (SQL data) as an "Data Stream IN" and then as "Connect IN-DB".
Do I need to have both data sets to come from the same source (Teradata) or I can MIX them (SQL base and Teradata - like in my case)?
As far as I know, you'll need to upload your 5 records to a Temp table in the same database.
I've used the inDB tools to upload a data stream to a Temp table, and used the Temp table in subsequent SQL. Required a new permission to create Temp tables, from the DBA.
I resolved the connection issue, but now the RIGHT JOIN still imports the whole data set (~20.000 million records instead of 300k).
Now, this is the situation where I would like to say: import from the RIGHT JOIN only the serial numbers that I have in LEFT JOIN, but this is not happening.
If you have a small list of Ids, you can use an IN clause in your SQL.
Initially define the Query using something like a Text Input tool. See example below.
Use data prep tools to replace the #Placeholder# text in the query with your list of ids.
Then pass in your Query and Connection to the Dynamic Input in-DB tool.
Chris