Hello,
I am currently developing a workflow where I would like to enrich one dataset (Dataset #1) with information from another dataset (Dateset #2) as part of the "Input Data" step, in order to filter down the query results. Dataset #1 and Dataset #2 are in two different databases.
Dataset #1 is transactional payment data that is extremely large in size, and includes data for ALL projects regardless of their "status". As part of the "Data Input" step I would like to setup up my database query to return results only for projects with certain statuses. Dataset #1 unfortunately does not have a "status" field for me to filter on. The status field is in Dataset #2, which is located in a different database. Is it possible to connect to the two different databases in one "Data Input" tool, so that I can relate Dataset #1 to Dataset #2, pull in the "status" field from Database #2 , and then set my query parameters to the desired "status"?
Alternatively, is it possible to query dataset #2 to return only projects with a desired "status", and then use the results of this query to filter my query for dataset #1 so that it includes only information for the projects included in dataset #2?
Essentially I am just trying to avoid pulling a massive amount of data (from dataset #1) when I only want a much smaller subset of the data.
Hi @ekurutz
Check out the answer to this post. It's very similar to yours in that the the op needs to filter the results from one table based on a query returned from another. The IN technique posted by @bbak91 is probably most suited.
The other alternative is to use the Stream In-DB tool to push the results from the smaller Db to the larger one. To use this, you need permissions to create temporary tables in the larger db.
Dan
User | Count |
---|---|
52 | |
27 | |
25 | |
24 | |
21 |