Verify source data exists in the target table in ORACLE before load using Alteryx workflow
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I am using Alteryx Designer 2022.3 on Window 10 and need to verify whether source data exists in an ORACLE target table before loading
- The source table ( e.g. SS_PROD_202503) is dynamic and contains 10 million records.
- The target table ( e.g. SS_PROD) is static and has trillions of records.
- Both table shares common column(YYYYMM)
I initially used the join tool in Alteryx, but it was too slow. What is the best high performance method to efficiently verify source data existence in the target table before loading ?
- Labels:
- Developer
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
dude. are you using in-db tools? first off - let's work under the assumption that you have some kind of primary key instead of relying upon magic. next you'd use IN-DB for both databases. then you'd use like a left outer join. you'd identify which records didn't exist in your target table -> and then union them. You'd then write the output. All IN-DB.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
It is faster to use Distinct (or Unique tool) with YYYYMM and then join.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey @ntakeda that doesn't really work in-db... You can use a distinct in your sql query - but it's performatively basically the same as group by. Unique isn't a good tool. It's not Data Cleanse level bad - but bad.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @apathetichell thank you for reply in my workflow I am not using In-Database tool
My current workflow follows the logic below
In source table used dynamic Input to get dynamic table
In target table used Input data have query with Distinct.
Using Join tool to check If J output load message 'Data Already exists'
If L or R out put count records for source and target validate source records within acceptable range (+/- 5% range)
At last if records within range load data into target table if record count outside the range data is still loaded but with an alert message.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
If your issue is speed/performance - use In-DB. Input data/dynamic input require pulling the outputs of your SQL query into memory. In-DB uses push down querying and only retrieves results via datastream in/out.
