Custom filtering a yxdb file before importing
Hello,
I have a very large yxdb file that takes a long time to import.
Is there a way to apply a filter while importing this file? I know that when I connect an Input Data tool directly to a database (e.g., Teradata), I can apply custom SQL for filtering.
Is there something similar when I use a yxdb file as input?
Or any tips on improving import performance with a large file?
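For reference, this is the kind of filter I mean - a sketch of what I can type into the Input Data tool's custom SQL editor against Teradata (table and column names are made up for illustration):

```sql
-- Hypothetical example: the filter is pushed into the source query,
-- so only matching rows ever leave the database
SELECT *
FROM   sales_transactions
WHERE  transaction_date >= DATE '2023-01-01'
  AND  region = 'EMEA';
```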
No, the best you can do is limit the read to the first N records (but I'm guessing you knew that). For optimization, you can create the YXDB in AMP mode (Alteryx Multi-threaded Processing) and then read it into a workflow that's also set to AMP mode. That can dramatically reduce the read time. Other than that, just the usual - use an Auto Field tool right before the YXDB is created to make the fields as compressed as possible.
One other option that I've had good results with is to output a SQLite table instead of a YXDB. Even if you don't have SQLite installed on your machine, it doesn't matter; your machine won't recognize the file if you try to "open" it, but Alteryx can create it and read it, and it works like a traditional SQL database.
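That means you get what you asked for: a query box in the Input Data tool where you can filter before the rows are read. A minimal sketch, with placeholder table and column names:

```sql
-- Minimal sketch (names are placeholders):
-- only the matching rows are read from the SQLite file,
-- instead of streaming the whole table into the workflow
SELECT *
FROM   my_table
WHERE  customer_id = 12345;
```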
Look into using a Calgary database (.cydb) instead of an Alteryx database file (.yxdb). See the Calgary toolset.
https://help.alteryx.com/designer/calgary
A Calgary database is a lot like a .yxdb, except you can put indexes on it. It will still be a flat file in a network folder, but there will also be a bunch of other files related to the indexes you build.
Put an index on whatever it is you're trying to filter, then use the Calgary Input tool to load it. It will use the index and only load the rows you need. If you're looking for a specific row, it's possible to look up a value in an instant out of a Calgary database with millions of records.
If the thing you're filtering on needs to be more dynamic than the hard-coded filters you would put into the query of the Calgary Input tool, then look into the Calgary Join tool, which acts like an input: it streams the fields you connect on from your data and joins them against the Calgary database. Again, it will be very fast if the indexes you built are correct.
