Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Run SQL queries on csv-files - ideas?

BYJE
7 - Meteor

I've seen that this question has been asked before. Here's @danilang's reply from last year:

"The Input and Dynamic Input tools don't support SQL syntax when connecting to .xlsx files using the built in .xlsx driver. You're limited to Sheet Name(or Named Range), Start Data import on Line X and row Limit.  They import the entire range of data defined by these parameters and then you can use the various Alteryx Tools to recreate what your input SQL would have done.

 

If you really need to use SQL when connecting to .xlsx file, you can define an ODBC data source that points to the file and the use the Generic ODBC driver to read the data." 

 

My question is then - if you can't perform SQL queries on .csv data in the classical input tools, can you do it some other way, outside those tools? Have someone already done this, meaning that this code is potentially already available somewhere? From what I've read I gather that in theory it should be doable, but I'm quite uncertain as to how to implement solutions like those described here in Alteryx.

 

In one specific use-case I have right now I am confronted with multiple .csv datasets with ~500.000 records each in ~30+ columns, and what I actually need is 2000 of those records from a handful of the columns. (/Dynamic) Input tools are an extremely inefficient solution here - 'there has to be a better way'. Making a 'mirror table' that can be queried with SQL is not a solution that helps here, because I only need to process this data once every time it updates, and if I can't perform SQL on the .csv file I'll have to load in the entire data source anyway in order to update the table, which is precisely what I'm trying to avoid having to do in the first place. 

 

Anybody have any ideas as to how to handle this sort of problem in a good way?

1 REPLY 1
danilang
19 - Altair
19 - Altair

Hi @BYJE 

 

The Microsoft Access 2013 Runtime allows you to set up ODBC connections to CSV/text files.  Using the ODBC connection, you can use basic SQL to find the 2000 records that you need.

 

However, there's no guarantee that this will actually run any faster than reading in the entire file and filtering for the records that you need in Alteryx. CSV files don't contain any indexing information, so the entire file will still have to be scanned and processed by the Access ODBC driver. 

 

You might also the find that the csv files are not well formatted and throw errors in the ODBC driver.  In that case, you'll need to load them in and parse them line by line-by-line.     

 

I'd be interested in your results.  Run some speed tests and post the results here.

 

Dan 

Labels