
Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Reading from SQL Server with Input Tool takes a long time

benwh56
6 - Meteoroid

Hi All,

 

I have a .yxdb file with about 15.6m rows and 30 columns. I'd like to explore adding the contents of the .yxdb to our data warehouse. However, when I test reading the table back from our data warehouse via SQL Server, it takes a very long time to read all of the data in - roughly 6-7 hours. Reading the same data from the .yxdb is dramatically faster (around 8 minutes). I'd like to get as close to that time as possible when reading from the data warehouse.

 

I don't have any experience fine-tuning SQL Server tables for performance, but I'm trying out versions of the table with a clustered index on different columns. So far, this hasn't produced any noticeable difference. My current testing method is to just use an Input Tool connected to a Select Tool.

 

Has anyone ever had a similar experience and been able to improve read times when doing a SELECT * from a large SQL Server table?

 

Thank you in advance! If I can provide additional information let me know! :)

 

 

9 REPLIES
wesley18mendonca_
8 - Asteroid

Hi @benwh56 

 

It sounds like the issue you are encountering may be more related to your SQL Server and its configuration rather than anything specific to Alteryx. While Alteryx is performing well when reading the .yxdb file, the performance bottleneck seems to be on the SQL Server side when reading the large table.

If you allow me a few suggestions: make sure SQL Server has enough hardware resources allocated; test and use the right index for your table; if your SQL Server is on a different network or in a different geographical location, network latency can impact performance; enable and configure parallelism in SQL Server to leverage multiple CPUs... etc.

 

cmcclellan
13 - Pulsar

So it's the same table in the yxdb and SQL Server? Same record count? I can understand why the yxdb is faster - it's local and Alteryx knows the data is "right there". With SQL Server you're reading over 15 million records across 30 columns (no mention of data types, which can make a huge difference), and a SELECT * FROM <TABLE> means you're doing a full table scan and not using an index at all (so it doesn't matter what index you define - you're not addressing it in the SELECT), and you want all of that in memory on your computer... so 6-7 hours might be OK. Alteryx is handling it, but it's also (I'm guessing) writing a lot of data to disk as it needs to. Add on network latency and SQL Server performance like @wesley18mendonca_ mentioned, and I'm not sure you can improve on the 6 hours.
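A quick back-of-envelope calculation puts the full-table read in perspective. The per-column width below is an assumption for illustration (the real average row size would come from SQL Server itself), but even a rough figure shows the transfer rate implied by a 6-7 hour read is unusually low:

```python
# Rough estimate of the raw payload a SELECT * on this table must move.
# avg_bytes_per_col is an assumed figure, not a measurement of the OP's table.
rows = 15_600_000
cols = 30
avg_bytes_per_col = 20  # assumed mix of varchar(50), char, decimal, float
payload_gb = rows * cols * avg_bytes_per_col / 1024**3
print(f"~{payload_gb:.1f} GB transferred")  # ~8.7 GB

# At that volume, a 6.5-hour read implies a throughput well under 1 MB/s,
# far below what most networks sustain.
mb_per_s = rows * cols * avg_bytes_per_col / (6.5 * 3600) / 1024**2
print(f"implied throughput ~{mb_per_s:.2f} MB/s")
```

If the implied throughput is that far below the network's capacity, the bottleneck is more likely per-row fetch overhead (driver, packet size, server-side waits) than raw bandwidth.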

 

BUT ... what do you really want to do?

 

If you simply want to INSERT the yxdb records then there's no need to read the SQL Server table, if you want to UPSERT or UPDATE then you still don't need the entire table, but if you do it might be easier on the workflow (and a good opportunity for you) to use the inDB tools.

 

In short - don't pull the entire SQL Server table into memory when you don't have to, get the yxdb data into SQL Server and then do whatever you're trying to do from there.

 

 

 

benwh56
6 - Meteoroid

Thanks for the response! 

 

What you are saying makes sense, I can see why the index is kind of irrelevant if it's just a SELECT * query at the moment. I'm actually working on this on behalf of another team and we've had some discussions about avoiding blanket SELECT * queries, so this'll be a bit of extra fuel for that topic.

 

Can you expand a bit more on the data types potentially making a huge difference in query time? Right now the columns are a mixed bag of varchar, char, decimal, and one float. None of the field lengths are super long, the longest being a varchar with a length of 50, plus the float, which is 53.

 

One last thing to note: I've done a test since my original post where I queried another table from our data warehouse with 7.2m rows but 53 columns, and it was able to complete a SELECT * query in about 8 minutes. That table is in the same database as this one but on a different schema.

 

Appreciate your willingness to share what you know!

benwh56
6 - Meteoroid

Hi @wesley18mendonca_,

 

I appreciate your response! The strange thing about this situation is that we query other tables in our data warehouse via SQL Server pretty frequently and they don't take nearly as long as this one does. For example, I did a test on another table with about half as many rows but 53 columns and it only took about eight minutes. This makes me think there's got to be something about how I set up this new table that is contributing to it taking so long to read into Alteryx compared to other tables in the same data warehouse.

 

Thanks again for your response!

cmcclellan
13 - Pulsar

The data types work exactly as you're describing - numeric and date fields store compact values, while strings (even varchars) take a lot more room and therefore take longer to push to Alteryx - especially when you're talking about millions of records.
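To put rough numbers on this: SQL Server's documented storage sizes are 9 bytes for a decimal with precision 10-19, 8 bytes for float(53), a fixed 50 bytes for char(50), and actual length plus 2 for varchar. The column mix below is hypothetical, not the OP's actual schema, but it shows how quickly string-heavy rows add up at this row count:

```python
# Illustrative per-row size comparison using SQL Server's documented
# storage sizes. The 25-column mix is an assumption for illustration.
rows = 15_600_000
string_row = 25 * 50   # 25 fixed-width char(50) columns: 50 bytes each
numeric_row = 25 * 9   # same count of decimal(18,2) columns: 9 bytes each
delta_gb = rows * (string_row - numeric_row) / 1024**3
print(f"string-heavy rows add ~{delta_gb:.1f} GB over numeric rows")
```

That extra volume is data that has to cross the wire on every full read, which is why wide character columns hurt so much more than their declared lengths suggest.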

 

It's hard to say why the other table takes a lot less time, but I'm hoping you can talk to the DBA and they can help diagnose the differences.

danilang
19 - Altair

Hi @benwh56 

 

Another thing to consider is the driver that you're using.  In my tests, OleDB is much faster than ODBC.  Here are the results of reading 0.8M records from SQL Server 19 hosted in our Azure private cloud:

 

OleDb:  8 seconds

ODBC:  49 seconds

 

Another small increase can be obtained by using older MS SQL clients.  The results shown above are from SQL Server Native Client 11; using SQL Client 19 decreases performance by about 5%.

 

Dan      

apathetichell
19 - Altair

1) Turn off AMP.

2) Get more ram.

3) Turn off VPN - or go into the office when running. 

 

My hunch is you have 16 GB of RAM and need at least 32 for this. So while the local machine -> SQL Server route is an issue, I think you're seeing bottlenecks locally as well (resource manager can tell you this). 8 minutes for that size .yxdb sounds slow, and I think that points to additional system limitations, which have a more pronounced effect when working with your data warehouse.
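As a sanity check on the RAM hunch, here's a rough footprint estimate. The bytes-per-row figure is an assumption, not a measurement - in-memory representations of string columns typically inflate well beyond their on-disk size:

```python
# Rough in-memory footprint if the whole table is materialized locally.
# bytes_per_row is assumed: 30 columns with string inflation and
# per-row overhead, not a measured value from the OP's table.
rows = 15_600_000
bytes_per_row = 800
footprint_gb = rows * bytes_per_row / 1024**3
print(f"~{footprint_gb:.1f} GB if the full table is held in memory")
```

On a 16 GB machine, a footprint in that range leaves little headroom once the OS and Designer itself are counted, which would push Alteryx into spilling to disk and slow everything down further.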

 

If that doesn't work, I'd recommend writing to a file (i.e. CSV), putting that file on a server, and ingesting to the data warehouse server-to-server.

jdminton
12 - Quasar

Agree with the previous posts. One thing you might consider depending upon the needs downstream is to use the in-database tools to connect, select, filter, etc. prior to bringing into Alteryx in order to only work with the data needed locally. The in-DB tools run using SQL Server's resources without having to transfer data locally.

benwh56
6 - Meteoroid

I appreciate everyone's feedback and suggestions! I'm going to try a lot of what was suggested here and I will report back with findings.

 

Thanks!
