Dealing with large matrixed data sets
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I saw your solution to the matrix multiplication problem and it was the closest thing that I've seen to solving the task that I am trying to solve.
I have built massive models in excel using many quarters of bank data from the regulators on about 700 publicly traded banks.
My models have gotten impossible to work with in excel, and I came across Alteryx as a possible solution as a platform to move my models to.
The primary "thing" that I need to get started in my work is to learn the most effective way to deal with large matrixes of data (700 tickers with, at times, up to 800 quarters worth data) that I need to perform operations, regularly complex, where ABC's 3Q13 Data1 is operated on against ABC's 3Q13 Data2 from another matrix of equal size for matched elements in the matrixes.
Can someone direct me to a tutorial or something that focuses on that particular problem?
I have many, many of these to do and want to make sure that I'm using the most efficient approach from the start.
Best,
Allen
Solved! Go to Solution.
- Labels:
- Best Practices
- Datasets
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Can you provide more context or an example?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
For example: I have to very large tables of data that contain historical information on about 700 companies. The two tables of data are completely aligned by data and ticker but in the first table in cell C11 is ABC's 4q24 data point for Data Element 1. In the corresponding table, cell C11 contains ABC's 4q24 data for Data Element 2. I want to, for instance, multiply Data Element 1 from the First Table by Data Element 2 from the Second Table. And I want this identical operation repeated for ALL corresponding locations (which represent each company's matched data by quarter). In excel I would simply set up a formula that multiplied Cell C11 from Table 1 by Cell C11 in Table 2. I would then copy this formula across and down for the full size of the two Tables. What is the equivalent of doing this in Alteryx?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@APuwalski if you provide some sample data, and a sample desired output, then we're much better placed to be able to help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I've attached a simple example. Large Table in Tab 1, Large Table in Tab 2, and the result that I am trying to achieve in Alteryx in Tab 3.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Some of the common issues that crop up in this type of operation are:
- Needing to reference new data next quarter
- Non-consistent data types in the columns
- Other operations need to be done on the data and are looked at independently.
Transpose is your friend here!!!
In Alteryx, the data is treated as a standard data table and so all columns (fields) will need to be the same type. For that I would replace NA with 0, but I've shown both ways. I split out the quarter from the headers, but you could match them other ways if you like.
Note: This can be done without transposing the data, but the operation becomes a lot more complicated to make it generic,
By transposing the data, you can join on both reference sets (ticker, Quarter). And then you can do anything you want with the data. I've shown a cross-tab to get it back to your desired result, but I would assume that you probably want to do something else with it anyhow.
The reason that I didn't use a detour end before the cross-tab is the results with NA, are string still naturally, but the other is numeric.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thankyou. This is extremely helpful!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
KGT, I'd be interested in seeing how to do this without transpose if you can direct me to an example. I'm not sure if I need it but I want to understand how it works in case I do.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Also, you were correct when you raised the issue of updating each quarter for new data. How should I approach that in this workflow?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
So, if you have an extra field coming in each quarter with new data, transpose is my favourite way of dealing with that. The example I provided will automatically handle it if there is another column on the data for the next quarter ("Dynamic or Unknown Columns" at the bottom of the transpose data columns).
Think about it this way, you need to be able to reference the fields that are there, which means that if new data comes in, you need to have in your workflow something that handles the new field that you didn't know the name of when you set the workflow up. Hence Transpose is an easy way of doing that.
Basically, any solution will use transpose as working around it will be hard due to having to generalise the metadata.
To do it without Transpose would require assigning generic column names either
- Batch macro that picks up a certain column number from each table and multiplies them. This would require some minor work to put it back together afterwards.
- From a lookup table and dynamic rename, or
- Totally dynamic on the fly. This would involve using a "Field Info" and then assigning a recordID to create a lookup that could plug into the dynamic rename. You can then have formulas for each field multiplication, however to deal with the new fields, you would need to dynamically build those formulas and use a dynamic formula tool. Not simple.
- There is most likely a way to reference column numbers using grey tools, but again, it will be complicated, and you would still need to dynamically create the formulas for additional columns.
